ISO 19115 Experiences in NASA’s Earth Observing System (EOS) ClearingHOuse (ECHO)

Matthew Cechini
Raytheon - EED
ID: IN31C-07



Agenda

• ECHO Metadata Overview
– Introduction
– Problem Space
– Solutions
• ISO 19115 Lessons Learned
– Perceived Issues
– Gotchas
– Kudos
• Conclusion

Introduction

• Earth Observing System (EOS) ClearingHOuse (ECHO)
– An integral component of metadata management within NASA’s Earth Observing System Data and Information System (EOSDIS), acting as the core metadata repository and providing a centralized mechanism for metadata and data discovery and retrieval.
• How metadata is used by ECHO
– Discovery
– Presentation/Documentation
– Interoperability
– Validation
• Metadata Format Landscape
– Existing catalog utilizes ECHO format (based upon ECS data model).
– Future science missions projected to provide ISO 19115 metadata.

Problem Space

Data discovery and retrieval tenets:
1. There exists a set of users who will require the entire metadata record for advanced analysis.
2. There exists a set of ‘core’ metadata fields recommended for data discovery.
3. There exists a set of users who will require a ‘core’ set of metadata fields for discovery only.
4. There will never be a cessation of new formats or a total retirement of all old formats.
5. Users should be presented metadata in a consistent format of their choosing.

Solutions

ECHO’s metadata processing solution:
1. Identify a cross-format set of ‘core’ metadata fields for discovery.
2. Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability.
3. Archive the original metadata in its entirety for presentation to users requiring the full record.
4. Provide on-demand translation of ‘core’ metadata to any supported result format or standard.

ECHO’s usage of ISO 19115/19139:
1. Archive original metadata for documentation and advanced usage.
2. Extract ‘core’ metadata fields for data discovery.
3. Provide format translations between ISO and the other supported formats.
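The processing steps above can be illustrated with a minimal Python sketch. It is not ECHO's implementation: the two 'core' fields (title and keywords), the simplified "echo-like" record layout with `LongName` and `Keyword` elements, and all function names are assumptions made for illustration. Only the ISO 19139 namespace URIs and element names are taken from the standard's XML encoding. On-demand translation (step 4) is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the ISO 19139 XML encoding.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def index_iso(xml_text):
    """Extract the illustrative 'core' fields from an ISO 19139 record."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//gmd:title/gco:CharacterString", default="", namespaces=NS)
    keywords = [
        node.text.strip()
        for node in root.iterfind(".//gmd:keyword/gco:CharacterString", NS)
        if node.text
    ]
    return {"title": title.strip(), "keywords": keywords}


def index_echo_like(xml_text):
    """Extract the same fields from a hypothetical, simplified ECHO-style record."""
    root = ET.fromstring(xml_text)
    title = root.findtext("LongName", default="")
    keywords = [node.text.strip() for node in root.iterfind(".//Keyword") if node.text]
    return {"title": title.strip(), "keywords": keywords}


# One indexer per supported format; a new format only adds an entry here.
INDEXERS = {"iso19139": index_iso, "echo-like": index_echo_like}

archive = {}     # record id -> (format, original XML), stored untouched
core_index = {}  # record id -> cross-format 'core' fields used for discovery


def ingest(record_id, fmt, xml_text):
    archive[record_id] = (fmt, xml_text)             # keep the full original record
    core_index[record_id] = INDEXERS[fmt](xml_text)  # format-specific extraction


def search_by_keyword(term):
    """Query only the 'core' index, regardless of each record's source format."""
    return sorted(rid for rid, core in core_index.items() if term in core["keywords"])
```

Because discovery only touches `core_index`, records ingested in different formats are searched the same way, while `archive` still holds each original record for users who need the full metadata.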


ISO 19115 - Perceived Issue

Online Resources

• MimeType
– The existing standard could be included, similar to how GML is incorporated, though maintained separately.
– MimeType values facilitate automated access where different file types result in different workflows (e.g. displaying native jpg images or extracting from hdf). File extensions are not always indicative.
• Type
– Code List values promote interoperability, but potentially reduce the ability for intra-community customization.
– A type attribute allows for more detailed identification for automated access (e.g. specific service protocols http://xml.opendap.org/ns/DAP/3.3# )
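As a rough illustration of why a MimeType value helps automated access, the sketch below picks a workflow from the declared MIME type rather than from the file extension. The handler functions, the `WORKFLOWS` table, the example URL, and the use of `application/x-hdf` as the HDF type are all assumptions for illustration, not part of ECHO or ISO 19115.

```python
def display_image(url):
    return f"display image from {url}"


def extract_from_hdf(url):
    return f"extract variables from {url}"


def plain_download(url):
    return f"download {url}"


# Hypothetical mapping from declared MIME type to a client workflow.
WORKFLOWS = {
    "image/jpeg": display_image,
    "application/x-hdf": extract_from_hdf,  # assumed MIME type for HDF files
}


def choose_workflow(url, mime_type=None):
    """Select a workflow from the declared MIME type; fall back to a download."""
    handler = WORKFLOWS.get(mime_type, plain_download)
    return handler(url)
```

A URL such as `https://example.org/granule?id=123` has no usable extension, yet `choose_workflow(url, "application/x-hdf")` still selects the extraction workflow. Without a declared type, the client can only fall back to a plain download.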

ISO 19115 - Perceived Issue

Services Resources

• Data Discovery
– How are links to discovery services made available (e.g. data casting feeds or search endpoints)?
– Endpoints may support multiple response formats; how would that be included?
• Data Processing
– Data processing links do not appear to be supported.
– Both series and dataset level metadata may have URLs to services that expose subsetting, projection, and other services.
– Some service-specific information may be required and will need to be included in the metadata.

ISO 19115 - Perceived Issue

Hierarchical Keyword Structure

• Representation
– Non-Standard Delimiters
▪ A self-defining hierarchy could be introduced within the keyword structure allowing for customized keyword lists.
• Automated Usage
– Optional Fields
▪ A flat representation of keyword structures that have optional levels may cause issues for automated keyword parsing.
▪ Translation into a metadata format where hierarchy is expected may not be possible.

<gmd:keyword>
  <gco:CharacterString>Earth Science &gt; Oceans &gt; Ocean Temperature &gt; Sub-skin Sea Surface Temperature</gco:CharacterString>
</gmd:keyword>

<gmd:keyword>
  <gco:CharacterString>Earth Science | Oceans | Ocean Temperature | Sub-skin Sea Surface Temperature</gco:CharacterString>
</gmd:keyword>
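The parsing problems with flat keyword strings can be shown in a few lines of Python. This is only a sketch: the level names in `LEVEL_NAMES` and the three-level keyword used to show an omitted level are illustrative assumptions, and the input strings are the keyword text after XML unescaping (`&gt;` becomes `>`).

```python
# Illustrative level names; a real keyword vocabulary defines its own.
LEVEL_NAMES = ("category", "topic", "term", "variable")


def split_keyword(text, delimiter=">"):
    """Split a flat keyword string into hierarchy levels on one delimiter."""
    return [part.strip() for part in text.split(delimiter)]


def to_levels(parts):
    """Assign parts to named levels purely by position."""
    return dict(zip(LEVEL_NAMES, parts))


gt_form = "Earth Science > Oceans > Ocean Temperature > Sub-skin Sea Surface Temperature"
pipe_form = "Earth Science | Oceans | Ocean Temperature | Sub-skin Sea Surface Temperature"

# A parser expecting '>' recovers four levels from the first form only;
# the '|' form comes back as one undivided string.
levels_gt = split_keyword(gt_form)
levels_pipe = split_keyword(pipe_form)

# Hypothetical keyword with the optional third level omitted: positional
# assignment files the last value under the wrong level name.
shifted = to_levels(split_keyword("Earth Science > Oceans > Sub-skin Sea Surface Temperature"))
```

Here `levels_pipe` has a single element because the delimiter is not the expected one, and `shifted` places "Sub-skin Sea Surface Temperature" under `term` with no `variable` entry. A self-describing structure, where each level is labeled, would avoid both problems.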

ISO 19115 - Perceived Issue

Spatial Representations

• Coordinate Systems
– Cartesian vs. Geodetic
▪ EX_GeographicBoundingBox does not specify a coordinate system.
– Two-D Coordinate Systems
▪ Unable to find where coordinate reference systems like WRS-2 and MODIS H/V tiling are a) defined and b) utilized.
• Orbit Metadata
– Series Level
▪ Unable to find where series level orbit metadata is represented (e.g. swath width, period, inclination angle, etc…).
▪ This information may be required for data discovery.
– Dataset Level
▪ Similar concern regarding placement of orbit metadata, again used for discovery (e.g. orbit number, crossing longitude, etc…)
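To make the bounding box concern concrete, the sketch below reads an EX_GeographicBoundingBox from ISO 19139 XML with Python's standard library. The function name and the sample values are assumptions for illustration. The element names are those of the ISO 19139 encoding, and the result shows that the element yields four numbers and nothing that says how the edges between the corners should be interpreted.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

BOX_FIELDS = (
    "westBoundLongitude",
    "eastBoundLongitude",
    "southBoundLatitude",
    "northBoundLatitude",
)


def read_bounding_box(xml_text):
    """Return the four bounds of the first EX_GeographicBoundingBox found."""
    root = ET.fromstring(xml_text)
    tag = "{%s}EX_GeographicBoundingBox" % NS["gmd"]
    box = next(root.iter(tag))  # iter() also matches the root element itself
    return {
        name: float(box.findtext(f"gmd:{name}/gco:Decimal", namespaces=NS))
        for name in BOX_FIELDS
    }


# Hypothetical sample values.
SAMPLE = """
<gmd:EX_GeographicBoundingBox
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:westBoundLongitude><gco:Decimal>-10.0</gco:Decimal></gmd:westBoundLongitude>
  <gmd:eastBoundLongitude><gco:Decimal>25.5</gco:Decimal></gmd:eastBoundLongitude>
  <gmd:southBoundLatitude><gco:Decimal>30.0</gco:Decimal></gmd:southBoundLatitude>
  <gmd:northBoundLatitude><gco:Decimal>60.0</gco:Decimal></gmd:northBoundLatitude>
</gmd:EX_GeographicBoundingBox>
"""

bounds = read_bounding_box(SAMPLE)
```

A consumer of `bounds` has to rely on a convention outside the element to decide whether the box edges are Cartesian or geodetic.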

Gotchas

• Terminology
– Natural difficulties reconciling terminology between communities.
▪ Dataset & Granules vs. Series & Dataset
▪ Archive Center vs. Custodian
– Codelists are a double-edged sword, providing consistency but removing specificity and community vernacular.
• Citation Overload
– Contact information can be represented in numerous locations.
– Potentially stale contact information may be difficult to track down.
• Combined Series & Dataset Metadata
– Good Idea… Combining series and dataset metadata during presentation.
– Bad Idea… Combining series and dataset metadata during archival.

Kudos

• Citations
– Thorough support for providing citations within the metadata.
• Metadata Lineage
– ISO lineage provides an excellent means to capture repeatable processing history information.
• Distribution Information
– Thorough support for online and offline access options including support for ordering.

Conclusion

• ISO 19115 is on its way to becoming a viable metadata standard as a means of documentation.
• ISO 19115 is a bit verbose for the pragmatic requirements of data discovery (specifically at the dataset level).
• ISO 19115 lacks support for the growing presence of data processing services.
• All metadata standards are expected to have issues and will improve over time.

http://xkcd.com/927/

[email protected] South: IN41B-1406 - Dec. 8, 8:00am-12:20pm