Can ISO 19157 support current NASA data quality metadata?

ISO 19157 – A Unified Metadata Model for Data Quality. Ted Habermann, Director of Earth Science, The HDF Group, thabermann@hdfgroup.org. The Global Change Master Directory (GCMD) and the EOS Clearinghouse (ECHO) both include data quality information. Can this information be mapped to ISO 19157? Presented May 21, 2014 at the Earth Science Information Partnership Information Quality Cluster.

Upload: ted-habermann

Post on 16-Jan-2015

DESCRIPTION

ISO 19157 provides a powerful framework for describing quality of Earth science datasets. As NASA migrates towards using that standard, it is important to understand whether and how existing data quality content fits into the ISO 19157 model. This talk demonstrates that fit and concludes that ISO 19157 can include all existing content and also includes new capabilities that can be very useful for all kinds of NASA data users.

TRANSCRIPT

Page 1: Can ISO 19157 support current NASA data quality metadata?


ISO 19157 – A Unified Metadata Model for Data Quality

Ted Habermann, Director of Earth Science, The HDF Group, thabermann@hdfgroup.org

The Global Change Master Directory (GCMD) and the EOS Clearinghouse (ECHO) both include data quality information.

Can this information be mapped to ISO 19157?

[Diagram: GCMD and ECHO quality content mapped to ISO, with a question mark]

Presented May 21, 2014 at the Earth Science Information Partnership Information Quality Cluster

Page 2: Can ISO 19157 support current NASA data quality metadata?

Metadata in Multiple Dialects

[Diagram: a documentation repository surrounded by the metadata dialects it serves]

ISO 19115-1, 19115-2, 19119, 19157 and extensions

THREDDS

HDF, netCDF (NcML)

FGDC, Data.Gov

SensorML

WCS, WMS, WFS, SOS

Open Provenance Model, PROV

DIF, ECS, ECHO

KML

Page 3: Can ISO 19157 support current NASA data quality metadata?


GCMD Data Quality

The <Quality> field allows the author to provide information about the quality of the data or any quality assurance procedures followed in producing the data described in the metadata.

Page 4: Can ISO 19157 support current NASA data quality metadata?


GCMD Quality Examples

<Quality> Due to the lack of high resolution data available over the region for 1993-94, it has been hard to validate the product. However the maps of burnt areas correspond well with active fire maps for the region. Where large [>3km] scars are found, the detection is more reliable. In areas of small scars more problems are involved. It is hoped that the 1994-95 data set will cover the whole of the study area and be calibrated by high resolution data. </Quality>

<Quality> QA performed by CDIAC One of the roles of the Carbon Dioxide Information Analysis Center (CDIAC) is quality assurance (QA) of data. The QA process is an important component of the value-added concept of assuring accurate, usable information for researchers, because data received by CDIAC are rarely in condition for immediate distribution, regardless of source. </Quality>

<Quality> The fire training-set may also have been biased against savanna and savanna woodland fires since their detection is more difficult than in humid, forest environments with cool background temperatures [Malingreau, 1990]. There may, therefore, be an under-sampling of fires in these warmer background environments. </Quality>

<Quality> Note that Data File 12, Report #2, TASK 2 (Auclair et al., 1994a) is a Quality Assurance and Quality Control chapter for the areas of Canada, Alaska, United States (48 states), with range estimates of validation and error, a listing of discussions with experts in the field and a review of the draft of data files. </Quality>

Page 5: Can ISO 19157 support current NASA data quality metadata?

ISO Data Quality Results

<<Abstract>> DQ_Result
+ dateTime [0..1] : DateTime
+ resultScope [0..1] : DQ_Scope

DQ_ConformanceResult
+ specification : CI_Citation
+ explanation : CharacterString
+ pass : Boolean

DQ_CoverageResult

DQ_DescriptiveResult
+ statement : CharacterString

DQ_QuantitativeResult
+ valueType [0..1] : RecordType
+ valueUnit : UnitOfMeasure
+ value [1..*] : Record

ISO 19157 defines four kinds of quality results.

One of them, DQ_DescriptiveResult, is new: it holds a simple text description of the result of a quality test.

Page 6: Can ISO 19157 support current NASA data quality metadata?

ISO Data Quality Results

<Quality> QA performed by CDIAC One of the roles of the Carbon Dioxide Information Analysis Center (CDIAC) is quality assurance (QA) of data. The QA process is an important component of the value-added concept of assuring accurate, usable information for researchers, because data received by CDIAC are rarely in condition for immediate distribution, regardless of source. </Quality>

The DQ_DescriptiveResult matches the GCMD quality element exactly.
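As a rough illustration of that fit, the sketch below copies the text of each GCMD <Quality> element into a record shaped like DQ_DescriptiveResult. The class name `DescriptiveResult`, the function `gcmd_quality_to_results`, and the bare `<DIF>` wrapper without XML namespaces are assumptions made for this example; they are not part of any GCMD or ISO tooling.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DescriptiveResult:
    """Illustrative stand-in for ISO 19157 DQ_DescriptiveResult."""
    statement: str                      # + statement : CharacterString
    date_time: Optional[str] = None     # from DQ_Result, [0..1]
    result_scope: Optional[str] = None  # from DQ_Result, [0..1]


def gcmd_quality_to_results(dif_xml: str) -> List[DescriptiveResult]:
    """Turn every non-empty <Quality> element into one descriptive result."""
    root = ET.fromstring(dif_xml)
    results = []
    for quality in root.iter("Quality"):
        # Collapse runs of whitespace so the statement is a single clean string.
        text = " ".join((quality.text or "").split())
        if text:
            results.append(DescriptiveResult(statement=text))
    return results


# Hypothetical, namespace-free DIF fragment using text from the example above.
SAMPLE = """<DIF>
  <Quality> QA performed by CDIAC One of the roles of the Carbon Dioxide
  Information Analysis Center (CDIAC) is quality assurance (QA) of data. </Quality>
</DIF>"""

results = gcmd_quality_to_results(SAMPLE)
print(results[0].statement)
```

Because the GCMD field is free text, nothing is lost or reinterpreted in the copy; the optional `dateTime` and `resultScope` simply stay empty.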

Page 8: Can ISO 19157 support current NASA data quality metadata?

Standalone Quality Reports

DQ_DataQuality
+ scope : DQ_Scope
+ report [1..*] : DQ_Element
+ standaloneQualityReport [0..1] : DQ_StandaloneQualityReportInformation

DQ_Element
+ standaloneQualityReportDetails [0..1] : CharacterString

DQ_StandaloneQualityReportInformation
+ reportReference : CI_Citation
+ abstract : CharacterString

ISO 19157 acknowledges that important quality information can exist in standalone reports that may not fit easily into the ISO conceptual model. These standalone reports are cited together with abstracts that summarize the results, so users do not have to retrieve the cited report to see them.
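One plausible reading is that a GCMD quality note pointing at an external QA/QC chapter becomes a standalone report citation plus abstract. The sketch below models that with simplified Python classes; the class names, the use of plain strings for `CI_Citation`, `DQ_Scope`, and `DQ_Element`, and the abstract wording are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StandaloneQualityReportInformation:
    report_reference: str  # + reportReference : CI_Citation (title only here)
    abstract: str          # + abstract : CharacterString


@dataclass
class DataQuality:
    scope: str                                        # + scope : DQ_Scope
    reports: List[str] = field(default_factory=list)  # + report 1..* (placeholder strings)
    standalone_quality_report: Optional[StandaloneQualityReportInformation] = None  # 0..1

    def __post_init__(self) -> None:
        # Enforce the 1..* multiplicity on report shown in the model.
        if not self.reports:
            raise ValueError("DQ_DataQuality requires at least one report")


standalone = StandaloneQualityReportInformation(
    report_reference="Data File 12, Report #2, TASK 2 (Auclair et al., 1994a)",
    abstract=(
        "Quality Assurance and Quality Control chapter for Canada, Alaska, and "
        "the United States (48 states), with range estimates of validation and error."
    ),
)

quality = DataQuality(
    scope="dataset",
    reports=["descriptive result summarizing the QA/QC chapter"],
    standalone_quality_report=standalone,
)
print(quality.standalone_quality_report.report_reference)
```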

Page 10: Can ISO 19157 support current NASA data quality metadata?

ECHO Quality Information

ECHO includes two kinds of quality information

QAStats – Standard measures for all products

QAPercentMissingData – Granule-level % missing data. This attribute can be repeated for individual parameters within a granule.

QAPercentOutOfBoundsData – Granule-level % out-of-bounds data. This attribute can be repeated for individual parameters within a granule.

QAPercentInterpolatedData – Granule-level % interpolated data. This attribute can be repeated for individual parameters within a granule.

QAPercentCloudCover – This attribute is used to characterize the cloud cover amount of a granule. This attribute may be repeated for individual parameters within a granule. (Note: there may be more than one way to define a cloud or its effects within a product containing several parameters; i.e., this attribute may be parameter specific.)
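Since the four QAStats are numeric percentages, one natural home for them is a quantitative result with a percent unit. The sketch below assumes that mapping; the slides do not state it explicitly. The class `QuantitativeResult`, the function `qa_stats_to_results`, and the granule values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four ECHO QAStats attribute names listed above.
QA_STATS = (
    "QAPercentMissingData",
    "QAPercentOutOfBoundsData",
    "QAPercentInterpolatedData",
    "QAPercentCloudCover",
)


@dataclass
class QuantitativeResult:
    """Illustrative stand-in for DQ_QuantitativeResult."""
    measure_name: str            # kept here for convenience only
    values: List[float]          # + value [1..*] : Record
    value_unit: str = "percent"  # + valueUnit : UnitOfMeasure


def qa_stats_to_results(qa_stats: Dict[str, float]) -> List[QuantitativeResult]:
    """Wrap each QAStats percentage that is present in a quantitative result."""
    results = []
    for name in QA_STATS:
        if name not in qa_stats:
            continue
        value = float(qa_stats[name])
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
        results.append(QuantitativeResult(name, [value]))
    return results


# Made-up granule-level values, for illustration only.
granule_stats = {"QAPercentMissingData": 2.0, "QAPercentCloudCover": 35.0}
stat_results = qa_stats_to_results(granule_stats)
for result in stat_results:
    print(result.measure_name, result.values, result.value_unit)
```

Per-parameter repetitions of an attribute would produce additional results, one per parameter, with the parameter recorded in the result scope.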

Page 11: Can ISO 19157 support current NASA data quality metadata?

ECHO Quality Information

ECHO includes two kinds of quality information

QAFlags – Classes of quality measures with product-specific implementations

AutomaticQualityFlag – The granule-level flag applying generally to the granule and specifically to parameters at the granule level. When applied to a parameter, the flag refers to the quality of that parameter for the granule (as applicable). The parameters determining whether the flag is set are defined by the developer and documented in the Quality Flag Explanation.

AutomaticQualityFlagExplanation – A text explanation of the criteria used to set the automatic quality flag, including thresholds or other criteria.

OperationalQualityFlag – The granule-level flag applying both generally to a granule and specifically to parameters at the granule level. When applied to a parameter, the flag refers to the quality of that parameter for the granule (as applicable). The parameters determining whether the flag is set are defined by the developers and documented in the Operational Quality Flag Explanation.

OperationalQualityFlagExplanation – A text explanation of the criteria used to set the operational quality flag, including thresholds or other criteria.

ScienceQualityFlag – Granule-level flag applying to a granule, and specifically to parameters. When applied to a parameter, the flag refers to the quality of that parameter for the granule (as applicable). The parameters determining whether the flag is set are defined by the developers and documented in the Science Quality Flag Explanation.

ScienceQualityFlagExplanation – A text explanation of the criteria used to set the science quality flag, including thresholds or other criteria.

Page 12: Can ISO 19157 support current NASA data quality metadata?

ECHO Data Quality Measures

DQ_MeasureReference
+ measureIdentification [0..1] : MD_Identifier
+ nameOfMeasure [0..*] : CharacterString
+ measureDescription [0..1] : CharacterString
Constraint: if measureIdentification is not provided, then nameOfMeasure shall be provided.

+ measure 0..1

ISO 19157 data quality measure references identify measures in several ways and provide a brief description of the measure.

<<Abstract>> DQ_Element
+ measure [0..*] : DQ_MeasureReference
+ evaluationMethod [0..1] : DQ_EvaluationMethod
+ result [0..1] : DQ_Result

QAPercentMissingData, QAPercentOutOfBoundsData, QAPercentInterpolatedData, QAPercentCloudCover, AutomaticQualityFlag, OperationalQualityFlag, ScienceQualityFlag

AutomaticQualityFlagExplanation, OperationalQualityFlagExplanation, ScienceQualityFlagExplanation
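A minimal sketch of this mapping, assuming the ECHO attribute name goes into nameOfMeasure and the matching explanation text goes into measureDescription. The class `MeasureReference` and the helper `echo_flag_to_measure` are hypothetical names; the check in `__post_init__` mirrors the constraint quoted in the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MeasureReference:
    """Illustrative stand-in for DQ_MeasureReference."""
    measure_identification: Optional[str] = None               # [0..1] : MD_Identifier
    name_of_measure: List[str] = field(default_factory=list)   # [0..*] : CharacterString
    measure_description: Optional[str] = None                  # [0..1] : CharacterString

    def __post_init__(self) -> None:
        # If measureIdentification is not provided, nameOfMeasure shall be provided.
        if self.measure_identification is None and not self.name_of_measure:
            raise ValueError(
                "nameOfMeasure shall be provided when measureIdentification is absent"
            )


def echo_flag_to_measure(flag_name: str, explanation: Optional[str] = None) -> MeasureReference:
    """Name the measure after the ECHO attribute and keep its explanation as the description."""
    return MeasureReference(name_of_measure=[flag_name], measure_description=explanation)


ref = echo_flag_to_measure(
    "AutomaticQualityFlag",
    "No automatic quality assessment is performed in the PGE",
)
print(ref.name_of_measure, ref.measure_description)
```

The QAStats attributes have no explanation field in ECHO, so for them `measure_description` would stay empty unless the attribute definition is reused as the description.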

Page 13: Can ISO 19157 support current NASA data quality metadata?

Data Quality Measures

DQM_Measure
+ measureIdentifier : MD_Identifier
+ name : CharacterString
+ alias [0..*] : CharacterString
+ sourceReference [0..*] : CI_Citation
+ elementName [1..*] : TypeName
+ definition : CharacterString
+ description [0..1] : DQM_Description
+ valueType : TypeName
+ valueStructure [0..1] : DQM_ValueStructure
+ example [0..*] : DQM_Description

DQM_BasicMeasure
+ name : CharacterString
+ definition : CharacterString
+ example [0..1] : DQM_Description
+ valueType : TypeName

<<CodeList>> DQM_ValueStructure
+ bag
+ set
+ sequence
+ table
+ matrix
+ coverage

DQM_Parameter
+ name : CharacterString
+ definition : CharacterString
+ description [0..1] : DQM_Description
+ valueType : TypeName
+ valueStructure [0..1] : DQM_ValueStructure

DQM_Description
+ textDescription : CharacterString
+ extendedDescription [0..1] : MD_BrowseGraphic

DQ_MeasureReference
+ measureIdentification [0..1] : MD_Identifier
+ nameOfMeasure [0..*] : CharacterString
+ measureDescription [0..1] : CharacterString
Constraint: if measureIdentification is not provided, then nameOfMeasure shall be provided.

+ measure 0..1

ISO 19157 data quality measure properties are in a separate object and provide a much more complete description of the measures.

<<Abstract>> DQ_Element
+ measure [0..*] : DQ_MeasureReference
+ evaluationMethod [0..1] : DQ_EvaluationMethod
+ result [0..1] : DQ_Result

Page 14: Can ISO 19157 support current NASA data quality metadata?


ECHO Quality Measure Descriptions

AutomaticQualityFlagExplanation (2289)

66% parameter is produced correctly

7% No automatic quality assessment is performed in the PGE

5% Based on percentage of product that is good. Suspect used where true quality is not known.

4% Automatic quality determination software not yet implemented

2% QA flag explanation

ScienceQualityFlagExplanation (346)

OperationalQualityFlagExplanation (238)

35% Passed

14% Passed, parameter passed the specified operational test. Inferred Pass, parameter terminated with warnings. Failed parameter terminated with fatal errors.

11% Not Investigated

10% Passed, parameter passed the specified science test. Inferred Pass, parameter terminated with warnings for specified science test. Failed parameter terminated with fatal errors for specified science test.

10% See http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/qaFlagPage.cgi?sat=aqua for the product Science Quality status.

8% Validated, see http://disc.gsfc.nasa.gov/Aura/MLS/ for quality document

6% See http://landweb.nascom.nasa.gov/cgi-bin/QA_WWW/qaFlagPage.cgi?sat=terra for the product Science Quality status.

5% An updated science quality flag and explanation is put in the product .met file when a granule has been evaluated. The flag value in this file, Not Investigated, is an automatic default that is put into every granule during production.

Page 15: Can ISO 19157 support current NASA data quality metadata?

ISO Data Quality Results

Passed, Suspect, Failed, Not Investigated, Inferred Passed, Being Investigated

Page 16: Can ISO 19157 support current NASA data quality metadata?

[Diagram: GCMD and ECHO quality content mapped to ISO]

YES!

Page 17: Can ISO 19157 support current NASA data quality metadata?

Questions?

thabermann@hdfgroup.org

Page 18: Can ISO 19157 support current NASA data quality metadata?

Acknowledgements

This work was partially supported by contract number NNG10HP02C from NASA.

Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of NASA or The HDF Group.