Analysis of quality metadata in the GEOSS Clearinghouse

Analysis of the Quality Metadata in GEOSS Clearinghouse
QUAlity aware VIsualisation for the Global Earth Observation system of systems
SEVILLANO Eva 1, DÍAZ Paula 2, NINYEROLA Miquel 1, MASÓ Joan 2, ZABALA Alaitz 1, PONS Xavier 1
1 UAB Universitat Autònoma de Barcelona. 2 CREAF Centre for Ecological Research and Forestry Applications.

Upload: paula-diaz

Post on 30-Jun-2015


DESCRIPTION

Díaz, P., Masó, J., Sevillano, E., Ninyerola, M., Zabala, A., Serral, I., Pons, X. (2012). Analysis of quality metadata in the GEOSS Clearinghouse. International Journal of Spatial Data Infrastructures Research. Vol 7 (2012), pp. 352-377.

TRANSCRIPT

Page 1: Analysis of quality metadata in the GEOSS Clearinghouse

Analysis of the Quality Metadata in GEOSS Clearinghouse

QUAlity aware VIsualisation for the Global Earth Observation system of systems

SEVILLANO Eva 1, DÍAZ Paula 2, NINYEROLA Miquel 1, MASÓ Joan 2, ZABALA Alaitz 1, PONS Xavier 1

1 UAB Universitat Autònoma de Barcelona.
2 CREAF Centre for Ecological Research and Forestry Applications.

Page 2: Analysis of quality metadata in the GEOSS Clearinghouse

Objectives

• To get a first analysis of the data quality in the Clearinghouse
• Analyze the quality contained in the metadata (ISO 19115)
– Quality indicators
– Lineage
– Usage
• Start building components for the GEO Portal
– Quality Broker
– Quality searcher
– Quality visualization

www.geoviqua.org

Page 3: Analysis of quality metadata in the GEOSS Clearinghouse

Methodology

[Diagram: GEOSS Clearinghouse → CSW → 97203 XML documents]

• Harvest all ISO 19115 XML documents (October 2011)
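The harvesting step can be sketched as a paged series of CSW GetRecords requests. The snippet below is only a minimal illustration, assuming a CSW 2.0.2 endpoint that accepts key-value-pair requests; the endpoint URL and the page size of 100 are placeholders, not values from the slides, and a real server may require extra parameters (for example a namespace declaration).

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give the Clearinghouse CSW address.
CSW_ENDPOINT = "https://example.org/csw"


def getrecords_url(start_position, max_records=100, endpoint=CSW_ENDPOINT):
    """Build a CSW 2.0.2 GetRecords KVP request asking for full ISO 19115 records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "gmd:MD_Metadata",
        "resultType": "results",
        "elementSetName": "full",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "startPosition": start_position,  # CSW positions are 1-based
        "maxRecords": max_records,
    }
    return endpoint + "?" + urlencode(params)


def page_starts(total_records, page_size=100):
    """Return the startPosition of every page needed to cover total_records."""
    return list(range(1, total_records + 1, page_size))


starts = page_starts(97203, page_size=100)
print(len(starts), "requests needed")
print(getrecords_url(starts[0]))
```

Each URL would then be fetched and the returned records saved one XML document per metadata record.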

Page 4: Analysis of quality metadata in the GEOSS Clearinghouse

Methodology

[Diagram: GEOSS Clearinghouse → CSW → 97203 XML documents → XPath extraction → Database (GestBD)]

• Massive extraction of MD quality elements
– Quality indicators
– Lineage
– Usage
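As an illustration of the XPath extraction step, the sketch below pulls the three kinds of quality elements out of a single record with Python's standard library. The sample record is hypothetical and heavily simplified; the namespace URIs are the usual ISO 19139 ones, and the dictionary layout is an assumption, not the actual GestBD table structure.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, simplified record used only for illustration.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:resourceSpecificUsage><gmd:MD_Usage/></gmd:resourceSpecificUsage>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:report><gmd:DQ_AbsoluteExternalPositionalAccuracy/></gmd:report>
      <gmd:report><gmd:DQ_CompletenessOmission/></gmd:report>
      <gmd:lineage>
        <gmd:LI_Lineage>
          <gmd:processStep><gmd:LI_ProcessStep/></gmd:processStep>
        </gmd:LI_Lineage>
      </gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
</gmd:MD_Metadata>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_quality_elements(xml_text):
    """Return the quality indicators, lineage count and usage count of one record."""
    root = ET.fromstring(xml_text)
    reports = root.findall(".//gmd:DQ_DataQuality/gmd:report/*", NS)
    return {
        "quality_indicators": [local_name(e.tag) for e in reports],
        "lineage": len(root.findall(".//gmd:LI_Lineage", NS)),
        "usage": len(root.findall(".//gmd:MD_Usage", NS)),
    }


print(extract_quality_elements(SAMPLE))
```

Running the same function over every harvested document and storing the rows in a database gives the counts reported on the following slides.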

Page 5: Analysis of quality metadata in the GEOSS Clearinghouse

Overall Results

• Total metadata records in the Clearinghouse
– 97203
• Total number of quality indicators
– 52187
• Metadata records with quality indicators
– 19107
• Metadata records with lineage
– 10899 (9261 process, 3771 source)
• Metadata with usage
– 1226

Page 6: Analysis of quality metadata in the GEOSS Clearinghouse

Quality Scope

• 19.66% Metadata records with quality indicators
– 2.7 quality indicators per metadata record
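The two figures on this slide follow directly from the counts on the Overall Results slide. A short check, with variable names chosen here for illustration:

```python
# Counts taken from the "Overall Results" slide.
total_records = 97203
quality_indicators = 52187
records_with_indicators = 19107
records_with_lineage = 10899


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


pct_with_indicators = share(records_with_indicators, total_records)
indicators_per_record = round(quality_indicators / records_with_indicators, 1)
pct_with_lineage = share(records_with_lineage, total_records)

print(pct_with_indicators, indicators_per_record, pct_with_lineage)
```

Note that the 2.7 average is computed over the records that have at least one quality indicator, not over all 97203 records.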

Page 7: Analysis of quality metadata in the GEOSS Clearinghouse

1. Quality indicators

• 19.66% Metadata records with quality

Page 8: Analysis of quality metadata in the GEOSS Clearinghouse

Quality indicators

• 19.66% Metadata records with quality
– 2.7 QI/MD

Page 9: Analysis of quality metadata in the GEOSS Clearinghouse

Quality indicators

Page 10: Analysis of quality metadata in the GEOSS Clearinghouse

Quality indicators – Comparison Clearinghouse - IDEC

[Charts: GEOSS, IDEC]

Quality Indicators in IDEC Metadata
– PositionalAccuracy: 95.38%
– ThematicAccuracy: 2.60%
– Completeness: 0.46%
– TemporalAccuracy: 0.06%
– LogicalConsistency: 0.02%

Page 11: Analysis of quality metadata in the GEOSS Clearinghouse

Quality indicator result

• 85.8% (22275 QI)
• 14.18% (3669 QI): mainly conformance to INSPIRE
• 0.02% (5 QI): 19115-2 extension for "per pixel" quality

Page 12: Analysis of quality metadata in the GEOSS Clearinghouse

Quality indicator result

Page 13: Analysis of quality metadata in the GEOSS Clearinghouse

Quality indicators - Quantitative

[Chart: Quality elements - Quantitative measures. Vertical axis: Number of quality elements (0 to 18000). Horizontal axis: Quantitative type. Series: Complete value, Declare value]

Page 14: Analysis of quality metadata in the GEOSS Clearinghouse

Quality indicators - Qualitative

[Chart: Quality elements - Conformance measures. Vertical axis: Number of quality elements (0 to 1400). Horizontal axis: Conformance type. Series: Conformance to specification, Declare conformance]

Page 15: Analysis of quality metadata in the GEOSS Clearinghouse

Coverage result (ISO19115-2 extension)

• Clearinghouse record IDs: 273234, 273232, 273233, 273235, 273236
• Only 5 records use this. Bad news for visualizing data + quality maps
• Title: OMNO2e:OMI Column Amount NO2:ColumnAmountNO2CS30

<gmd:DQ_QuantitativeAttributeAccuracy>
  <gmd:measureDescription>
    <gco:CharacterString>The 'version 003' product is the second public release. It is based on improved radiance calibration. For details, please see document: http://disc.sci.gsfc.nasa.gov/Aura/OMI/OMTO3e_v003.shtml</gco:CharacterString>
  </gmd:measureDescription>
  <gmd:result>
    <gmi:QE_CoverageResult>
      <gmi:spatialRepresentationType>
        <gmd:MD_SpatialRepresentationTypeCode codeList="./resources/codeList.xml#MD_SpatialRepresentationTypeCode" codeListValue="grid">grid</gmd:MD_SpatialRepresentationTypeCode>
      </gmi:spatialRepresentationType>
      <gmi:resultFile gco:nilReason="missing" />
      <gmi:resultFormat>
        <gmd:MD_Format><gmd:name><gco:CharacterString>CF-netCDF</gco:CharacterString></gmd:name></gmd:MD_Format>
      </gmi:resultFormat>
    </gmi:QE_CoverageResult>
  </gmd:result>
</gmd:DQ_QuantitativeAttributeAccuracy>
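The record above declares a coverage result but gives no result file. A minimal sketch of how such cases could be detected automatically, assuming the usual ISO 19139 namespace URIs and using a trimmed copy of the fragment as test data (the function name and returned dictionary are illustrative):

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmi": "http://www.isotc211.org/2005/gmi",
}
NIL_REASON = "{%s}nilReason" % NS["gco"]

# Trimmed copy of the slide's fragment, with namespace declarations added.
SAMPLE = """<gmd:DQ_QuantitativeAttributeAccuracy
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    xmlns:gmi="http://www.isotc211.org/2005/gmi">
  <gmd:result>
    <gmi:QE_CoverageResult>
      <gmi:resultFile gco:nilReason="missing"/>
      <gmi:resultFormat>
        <gmd:MD_Format><gmd:name><gco:CharacterString>CF-netCDF</gco:CharacterString></gmd:name></gmd:MD_Format>
      </gmi:resultFormat>
    </gmi:QE_CoverageResult>
  </gmd:result>
</gmd:DQ_QuantitativeAttributeAccuracy>"""


def coverage_results(xml_text):
    """List every QE_CoverageResult with its format and whether a file is given."""
    root = ET.fromstring(xml_text)
    found = []
    for result in root.iter("{%s}QE_CoverageResult" % NS["gmi"]):
        result_file = result.find("gmi:resultFile", NS)
        nil = result_file.get(NIL_REASON) if result_file is not None else None
        # A usable link needs a resultFile element that has content and no nilReason.
        has_file = result_file is not None and nil is None and len(result_file) > 0
        fmt = result.findtext(
            "gmi:resultFormat/gmd:MD_Format/gmd:name/gco:CharacterString",
            default=None,
            namespaces=NS,
        )
        found.append({"format": fmt, "nil_reason": nil, "has_result_file": has_file})
    return found


print(coverage_results(SAMPLE))
```

For this fragment the check reports a CF-netCDF result whose file is marked as missing, so there is nothing to visualize as a quality map.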

Page 16: Analysis of quality metadata in the GEOSS Clearinghouse

2. Lineage

Page 17: Analysis of quality metadata in the GEOSS Clearinghouse

2. Lineage

Page 18: Analysis of quality metadata in the GEOSS Clearinghouse

2. Lineage

Page 19: Analysis of quality metadata in the GEOSS Clearinghouse

LI_ProcessStep with LI_Source Example

Clearinghouse record ID 131007 (simplified)

• Compile survey input data from the best and most current survey records.
– BLM database of the index to all official (microfilm, CD, other) BLM survey records.
– USFS survey records.
– Private land surveyor records
– GCDB Data Collection Attribute Definitions Version 2.0, Appendix A, 2/14/1991. Survey records used - source abbreviations.
• Compile listings of known locations of PLSS corners.
– USGS topographic quadrangles and other sources.
– USC&GS published coordinate data.
– NGS published coordinate data.
– BLM global positioning data.
– USFS global positioning data.
• Coordinates of control stations are entered into a control data base with associated reliabilities.
• Topologically correct GIS coverages are modified to use FGDC compliant naming conventions and then loaded into the LSI database. These layers can then be downloaded as shapefiles through the LSI website.
• GCDB Data was downloaded for Kiowa and Cheyenne Counties, Colorado.
– C:\f\gis_data\sand\zipped\kiowa\twnshp.shp.xml
• Metadata imported and data was exported from regions format to shapefile format
• Dataset copied.
– C:\f\gis_data\sand\data\basedata\plss\ck_gcdb_region_township
• Source Contribution: Survey data in the form of official (microfilm, CD, other) survey and BLM, abstracted into a vector digital format. (on line)
• Source Contribution: Survey and control data from the Cartographic Feature File (CFF) data set. (disc)
• Source Contribution: Digitized control data from standard topological quadrangle sheets. (disc)

Page 20: Analysis of quality metadata in the GEOSS Clearinghouse

LI_Lineage: LI_Source

• 6.02% metadata records (5851) contain direct list of the data sources.
– 1.85% (1798) with temporal extent
• Gives credit (attribution, and eventually some trust on them)
• If quality indicators are not provided for the dataset, the quality indicators from sources can be a clue.

[UML class diagram: LI_Source_only]

Metadata Information::MD_Metadata
+resourceLineage 0..*

LI_Lineage
+ statement :CharacterString [0..1]
+ scope :DQ_Scope [0..*]
constraints
{"source" role is mandatory if LI_Lineage.statement and "processStep" role are not documented}
{"processStep" role is mandatory if LI_Lineage.statement and "source" role are not documented}
+source 0..*

LI_Source
+ description :CharacterString [0..1]
+ sourceSpatialResolution :MD_Resolution [0..1]
+ sourceReferenceSystem :MD_ReferenceSystem [0..1]
+ sourceCitation :CI_Citation [0..1]
+ sourceMetadata :CI_Citation [0..*]
+ scope :DQ_Scope [0..*]
constraints
{"description" is mandatory if "scope" is not documented}
{"scope" is mandatory if "description" is not documented}
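The constraints in the diagram amount to two simple rules: an LI_Lineage must document at least one of statement, source or processStep, and an LI_Source must document at least one of description or scope. A minimal sketch of those checks, using simplified stand-in classes (the Python class and field names are illustrative, and attribute types are reduced to strings):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LISource:
    description: Optional[str] = None
    scope: List[str] = field(default_factory=list)

    def is_valid(self):
        # "description" is mandatory if "scope" is not documented, and vice versa.
        return bool(self.description) or bool(self.scope)


@dataclass
class LILineage:
    statement: Optional[str] = None
    source: List[LISource] = field(default_factory=list)
    process_step: List[str] = field(default_factory=list)  # step descriptions only

    def is_valid(self):
        # Taken together, the two role constraints require at least one of
        # statement, source or processStep to be documented.
        documented = bool(self.statement) or bool(self.source) or bool(self.process_step)
        return documented and all(s.is_valid() for s in self.source)


source_only = LILineage(source=[LISource(description="USGS topographic quadrangles")])
empty = LILineage()
bad_source = LILineage(source=[LISource()])

print(source_only.is_valid(), empty.is_valid(), bad_source.is_valid())
```

A lineage that lists only sources, as in the 5851 records above, passes these rules as long as each source carries a description or a scope.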

Page 21: Analysis of quality metadata in the GEOSS Clearinghouse

LI_Lineage: LI_ProcessStep

• 8.26% metadata records (8035) contain the direct list of the processes without sources
– 292 (0.30%) contain date
• With the order of these processes.
• If quality indicators are not provided for the dataset, it’s difficult to infer resource quality with only a process list

[UML class diagram: From_LI_ProcessStep_to_LI_Source]

Metadata Information::MD_Metadata
+resourceLineage 0..*

LI_Lineage
+ statement :CharacterString [0..1]
+ scope :DQ_Scope [0..*]
constraints
{"source" role is mandatory if LI_Lineage.statement and "processStep" role are not documented}
{"processStep" role is mandatory if LI_Lineage.statement and "source" role are not documented}
+processStep 0..*

LI_ProcessStep
+ description :CharacterString
+ rationale :CharacterString [0..1]
+ stepDateTime :TM_Primitive [0..*]
+ processor :CI_ResponsiblePartyInfo [0..*]
+ reference :CI_Citation [0..*]
+ scope :DQ_Scope [0..*]

Page 22: Analysis of quality metadata in the GEOSS Clearinghouse

Complete Provenance: LI_ProcessStep with LI_Source

• 1.26% metadata records (1226) with a more complete provenance process.
• How and when the data sources were used
• If quality indicators are not provided for the dataset, we can deduce which sources have more influence in the quality of the final result

[UML class diagram: From_LI_ProcessStep_to_LI_Source]

Metadata Information::MD_Metadata
+resourceLineage 0..*

LI_Lineage
+ statement :CharacterString [0..1]
+ scope :DQ_Scope [0..*]
constraints
{"source" role is mandatory if LI_Lineage.statement and "processStep" role are not documented}
{"processStep" role is mandatory if LI_Lineage.statement and "source" role are not documented}
+processStep 0..*

LI_ProcessStep
+ description :CharacterString
+ rationale :CharacterString [0..1]
+ stepDateTime :TM_Primitive [0..*]
+ processor :CI_ResponsiblePartyInfo [0..*]
+ reference :CI_Citation [0..*]
+ scope :DQ_Scope [0..*]
+source 0..*

LI_Source
+ description :CharacterString [0..1]
+ sourceSpatialResolution :MD_Resolution [0..1]
+ sourceReferenceSystem :MD_ReferenceSystem [0..1]
+ sourceCitation :CI_Citation [0..1]
+ sourceMetadata :CI_Citation [0..*]
+ scope :DQ_Scope [0..*]
constraints
{"description" is mandatory if "scope" is not documented}
{"scope" is mandatory if "description" is not documented}
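The three lineage situations described on these slides (sources only, process steps only, process steps that reference their sources) can be told apart with a few path queries. The sketch below is one possible classification, assuming ISO 19139 encoding; the category labels and the sample fragments are illustrative, and the exact counting rules used for the slides are not known.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
NS = {"gmd": GMD}


def classify_lineage(xml_text):
    """Classify the first LI_Lineage found in a record or fragment."""
    root = ET.fromstring(xml_text)
    lineage = next(root.iter("{%s}LI_Lineage" % GMD), None)
    if lineage is None:
        return "no lineage"
    # '*' matches LI_ProcessStep as well as the 19115-2 LE_ProcessStep.
    if lineage.findall("gmd:processStep/*/gmd:source", NS):
        return "process steps with sources"
    if lineage.findall("gmd:processStep", NS):
        return "process steps only"
    if lineage.findall("gmd:source", NS):
        return "sources only"
    return "statement only"


def wrap(body):
    """Wrap a fragment in an LI_Lineage element with the gmd namespace declared."""
    return '<gmd:LI_Lineage xmlns:gmd="%s">%s</gmd:LI_Lineage>' % (GMD, body)


# Hypothetical minimal fragments, one per category.
SOURCES_ONLY = wrap("<gmd:source><gmd:LI_Source/></gmd:source>")
STEPS_ONLY = wrap("<gmd:processStep><gmd:LI_ProcessStep/></gmd:processStep>")
COMPLETE = wrap(
    "<gmd:processStep><gmd:LI_ProcessStep>"
    "<gmd:source><gmd:LI_Source/></gmd:source>"
    "</gmd:LI_ProcessStep></gmd:processStep>"
)

for fragment in (SOURCES_ONLY, STEPS_ONLY, COMPLETE):
    print(classify_lineage(fragment))
```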

Page 23: Analysis of quality metadata in the GEOSS Clearinghouse

Complete provenance in ISO19115-2

• LE_ProcessStep (the ISO 19115-2 extension of LI_ProcessStep) includes an LE_Processing with a runTimeParameters attribute that lets us describe the exact list of parameters used in the execution.
• There is a citation of the algorithm used (LE_Algorithm).
• All these extensions were made for the benefit of EO gridded data, but they are not used in the Clearinghouse.
• We can completely evaluate the quality of the resulting product if we know the uncertainties that sources have in their metadata (sourceMetadata citation in LI_Source).

[UML class diagram: From_LE_ProcessStep_to_LE_Source. From ISO 19115-2:2009, shown for informative purposes only]

Metadata Information::MD_Metadata
+resourceLineage 0..*

LI_Lineage
+ statement :CharacterString [0..1]
+ scope :DQ_Scope [0..*]
constraints
{"source" role is mandatory if LI_Lineage.statement and "processStep" role are not documented}
{"processStep" role is mandatory if LI_Lineage.statement and "source" role are not documented}
+processStep 0..*

LI_ProcessStep
+ description :CharacterString
+ rationale :CharacterString [0..1]
+ stepDateTime :TM_Primitive [0..*]
+ processor :CI_ResponsiblePartyInfo [0..*]
+ reference :CI_Citation [0..*]
+ scope :DQ_Scope [0..*]

LI_Source
+ description :CharacterString [0..1]
+ sourceSpatialResolution :MD_Resolution [0..1]
+ sourceReferenceSystem :MD_ReferenceSystem [0..1]
+ sourceCitation :CI_Citation [0..1]
+ sourceMetadata :CI_Citation [0..*]
+ scope :DQ_Scope [0..*]
constraints
{"description" is mandatory if "scope" is not documented}
{"scope" is mandatory if "description" is not documented}

Data quality information - Imagery::LE_ProcessStep
+report 0..*
+output 0..*
+processingInformation 0..1

Data quality information - Imagery::LE_ProcessStepReport
+ name :CharacterString
+ description :CharacterString [0..1]
+ fileType :CharacterString [0..1]

Data quality information - Imagery::LE_Source
+ processedLevel :MD_Identifier [0..1]
+ resolution :LE_NominalResolution [0..1]

Data quality information - Imagery::LE_Processing
+ identifier :MD_Identifier
+ softwareReference :CI_Citation [0..1]
+ procedureDescription :CharacterString [0..1]
+ documentation :CI_Citation [0..*]
+ runTimeParameters :CharacterString [0..1]
+algorithm 0..*

Data quality information - Imagery::LE_Algorithm
+ citation :CI_Citation
+ description :CharacterString

«Union» Data quality information - Imagery::LE_NominalResolution
+ scanningResolution :Distance
+ groundResolution :Distance

Notes in the diagram:
If "LE_NominalResolution.scanningResolution" is used then "LE_Source.scaleDenominator" is required
"description" is mandatory if "sourceExtent" is not documented
"sourceExtent" is mandatory if "description" is not documented

Page 24: Analysis of quality metadata in the GEOSS Clearinghouse

3. Usage - User feedback

• There is one small entry for user feedback in the current ISO 19115:
• MD_Usage
– Brief description of ways in which the resource is currently or has been used

Page 25: Analysis of quality metadata in the GEOSS Clearinghouse

MD_Usage - User feedback

• There are 1.2% (1133) entries
– SpecificUsage and UserContactInfo, only
• All made by the same institution!
– Landesvermessung und Geobasisinformation Brandenburg (LGB)
– Tel +49-331-8844-123, Fax +49-331-8844-16123
– Heinrich-Mann-Allee 103, Potsdam, Brandenburg 14473, Deutschland
– [email protected]
– http://www.geobasis-bb.de
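A finding such as "all made by the same institution" can be checked by counting the organisation named in each MD_Usage contact. The sketch below assumes ISO 19139 encoding; the two sample records are hypothetical stand-ins that reuse only the organisation name shown on the slide.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
ORG_PATH = (
    "gmd:userContactInfo/gmd:CI_ResponsibleParty/"
    "gmd:organisationName/gco:CharacterString"
)


def usage_organisations(xml_texts):
    """Count how often each organisation appears as an MD_Usage contact."""
    counts = Counter()
    for text in xml_texts:
        root = ET.fromstring(text)
        for usage in root.iter("{%s}MD_Usage" % NS["gmd"]):
            for name in usage.findall(ORG_PATH, NS):
                counts[(name.text or "").strip()] += 1
    return counts


def usage_record(org):
    """Build a hypothetical minimal MD_Usage fragment for one organisation."""
    return (
        '<gmd:MD_Usage xmlns:gmd="http://www.isotc211.org/2005/gmd" '
        'xmlns:gco="http://www.isotc211.org/2005/gco">'
        "<gmd:userContactInfo><gmd:CI_ResponsibleParty><gmd:organisationName>"
        "<gco:CharacterString>%s</gco:CharacterString>"
        "</gmd:organisationName></gmd:CI_ResponsibleParty></gmd:userContactInfo>"
        "</gmd:MD_Usage>" % org
    )


LGB = "Landesvermessung und Geobasisinformation Brandenburg (LGB)"
counts = usage_organisations([usage_record(LGB), usage_record(LGB)])
print(len(counts), "distinct organisation(s):", dict(counts))
```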

Page 26: Analysis of quality metadata in the GEOSS Clearinghouse

Conclusions

• There are many different kinds of quality indicators
– There is a lack of a complete description of values provided (no units, missing measure name, missing evaluation method)
• Quality coverage results (by pixel) are almost nonexistent and, where they are declared, the link to the result file is missing
• Lineage information is rich in many records, some with more than 100 entries in source or ProcessSteps
• We have usage examples -> Feedback
• Current data is enough to demonstrate search and visualization with some limitations. Good for GeoViQua.
• Next steps:
– Assess the Quality of Quality Metadata?
– Extend this analysis to other capacity catalogues integrated in the EuroGEOSS Broker

Page 27: Analysis of quality metadata in the GEOSS Clearinghouse

Thank you! Danke! Grazie! Merci! Ευχαριστίες! Diolch! Bedankt! Köszönöm! Ačiū! Благодарам! Спасибі! Vďaka! Tak! Díky! Tänan! Kiitos! Dzięki! Mulţumiri! Хвала! Tack! Teşekkürler! Спасибо! Obrigado! Takk! Gràcies! Gracias!

QUAlity aware VIsualisation for the Global Earth Observation system of systems