EuroRegionalMap: Best practices in quality assessment for a pan-European dataset. Nathalie Delattre, QKEN meeting, Brussels, 5-7 May 2010


Page 1: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Nathalie Delattre

QKEN meeting, Brussels, 5-7 May 2010

Page 2: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Items

1. ERM: presentation

2. Best Practices in quality control

3. Quality issues

4. Expectation-Debate

Page 3: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

1. ERM: presentation

Page 4: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Project status: consolidation phase 2007-2010, Eurostat contract

1. to provide a yearly update of the ERM data with European coverage, in accordance with EC contract No. 2006/S 174-185902

2. to improve the level of harmonisation of ERM in data content and selection criteria

3. to upgrade ERM according to Eurostat specifications oriented towards spatial analysis

Page 5: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Evolution of ERM towards EC requirements

Theme Mapping Background location

Help for location

Spatial analysis

Routing

BND X X X X

HYDRO X X X X

TRANS X X X X X

NAMET X

POP X X X

MISC X X X X

VEG X X

Page 6: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

ERM level of progress

• Release 2.2 (Jan 2008)
  • 31 countries: EU26, 4 EFTA, Moldova
  • Croatia: administrative boundaries

• Release 3.0 (Jan 2009)
  + Croatia: railway network
  + Isle of Man
  + Faeroe Islands
  • No update or improvement from Italy (VMap data sources)
  • No data from Bulgaria

• Release 3.1 (Jan 2010)
  Adm, transports, settlements, names

• Release 3.2 (Dec 2010)
  • Hydro
  + Bulgaria

Page 7: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Production work flow

Deliverables:
• National component of the ERM data: draft version (GDB or shapefiles)
• Validation report

• National component of the ERM data: draft version
• Validation report

• National components of the ERM data: final version
• Metadata + lineage files
• Final reception (sending approval)

Countries

Countries

RC

Data Production Own Quality Control

Quality Control

Corrections and Edge Matching

RC Quality Control (also on edge-matching)

Countries: Last corrections

Production phase

Validation phase

Page 8: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Integration phase

Task: Finishing the edge-matching in cross-border areas by integrating the duplicated features located on international boundaries into one single feature

Deliverables:
• ERM data set in File GDB
• Metadata for ERM

Task:
• Adding land mask feature
• Merging the ferry lines into a seamless and consistent network usable for spatial analysis
• Setting up UIC code for railways

Deliverables:
• ERM data set in File GDB, fit for EC
• Metadata for Eurostat
• Quality assessment report

PM

PM

Data integration into a seamless coverage

Specific processes for specific features requested by the EC

Integration phase

Page 9: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

2. Quality control: best practices for a pan-European dataset

Page 10: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality control

1. Validation process: checking conformity with the ERM specifications

2. Quality assessment process: reporting on data content and on data harmonisation in selection criteria

Page 11: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Validation specifications

• Compliance with the ERM Specifications

• Data model

• Topology

• Allowed attribute values

• Selection criteria

• Geometrical resolution

• Coherence and consistency of feature and attributes

• Homogeneity of attribute values in a feature network

• Consistency between themes

• Cross-border continuity between neighbouring countries

Minimum Requirements

Page 12: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

ERM Data production

Validation by producer Validation by RC

Report about validation results

If errors exist

• To ensure best data quality

Validation process

Page 13: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Validation deliverables

Documentation:

My ERM documentation

D41_ERMSpecificationDC_v43.pdf

D51_DataValidationSpecifications_V40.pdf

D52_DataValidationSpecifications_MinReq_v12.pdf

ICC_ERM_ValidationReport_template.xls

ERM_v31_Validation_Tools_v10.xls

Page 14: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality indicators in Metadata

• Metadata for discovery (standard ISO 19115):
  • ERM_Metadata_partners_template.xls

• Lineage files (data quality)
  • ERM Lineage Template.doc
  • ERM_Lineage.xls

Page 15: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality indicators

1. Existence (ID1) = presence/absence of feature or attribute

Def: the feature or attribute information exists in the real world context and has been captured (presence) or not captured (absence) in the ERM data set.

Values:
• Presence: indicator ID1 = 1
• Absence: indicator ID1 = 0
• N_A: indicator ID1 = -1 (the feature/attribute doesn’t exist in the real world context)
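
The three ID1 values can be sketched in Python as follows. This is a minimal illustration, not part of the ERM toolbox: the names `Existence` and `existence_indicator` are hypothetical, and only the values 1, 0 and -1 come from the slide.

```python
from enum import IntEnum


class Existence(IntEnum):
    """ID1 values as listed on the slide (class name is hypothetical)."""
    PRESENT = 1          # exists in the real world and has been captured
    ABSENT = 0           # exists in the real world but has not been captured
    NOT_APPLICABLE = -1  # does not exist in the real world context (N_A)


def existence_indicator(exists_in_real_world: bool, captured: bool) -> Existence:
    """Derive ID1 for one feature or attribute (hypothetical helper)."""
    if not exists_in_real_world:
        return Existence.NOT_APPLICABLE
    return Existence.PRESENT if captured else Existence.ABSENT
```

For a landlocked country, for instance, a coastline feature would be reported with `existence_indicator(False, False)`, which yields -1 rather than 0.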

Page 16: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Existence for Austria

Theme name | Feature class name | Feature Code and Attribute Name | Feature Name and Attribute description | Obligation | Existence (ID1), ID1 = [0, 1, -1] | Comments
BND | PolbndA | FA001 | Administrative Area | M | 1 |
 |  | EBM0 | SABE Hierarchical Number | M | 1 |
 |  | EBM1 | SABE Hierarchical Number | M | 1 |
 |  | EBM2 | SABE Hierarchical Number | M | 1 |
 |  | EBM3 | SABE Hierarchical Number | M | 1 |
 |  | EBM4 | SABE Hierarchical Number | M | -1 |
 |  | EBM5 | SABE Hierarchical Number | M | -1 |
 |  | TAA | Type of administrative area | M | 1 |
HYDRO | CoastA | BA010 | Foreshore | M | -1 |
HYDRO | CoastL | BA010 | Coastline Shoreline | M | -1 |

Hierarchical levels 4 and 5 don’t exist in Austria: ID1 = -1

Foreshore and coastline don’t exist in Austria: ID1 = -1

Page 17: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Existence for Spain

Foreshore not entering into the selection criteria: ID1 = -1

Shoreline exists but has not been captured: ID1 = 0

Theme name | Feature class name | Feature Code and Attribute Name | Feature Name and Attribute description | Obligation | Existence (ID1) | Comments
HYDRO | CoastA | BA010 | Foreshore | M | -1 | not entering into selection criteria
 |  | MCC | Material Composition Category | M | -1 |
 |  | NAMA1 | Name in first national language (ASCII) | O | -1 |
 |  | NAMA2 | Name in second national language (ASCII) | O | -1 |
 |  | NAMN1 | Name in first national language | O | -1 |
 |  | NAMN2 | Name in second national language | O | -1 |
 |  | NLN1 | 3-Char Language Code | O | -1 |
 |  | NLN2 | 3-Char Language Code | O | -1 |
HYDRO | CoastL | BA010 | Coastline Shoreline | M | 1 |
HYDRO | CoastL | BB081 | Shoreline Construction | O | 0 |
 |  | HOC | Hydrographical Origin Category | O | 0 |

Page 18: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality indicators (2)

2. Completeness (ID2) group of indicators
   1. Selection compliancy (ID2.1) for features
   2. Data Completeness (ID2.2) for attributes

Selection compliancy: features are captured for the entire territory and in accordance with the portrayal and selection criteria of the specifications

Values

ID2.1 = 1 (fully compliant)

ID2.1 = 0 (not fully compliant)

Page 19: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality indicators (3)

2. Completeness (ID2) group of indicators
   1. Selection compliancy (ID2.1) for features
   2. Data Completeness (ID2.2) for attributes

Data Completeness: % of the populated attributes holding real values (null values like UNK or N_P are not counted)

Value: %

Ex: value for RTN
• Number of features with RTN <> [UNK] = 34000
• Total number of features = 45000
• ID2.2 = ROUNDUP((34000/45000) * 100) = 76%
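
The RTN example can be reproduced with a short Python sketch. It assumes the attribute values are available as a plain list of strings and that only the text null values named on the slide (UNK, N_P) are excluded; the function name `data_completeness` is hypothetical and not taken from the ERM scripts.

```python
# Text null values that do not count as populated (from the slide).
NULL_TEXT_VALUES = {"UNK", "N_P"}


def data_completeness(values):
    """Return ID2.2: the percentage of populated values, rounded up."""
    total = len(values)
    if total == 0:
        raise ValueError("no features to evaluate")
    populated = sum(1 for value in values if value not in NULL_TEXT_VALUES)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-populated * 100 // total)


# 34000 populated RTN values out of 45000 features, as in the example.
rtn_values = ["A1"] * 34000 + ["UNK"] * 11000
print(data_completeness(rtn_values))
```

The rounding is applied to the percentage, not to the ratio, which is why the result is 76 and not 100.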

Page 20: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Example: Completeness for road and island

Theme name | Feature class name | Feature Code and Attribute Name | Feature Name and Attribute description | Obligation | Existence (ID1), ID1 = [0, 1, -1] | Completeness (ID2), ID2.1 = [0, 1], ID2.2 = [0-100] % | Comments (on ID1, on ID2, improvement in data quality from the previous release, improvement in data quality expected for next release)
HYDRO | IslandA | BA030 | Island | M | 1 | 0 | areas less than 0.6 km2 have been captured
 |  | NAMN1 | Name in first national language | M | 1 | 80 |
 |  | NAMN2 | Name in second national language | M | -1 |  |
 |  | NAMA1 | Name in first national language (ASCII) | M | 1 | 80 |
 |  | NAMA2 | Name in second national language (ASCII) | M | -1 |  |
 |  | NLN1 | 3-Char Language Code | M | 1 | 100 |
 |  | NLN2 | 3-Char Language Code | M | -1 |  |
TRANS | RoadL | AP030 | Road | M | 1 | 1 |
 |  | EXS | Existence Category | M | 1 | 100 |
 |  | LLE | Location Level | M | 1 | 100 |
 |  | LTN | Lane/Track Number | M | 1 | 75 | local roads have LTN unknown
 |  | MED | Median Category | M | 1 | 100 |
 |  | NAMN1 | Name in first national language | O | 0 |  |
 |  | NAMN2 | Name in second national language | O | 0 |  | not evaluated
 |  | NAMA1 | Name in first national language (ASCII) | O | 0 |  |
 |  | NAMA2 | Name in second national language (ASCII) | O | 0 |  | not evaluated
 |  | NLN1 | 3-Char Language Code | O | 0 |  |
 |  | NLN2 | 3-Char Language Code | O | 0 |  | not evaluated
 |  | RST | Road Surface Type | M | 1 | 100 |
 |  | RSU | Seasonal Availability | O | 1 | 100 |
 |  | RTE | Route Number (Int.) | M | 1 | 100 |
 |  | RTN | Route Number (Nat.) | M | 1 | 75 | local roads have RTN unknown
 |  | RTT | Route Intended Use | M | 1 | 100 |
 |  | TOL | Toll Category | O | -1 |  |
 |  | TUC | Transportation Use Category | M | 1 | 100 | TUC has been newly populated
TRANS | RunwayL | GB055 | Runway | M |  |  |

Page 21: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Metadata on not provided information

Attribute type | Null/No Value | Unknown | Unpopulated | Not Applicable
Meaning in the real world context | Information cannot be applied | Information is missing | Information exists but has not been collected | Information doesn’t exist
Text | N/A | UNK | N_P | N_A
Integer (coded) | -32768 | 0 | 997 | 998
Integer (actual value) | -32768 | -29999 | -29997 | -29998
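
As an illustration, the table can be encoded as a lookup so that a script can tell a real value from a "not provided" marker. The sentinel values are those listed on the slide; the dictionary layout, the type keys and the function name `no_data_meaning` are assumptions made for this sketch.

```python
# Sentinel values per attribute type, taken from the table above.
NO_DATA = {
    "text": {
        "N/A": "null/no value",
        "UNK": "unknown",
        "N_P": "unpopulated",
        "N_A": "not applicable",
    },
    "integer_coded": {
        -32768: "null/no value",
        0: "unknown",
        997: "unpopulated",
        998: "not applicable",
    },
    "integer_actual": {
        -32768: "null/no value",
        -29999: "unknown",
        -29997: "unpopulated",
        -29998: "not applicable",
    },
}


def no_data_meaning(value, attribute_type):
    """Return the kind of missing information, or None for a real value."""
    return NO_DATA[attribute_type].get(value)
```

Keeping the three attribute types separate matters because 0 means "unknown" for a coded integer but is an ordinary value for an actual-value integer.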

Page 22: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality tools

PLTS Data Reviewer (Knowledgebase)
• Automated validation of attribute domains as well as combinations of attributes
• Validation of minimum dimensions

GDB Topology
• Validation of topology
• Not all relationships can be defined

ERM Scripts (Python)
• Validation of generalisation degree
• Attribute completeness

Visual control
• Necessary as not all checks can be automated (e.g. feature density)

Page 23: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Python Scripts in ERM Toolbox

• Edgematching
  • Check Edgematching for lines
  • Check Edgematching for points

• ERM QC
  • Check Multipart
  • Feature Statistics
  • Item Statistics
  • Populate Symbol Number
  • Summary Statistics
  • Test ASCII fields

• Export
  • Export to Shape
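
The slide does not show how the toolbox scripts are implemented. As a rough idea of what a check such as "Test ASCII fields" might involve, the sketch below flags values in the ASCII name fields (NAMA1, NAMA2) that contain non-ASCII characters. The record layout (a list of dictionaries) and the function name are assumptions, not the actual ERM script.

```python
def non_ascii_values(records, fields=("NAMA1", "NAMA2")):
    """List (record index, field, value) for values that are not pure ASCII."""
    problems = []
    for index, record in enumerate(records):
        for field in fields:
            value = record.get(field)
            # str.isascii() is available from Python 3.7 onwards.
            if isinstance(value, str) and not value.isascii():
                problems.append((index, field, value))
    return problems


rows = [
    {"NAMA1": "Koln", "NAMA2": "N_A"},
    {"NAMA1": "K\u00f6ln", "NAMA2": "N_A"},  # umlaut left in an ASCII field
]
print(non_ascii_values(rows))
```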

Page 24: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Statistics tools

• Feature Statistics
  • the number of features per feature class
  • use:
    • QA: presence of feature classes and country codes
    • supports filling in the metadata (lineage.doc)
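
A count of features per feature class and country code can be sketched with the standard library alone. The input format (dictionaries with `feature_class` and `country` keys) and the function name are hypothetical; the real tool presumably reads the geodatabase directly.

```python
from collections import Counter


def feature_statistics(features):
    """Count features per (feature class, country code) pair."""
    return Counter((f["feature_class"], f["country"]) for f in features)


sample = [
    {"feature_class": "RoadL", "country": "AT"},
    {"feature_class": "RoadL", "country": "AT"},
    {"feature_class": "WatrcrsL", "country": "AT"},
    {"feature_class": "RoadL", "country": "ES"},
]
for (feature_class, country), count in sorted(feature_statistics(sample).items()):
    print(feature_class, country, count)
```

A feature class or country code that is missing from the result points to an absent feature class, which is the presence check mentioned above.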

Page 25: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Statistics tools

• AllStatistics
  • ID1 = the existence of the feature and attribute {0,1}
  • ID2 = the completeness of the feature and attribute {0,..,100}
  • use: supports filling in the metadata (lineage.xls)

Page 26: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Statistics tools

• GeomStat
  • the number of features per unit area (10 km2, 100 km2, etc.)
  • use: QA, density of features -> basis for harmonisation of selection criteria between countries

[Figure: map of WatrcrsL (Natural) feature counts per 10 km x 10 km cell; countries labelled CZ, SK, RO, MD, HU]
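
A density count of this kind can be approximated by binning one representative point per feature into a regular grid. The sketch assumes projected coordinates in metres and a square 10 km x 10 km cell; the function name and input format are hypothetical and not taken from GeomStat.

```python
from collections import Counter


def features_per_cell(points, cell_size_m=10_000):
    """Count features per square grid cell.

    points: iterable of (x, y) representative coordinates in metres,
            one per feature (for example the midpoint of a watercourse).
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points:
        # Floor division assigns each point to exactly one cell.
        counts[(int(x // cell_size_m), int(y // cell_size_m))] += 1
    return counts


sample_points = [(500, 500), (9_999, 0), (10_000, 0), (25_000, 31_000)]
print(features_per_cell(sample_points))
```

Comparing the counts of cells on either side of a border gives a simple signal of diverging selection criteria between neighbouring countries.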

Page 27: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Statistics tools

• GeomStat

Page 28: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Geometry tools

• MinVertexDistance
  • check the minimum allowed distance between vertices (50 m)
  • use: QA, data quality requirements

[Figure: WatrcrsL example with two vertices 46 m apart. Correction needed!]
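
The vertex-distance rule can be sketched as follows, assuming a line is given as a list of (x, y) vertices in projected metres. Only the 50 m threshold comes from the slide; the function name and the choice to check consecutive vertices only are assumptions.

```python
import math


def short_segments(vertices, min_distance=50.0):
    """Return (segment index, length) for consecutive vertices that are too close."""
    found = []
    for i in range(len(vertices) - 1):
        (x1, y1), (x2, y2) = vertices[i], vertices[i + 1]
        length = math.hypot(x2 - x1, y2 - y1)
        if length < min_distance:
            found.append((i, length))
    return found


# The second segment is 46 m long, like the case flagged on the slide.
line = [(0, 0), (100, 0), (146, 0), (300, 0)]
print(short_segments(line))
```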

Page 29: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

3. Quality issues

Page 30: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality requirements

1. Compliancy with a standard (ERM specifications)

2. Topological errors (requirement: a usable topological network)

3. Completeness in attributes. Ex: name completion

4. Data harmonisation between countries
   • in selection criteria
   • in classification
   • in geometrical accuracy (vertex density)

Page 31: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality issues : Transport

• Heterogeneity in national classification of the roads (primary, secondary, etc.)

Page 32: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality issues: Hydro

• Heterogeneity in selection criteria

Page 33: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality issues: Hydro

• Name completion (unnamed rivers are selected in blue)

Page 34: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Quality issues: Hydro

• River hierarchical level: must be consistent at European level (in blue: rivers with a national hierarchical level)

Page 35: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Expectations

Page 36: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Expectations

1. Need for a quality control manager
   1. Assess quality of the data
   2. Suggest new methodology and improvements in quality control tools
   3. Provide a quality assessment report for each release

2. ESDIN framework (the near future for ERM):
   1. What kind of quality data model for the pan-European products?
   2. What kind of validation tools and quality control?

3. Commitment of the Quality KEN? Support welcome, which kind?

Page 37: EuroRegionalMap: Best practices in quality assessment for a pan-European dataset

Debate: quality data model? For which kind of data?

• Quality control applicable to base level datasets
  • Related to real world phenomena

• Quality control applicable to generalised and derived datasets (at medium scale level)?
  • Added factor of selection criteria

• Quality control applicable to pan-European datasets?
  • Added factor of harmonisation between countries.