Conversions from national grid data to harmonized European grid data
Production and challenges
EFGS Lisbon, 12-14 October 2011
Rina Tammisto, Senior Statistician, Statistics Finland
Marja Tammilehto-Luode, Chief Adviser, Statistics Finland
Harmonization
Data harmonization
- Source data: georeferenced national data; disaggregated European data
- Methods used: aggregated, disaggregated and hybrid method
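As a rough illustration of the disaggregated approach, the sketch below allocates a regional total to grid cells in proportion to weights. This is one common form of top-down disaggregation, not necessarily the method behind the JRC/GISCO data. The function name, the cell identifiers and the weights are hypothetical.

```python
def disaggregate(total, weights):
    """Allocate a regional total to grid cells in proportion to weights.

    weights: dict mapping a (hypothetical) cell id -> non-negative weight,
    e.g. an assumed built-up area per cell. Returns cell id -> allocated value.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must contain at least one positive value")
    # Each cell receives the share of the total matching its share of the weights.
    return {cell: total * w / weight_sum for cell, w in weights.items()}


print(disaggregate(1000, {"cell_a": 1, "cell_b": 3}))
# {'cell_a': 250.0, 'cell_b': 750.0}
```

Because the allocation is proportional, the cell values need not be whole inhabitants, while the regional total is preserved.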
Spatial harmonization: a grid net covers the whole of Europe
ETRS89-LAEA grid net
Downloadable ZIP: http://www.efgs.info/data/GEOSTAT-1km-Grid.zip/view
Grid_ETRS89_LAEA_1K.shp, about 500 MB
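A point given in ETRS89-LAEA coordinates (metres) can be assigned to its 1 km cell by flooring both coordinates to the next lower multiple of 1000 m. This sketch assumes that the cell boundaries of the grid net lie on whole-kilometre coordinates. The label format produced by `cell_code` is a hypothetical choice for illustration, not taken from the downloadable file.

```python
import math

CELL_SIZE = 1000  # metres: the 1 km ETRS89-LAEA grid net


def laea_cell(x: float, y: float, size: int = CELL_SIZE) -> tuple[int, int]:
    """Lower-left corner (easting, northing) of the grid cell containing (x, y).

    Assumes x, y are ETRS89-LAEA coordinates in metres and that cell
    boundaries lie on multiples of `size`.
    """
    return math.floor(x / size) * size, math.floor(y / size) * size


def cell_code(x: float, y: float) -> str:
    """Hypothetical 1 km cell label built from the lower-left corner in km."""
    east, north = laea_cell(x, y)
    return f"1kmN{north // 1000}E{east // 1000}"


print(laea_cell(5012345.6, 4321999.9))  # (5012000, 4321000)
print(cell_code(5012345.6, 4321999.9))  # 1kmN4321E5012
```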
Figure (source: statistik.at): comparison of both systems in LCC; panels: LCC in LCC, LAEA in LCC
LAEA grid net in relation to national grid net in Finland
LAEA grid net in relation to national grid net in Austria
Differences in locations of grid cells in different projections (or co-ordinate systems)
A grid cell produced by using the national ETRS89-TM35FIN co-ordinate system and projection is divided among several ETRS89-LAEA grid cells
Direct derivation between different co-ordinate systems or projections is not usable, because the grids are located differently in relation to each other.
An issue to be solved: how can national grid datasets be used when direct conversion is not applicable?
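To see why a direct conversion is not usable, the sketch below computes how the area of one national cell is shared among LAEA cells once its corners are expressed in ETRS89-LAEA. It assumes the corners have already been converted (the conversion is not shown) and treats the converted cell as a straight-edged polygon, which is an approximation. The coordinates are hypothetical. The clipping step is the standard Sutherland-Hodgman algorithm, chosen here for illustration.

```python
import math


def _clip_half(pts, inside, cut):
    """One Sutherland-Hodgman pass: keep the part of the polygon inside one box edge."""
    out = []
    for i, cur in enumerate(pts):
        prev = pts[i - 1]  # i = 0 wraps around to the last vertex
        if inside(cur):
            if not inside(prev):
                out.append(cut(prev, cur))  # entering: add the crossing point
            out.append(cur)
        elif inside(prev):
            out.append(cut(prev, cur))  # leaving: add the crossing point only
    return out


def clip_to_box(poly, xmin, ymin, xmax, ymax):
    """Clip a polygon to an axis-aligned box, i.e. one LAEA grid cell."""
    def x_cut(c):  # crossing of segment p-q with the vertical line x = c
        return lambda p, q: (c, p[1] + (c - p[0]) * (q[1] - p[1]) / (q[0] - p[0]))

    def y_cut(c):  # crossing of segment p-q with the horizontal line y = c
        return lambda p, q: (p[0] + (c - p[1]) * (q[0] - p[0]) / (q[1] - p[1]), c)

    pts = list(poly)
    for inside, cut in ((lambda p: p[0] >= xmin, x_cut(xmin)),
                        (lambda p: p[0] <= xmax, x_cut(xmax)),
                        (lambda p: p[1] >= ymin, y_cut(ymin)),
                        (lambda p: p[1] <= ymax, y_cut(ymax))):
        if not pts:
            break
        pts = _clip_half(pts, inside, cut)
    return pts


def area(pts):
    """Shoelace formula for the area of a simple polygon."""
    return abs(sum(x0 * y1 - x1 * y0
                   for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))) / 2


def laea_shares(poly, size=1000):
    """Share of the polygon's area in each LAEA cell, keyed by the cell's lower-left corner."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    total = area(poly)
    shares = {}
    for cx in range(math.floor(min(xs) / size) * size, math.ceil(max(xs) / size) * size, size):
        for cy in range(math.floor(min(ys) / size) * size, math.ceil(max(ys) / size) * size, size):
            part = area(clip_to_box(poly, cx, cy, cx + size, cy + size))
            if part > 0:
                shares[(cx, cy)] = part / total
    return shares


# Hypothetical national 1 km cell whose converted corners lie 300 m east and
# 400 m north of the LAEA grid lines.
national_cell = [(5000300, 4000400), (5001300, 4000400), (5001300, 4001400), (5000300, 4001400)]
print(laea_shares(national_cell))
```

In this example the single national cell is split over four LAEA cells, with area shares of 0.42, 0.28, 0.18 and 0.12.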
Tested method 1: aggregation of grid data using converted building points
1) Georeferenced source data is converted: buildings are converted from ETRS89-TM35FIN to ETRS89-LAEA
2) Converted building points are joined with the ETRS89-LAEA grid net
3) Aggregation of statistical data
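Steps 2 and 3 of method 1 can be sketched as a point-in-cell join followed by a sum per cell. The sketch assumes step 1 has already been done, for instance with a PROJ-based tool converting from EPSG:3067 (ETRS89 / TM35FIN) to EPSG:3035 (ETRS89-LAEA), so the input points are in LAEA metres. The function name and the building data are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_points(points, size=1000):
    """Method 1, steps 2-3: join points to the LAEA grid net and sum a variable.

    points: iterable of (x, y, value), where x, y are building coordinates
    already converted to ETRS89-LAEA metres (step 1 is done beforehand) and
    value is e.g. the number of inhabitants in the building.
    """
    totals = defaultdict(int)
    for x, y, value in points:
        # Spatial join: the containing cell is identified by its lower-left corner.
        cell = (math.floor(x / size) * size, math.floor(y / size) * size)
        totals[cell] += value  # aggregation of the statistical data
    return dict(totals)


# Hypothetical building points (LAEA metres, inhabitants).
buildings = [(5000250.0, 4000750.0, 12), (5000900.0, 4000100.0, 3), (5001200.0, 4000500.0, 7)]
print(aggregate_points(buildings))
# {(5000000, 4000000): 15, (5001000, 4000000): 7}
```

Because each building keeps its own converted location, no value is moved by more than the positional error of the building point itself.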
Method 1
Advantages
- Points are easily convertible: the original quality of location is maintained
- From a geostatistical point of view, data quality is thoroughly the same as in the national data
Disadvantages
- Double sets of primary data
- Double production processes from the beginning
- Risk of data disclosure due to the use of several co-ordinate systems (gaps between datasets)
Tested method 2: conversion of grid data using ready-made national grid datasets
1) The ready-made national grid dataset in ETRS89-TM35FIN is converted into ETRS89-LAEA
- Polygon to point: using the middle points of the national grid cells
- Conversion of the middle points of the grids
2) Converted points are joined with the ETRS89-LAEA grid net
3) Aggregation of statistical data
Workflow: PRODUCTION OF THE NATIONAL GRID DATA → MIDDLE POINTS OF NATIONAL GRIDS → CONVERSION OF THE POINTS, SPATIAL JOIN WITH ETRS89-LAEA GRID NET → AGGREGATION OF STATISTICAL DATA
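The method 2 workflow can be sketched as follows: take the middle point of each national cell, convert it, and add the whole cell value to the LAEA cell that contains the converted point. The conversion is passed in as a callable `to_laea`; with PROJ available it could, for instance, be `pyproj.Transformer.from_crs("EPSG:3067", "EPSG:3035", always_xy=True).transform`. The `toy_shift` function and all data below are hypothetical stand-ins so that the sketch runs without external libraries.

```python
import math
from collections import defaultdict


def convert_national_grid(cells, national_size, to_laea, laea_size=1000):
    """Method 2: move each national cell's value to the LAEA cell holding its middle point.

    cells: iterable of (x_ll, y_ll, value), the lower-left corner of each national
    cell in ETRS89-TM35FIN metres. to_laea: a callable (x, y) -> (x, y) doing the
    TM35FIN -> LAEA conversion (supplied by the caller; hypothetical here).
    """
    totals = defaultdict(int)
    for x_ll, y_ll, value in cells:
        # Polygon to point: the middle point of the national grid cell.
        mx, my = x_ll + national_size / 2, y_ll + national_size / 2
        # Conversion of the middle point, then spatial join with the LAEA grid net.
        lx, ly = to_laea(mx, my)
        cell = (math.floor(lx / laea_size) * laea_size, math.floor(ly / laea_size) * laea_size)
        # Aggregation: the whole national cell value goes to a single LAEA cell.
        totals[cell] += value
    return dict(totals)


def toy_shift(x, y):
    """Stand-in for a real projection: a plain offset, for demonstration only."""
    return x + 4_600_200, y - 2_200_000


national_250m = [(400000, 6700000, 10), (400250, 6700000, 5), (400750, 6700750, 4)]
print(convert_national_grid(national_250m, 250, toy_shift))
# {(5000000, 4500000): 15, (5001000, 4500000): 4}
```

Since the whole value of a national cell is moved to one LAEA cell, the larger the national cell, the larger the possible displacement of inhabitants. This is consistent with the weaker agreement of the 1 km input in the comparisons that follow.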
Effects of the grid cell size on the quality of the derived data
Tested grid cell sizes:
National grid data:
- 125 m x 125 m – highest resolution data
- 250 m x 250 m
- 1 km x 1 km
Reference data: data produced using method 1 (conversion made on building points)
Additional test: JRC/GISCO disaggregated data
– data produced for the Finnish Grid Database
Figure: maps of population density (POP/KM²); panels: ETRS89-LAEA from 125 m grids, ETRS89-LAEA from 250 m grids, ETRS89-LAEA from 1 km grids, ETRS89-LAEA from building points
Comparison of the test datasets
Statistics: N = number of populated grid cells; Mean = inhabitants per populated grid cell; Sum = total number of inhabitants in the dataset; Minimum and Maximum = inhabitants per grid cell

Variable | Source | N | Mean | Sum | Minimum | Maximum
POP_1KM_LAEA | dataset from converted building points | 102 050 | 51.0 | 5 204 192 | 1 | 14 053
POP_1KM_125M | dataset from converted 125 m grid points | 102 249 | 50.9 | 5 204 192 | 1 | 14 197
POP_1KM_250M | dataset from converted 250 m grid points | 102 759 | 50.6 | 5 204 166 | 1 | 13 283
POP_1KM_1KM | dataset from converted 1 km grid points | 99 049 | 52.5 | 5 204 179 | 1 | 19 175
POP_DISAGG | JRC dataset | 159 921 | 32.4 | 5 181 806 | 0.01 | 5 866
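The statistics in the table are simple summaries over populated cells. A minimal sketch, assuming a dataset is held as a mapping from grid cell to inhabitants (the data structure and the example cells are hypothetical):

```python
def summarize(cells):
    """N, mean, sum, min and max of inhabitants over populated grid cells.

    cells: dict mapping grid cell -> inhabitants; cells with 0 are ignored.
    """
    values = [v for v in cells.values() if v > 0]
    total = sum(values)
    return {
        "N": len(values),
        "Mean": round(total / len(values), 1),
        "Sum": total,
        "Minimum": min(values),
        "Maximum": max(values),
    }


print(summarize({(5000000, 4000000): 15, (5001000, 4000000): 7, (5002000, 4000000): 0}))
# {'N': 2, 'Mean': 11.0, 'Sum': 22, 'Minimum': 7, 'Maximum': 15}
```

As a check against the table, 5 204 192 inhabitants over 102 050 populated cells gives the mean of 51.0 reported for POP_1KM_LAEA.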
Pearson correlation coefficients (Prob > |r| under H0: Rho=0 is <.0001 for every pair of different datasets)

 | POP_1KM_LAEA | POP_1KM_125M | POP_1KM_250M | POP_1KM_1KM | POP_DISAGG
POP_1KM_LAEA | 1.00000 | 0.99900 | 0.99495 | 0.90989 | 0.79804
POP_1KM_125M | 0.99900 | 1.00000 | 0.99471 | 0.90990 | 0.79857
POP_1KM_250M | 0.99495 | 0.99471 | 1.00000 | 0.90611 | 0.79840
POP_1KM_1KM | 0.90989 | 0.90990 | 0.90611 | 1.00000 | 0.74920
POP_DISAGG (JRC dataset) | 0.79804 | 0.79857 | 0.79840 | 0.74920 | 1.00000

Number of observations

 | POP_1KM_LAEA | POP_1KM_125M | POP_1KM_250M | POP_1KM_1KM | POP_DISAGG
POP_1KM_LAEA | 102 050 | 99 372 | 97 216 | 81 647 | 85 737
POP_1KM_125M | 99 372 | 102 249 | 97 488 | 81 808 | 85 871
POP_1KM_250M | 97 216 | 97 488 | 102 759 | 82 185 | 86 268
POP_1KM_1KM | 81 647 | 81 808 | 82 185 | 99 049 | 82 069
POP_DISAGG (JRC dataset) | 85 737 | 85 871 | 86 268 | 82 069 | 159 921
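The pairwise numbers of observations (for example 99 372 for POP_1KM_LAEA against POP_1KM_125M) are smaller than either dataset's own N. This is consistent with each correlation being computed only over cells populated in both datasets. The sketch below assumes that reading; the function name and the example data are hypothetical.

```python
import math


def pearson_common(a, b):
    """Pearson correlation of two gridded datasets over the cells present in both.

    a, b: dict mapping grid cell -> inhabitants. Returns (r, n), where n is the
    number of common cells. Requires n >= 2 and non-constant values.
    """
    common = [cell for cell in a if cell in b]
    n = len(common)
    xs = [a[c] for c in common]
    ys = [b[c] for c in common]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy), n


a = {"c1": 10, "c2": 20, "c3": 30, "c4": 5}
b = {"c1": 12, "c2": 22, "c3": 32, "c5": 8}
print(pearson_common(a, b))  # (1.0, 3): c4 and c5 are not in both datasets
```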
Evaluation of differences using inhabitants per 1 km² grid cell (absolute values of differences)
Figure: values of the converted datasets in relation to values of the national datasets; scatter plots of ETRS89-LAEA from 125 m grids, ETRS89-LAEA from 250 m grids, ETRS89-LAEA from 1 km grids and ETRS89-LAEA disaggregated data, each against ETRS89-LAEA from building points, with the identity line (the 45-degree line)
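Dataset | Grid cells (N) | Std Dev | DIF 0 | DIF 1-5 | DIF 6-10 | DIF 11-20 | DIF 21-50 | DIF 51-100 | DIF 101-500 | DIF 501-1000 | DIF over 1000
125M | 99 372 | 12.7 | 65 305 | 25 428 | 4 429 | 1 924 | 1 447 | 503 | 335 | 1 | 0
% | | | 65.7 | 25.6 | 4.5 | 1.9 | 1.5 | 0.5 | 0.3 | 0.0 | 0.0
250M | 97 216 | 28.9 | 50 742 | 32 008 | 7 105 | 3 156 | 2 170 | 1 033 | 940 | 56 | 6
% | | | 52.2 | 32.9 | 7.3 | 3.2 | 2.2 | 1.1 | 1.0 | 0.1 | 0.0
1KM | 81 647 | 135.5 | 20 194 | 31 351 | 11 606 | 7 839 | 4 903 | 1 888 | 3 000 | 574 | 292
% | | | 24.7 | 38.4 | 14.2 | 9.6 | 6.0 | 2.3 | 3.7 | 0.7 | 0.4
DISAG | 85 737 | 184.8 | 11 395 | 36 260 | 14 294 | 9 244 | 6 477 | 2 916 | 4 113 | 632 | 406
% | | | 13.3 | 42.3 | 16.7 | 10.8 | 7.6 | 3.4 | 4.8 | 0.7 | 0.5

Share of cells with a difference of 0-5 inhabitants (DIF 0 + DIF 1-5): 125M 91.3 %, 250M 85.1 %, 1KM 63.1 %, DISAG 55.6 %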
DIFFERENCES (abs. values) between method 1 data (from LAEA buildings) and the derived datasets
DIFFERENCES (abs. values) between method 1 data (from LAEA buildings) and the JRC/GISCO disaggregated data
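The difference classes can be reproduced by comparing, cell by cell, the method 1 reference with a derived dataset over their common cells. The N in each row matches the pairwise number of observations in the correlation table, which supports this reading. The sketch assumes differences in whole inhabitants; how non-integer JRC values were classed is not stated, so values between class limits are put in the next class up here. The names and data are hypothetical.

```python
import math
from collections import Counter

# Upper limit of each class of absolute difference, with the table's labels.
UPPER_BOUNDS = [(0, "DIF 0"), (5, "DIF 1-5"), (10, "DIF 6-10"), (20, "DIF 11-20"),
                (50, "DIF 21-50"), (100, "DIF 51-100"), (500, "DIF 101-500"),
                (1000, "DIF 501-1000"), (math.inf, "DIF over 1000")]


def difference_classes(reference, derived):
    """Class cells by |reference - derived| over cells present in both datasets.

    Returns (n, counts per class, percent per class, share with a difference of 0-5).
    """
    counts = Counter()
    for cell in reference:
        if cell in derived:
            d = abs(reference[cell] - derived[cell])
            label = next(name for upper, name in UPPER_BOUNDS if d <= upper)
            counts[label] += 1
    n = sum(counts.values())
    percent = {name: 100 * counts[name] / n for _, name in UPPER_BOUNDS}
    return n, counts, percent, percent["DIF 0"] + percent["DIF 1-5"]


reference = {"c1": 10, "c2": 20, "c3": 30, "c4": 100, "c5": 5}
derived = {"c1": 10, "c2": 23, "c3": 38, "c4": 2000, "c6": 1}
n, counts, percent, share = difference_classes(reference, derived)
print(n, share)  # 4 50.0
```

The last line of the table is the sum of the first two percentage classes, e.g. 65.7 + 25.6 = 91.3 for the 125M dataset.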
Method 2
Advantages
- Use of the ready-made grid datasets!
- Fewer phases
- Smaller data mass
- Level of quality is a matter of choice: the adequate level of quality (?) depends on the use; minimum target: the SUM of the whole dataset is correct
- No increase in confidentiality problems from double datasets
Disadvantages
- From a geostatistical point of view, data quality is weaker than in the original national data
- Quality errors: quality distortion compared to the correct values (measured by number of inhabitants)
Next steps
For the GEOSTAT 1A project, October-November 2011:
- More tests (any volunteers?)
- Quality definitions concerning the adequate level of quality and the grid scale used
- Step-by-step guidelines
LAEA dataset: filling the empty grid net with data!