Survey frame geocoding using administrative data sources

Survey frame geocoding using administrative data sources Mirosław Migacz Chief GIS Specialist Central Statistical Office of Poland Dublin, 2 XI 2017

Page 1: Survey frame geocoding using administrative data sources

Survey frame geocoding using administrative data sources

Mirosław Migacz

Chief GIS Specialist

Central Statistical Office of Poland

Dublin, 2 XI 2017

Page 2: Survey frame geocoding using administrative data sources

Survey frames

• Survey frame statistical units:
  • persons,
  • buildings,
  • enterprises,
  • farms.

• Georeference – localization of statistical units:
  • facilitates field work for interviewers,
  • facilitates survey management,
  • enables survey result presentation on maps.

Page 3: Survey frame geocoding using administrative data sources

Spatial data in official statistics

TERYT register (units)                                    Spatial data
TERC (voivodship, powiat, gmina)                          National Register of Boundaries (PRG)
SIMC (locality)                                           National Register of Geographical Names (PRNG)
BREC (statistical region, census enumeration area)        statistical region / census enumeration area boundaries
ULIC (street)                                             street axes
NOBC (building, dwelling)                                 statistical address points

Page 4: Survey frame geocoding using administrative data sources

PBA – spatial address databases

• reference material collection: January 2009 – December 2009
• reference material processing: January 2010 – May 2010
• address point acquisition: January 2010 – June 2010
• address point database update: continuously since July 2010

Page 5: Survey frame geocoding using administrative data sources

PBA vs survey frames

• PBA:
  • locations of buildings with at least one dwelling.

• Survey frames:
  • OBS – frame for social surveys,
  • OBR – frame for agricultural surveys,
  • BJS – statistical unit database (enterprises).

[Diagram labels: OBS, OBR, BJS]

Page 6: Survey frame geocoding using administrative data sources

Survey frame geocoding

SURVEY FRAME
• address data
• TERYT identifiers

SPATIAL DATA
• address points
• TERYT identifiers

GEOCODING
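The idea on this slide, pairing survey frame records with address points through shared TERYT identifiers, can be sketched as a plain key lookup. This is only an illustration under assumptions: the field names (`simc`, `ulic`, `number`, `unit_id`) and the choice of a three-part key are hypothetical, and the sample values are borrowed from the EMUiA example shown later in the deck.

```python
from typing import Dict, List, Tuple

# Hypothetical composite key: (locality ID, street ID, address number).
AddressKey = Tuple[str, str, str]


def build_index(address_points: List[dict]) -> Dict[AddressKey, Tuple[float, float]]:
    """Index address points by identifier key so each lookup is a dict access."""
    return {
        (p["simc"], p["ulic"], p["number"]): (p["x"], p["y"])
        for p in address_points
    }


def geocode(frame: List[dict], index: Dict[AddressKey, Tuple[float, float]]) -> List[dict]:
    """Attach coordinates to each frame record; x and y stay None without a match."""
    out = []
    for rec in frame:
        xy = index.get((rec["simc"], rec["ulic"], rec["number"]))
        out.append({**rec, "x": xy[0] if xy else None, "y": xy[1] if xy else None})
    return out


# Sample address point (identifiers and coordinates from the EMUiA example slide).
address_points = [
    {"simc": "0980062", "ulic": "08828", "number": "1", "x": 281563.44, "y": 636550.11},
]
# Hypothetical survey frame records.
frame = [
    {"unit_id": "A", "simc": "0980062", "ulic": "08828", "number": "1"},
    {"unit_id": "B", "simc": "0980062", "ulic": "08828", "number": "99"},
]

geocoded = geocode(frame, build_index(address_points))
for rec in geocoded:
    print(rec["unit_id"], rec["x"], rec["y"])
```

Records without a matching key keep empty coordinates, which makes it easy to count how many frame units still lack a georeference.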

Page 7: Survey frame geocoding using administrative data sources

PBA vs survey frames

[Diagram labels: PBA, OBS, OBR, BJS, other sources]

Page 8: Survey frame geocoding using administrative data sources

Survey frames vs data sources

[Diagram labels: PRG, LPIS, OBS, OBR, BJS; "complementary"]

Page 9: Survey frame geocoding using administrative data sources

Survey frame geocoding

[Diagram labels: PRG, LPIS, PBA, OBS, OBR, BJS; "Farm Structure Survey 2016"]

Page 10: Survey frame geocoding using administrative data sources

Survey frame geocoding

[Diagram labels: PRG, LPIS, PBA, OBS, OBR, BJS; "Improvement of the use of administrative sources (ESS.VIP ADMIN)"]

Page 11: Survey frame geocoding using administrative data sources

Survey frame geocoding

[Diagram labels: PRG, LPIS, PBA, OBS, OBR, BJS; "Improvement of the use of administrative sources (ESS.VIP ADMIN) - application"]

Page 12: Survey frame geocoding using administrative data sources

National Register of Boundaries (PRG)

• gmina (LAU2): EMUiA – register of:
  - localities
  - streets
  - addresses
• PRG: address points for all buildings

Page 13: Survey frame geocoding using administrative data sources

Register of localities, streets and addresses (EMUiA)

Address point

Locality

Street

Page 14: Survey frame geocoding using administrative data sources

Register of localities, streets and addresses (EMUiA)

Administrative unit

Locality

TERYT identifier – voidable

Page 15: Survey frame geocoding using administrative data sources

Register of localities, streets and addresses (EMUiA)

Street name

Locality

TERYT identifier – voidable

Street

Page 16: Survey frame geocoding using administrative data sources

Register of localities, streets and addresses (EMUiA)

street

address

locality

Page 17: Survey frame geocoding using administrative data sources

Register of localities, streets and addresses (EMUiA)

Address point
Gmina   Locality    Loc. ID   Street     Str. ID   Addr. #   X           Y
name    Węgorzyno   X         Kolejowa   X         1         281563,44   636550,11

Locality
Gmina ID   Locality    Loc. ID
X          Węgorzyno   0980062

Street
Gmina ID   Locality/Loc. ID   Street     Str. ID
X          X                  Kolejowa   08828

(X in a data row marks a missing value.)

Page 18: Survey frame geocoding using administrative data sources

EMUiA – problems

• multiple localities with the same name within one voivodeship

• multiple streets with the same name within one voivodeship / gmina / locality

• typing errors

• completeness issues
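The first problem listed above (the same locality name occurring more than once within a voivodeship) can be detected before any pairing is attempted. The sketch below is a minimal illustration; the field names and the locality IDs in the sample data are hypothetical, not taken from the registers.

```python
from collections import defaultdict


def ambiguous_names(localities):
    """Return (voivodeship, name) keys that map to more than one locality ID."""
    ids_by_key = defaultdict(set)
    for loc in localities:
        ids_by_key[(loc["voivodeship"], loc["name"])].add(loc["simc"])
    return {key: ids for key, ids in ids_by_key.items() if len(ids) > 1}


# Hypothetical sample: two distinct localities share one name in voivodeship "32".
localities = [
    {"voivodeship": "32", "name": "Nowa Wieś", "simc": "0000001"},
    {"voivodeship": "32", "name": "Nowa Wieś", "simc": "0000002"},
    {"voivodeship": "32", "name": "Węgorzyno", "simc": "0980062"},
]

ambiguous = ambiguous_names(localities)
print(ambiguous)
```

Names flagged this way cannot be resolved by name alone and need an additional key, such as the gmina ID.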

Page 19: Survey frame geocoding using administrative data sources

EMUiA – solutions

• assign gmina ID to localities:
  • pairing by locality ID with TERYT locality register
  • spatial join

• assign gmina ID to address points:
  • spatial join

• assign locality ID to address points:
  • pair by both: gmina ID and locality name
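Two of the steps above can be sketched together: a spatial join that assigns a gmina ID to each address point, followed by pairing on gmina ID plus locality name to obtain the locality ID. This is a simplified sketch, not the procedure actually used: it swaps a GIS spatial join for a hand-written ray-casting point-in-polygon test, and all geometries, IDs and field names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_gmina(point, gminas):
    """Spatial join: return the ID of the first gmina polygon containing the point."""
    for gmina_id, polygon in gminas.items():
        if point_in_polygon(point["x"], point["y"], polygon):
            return gmina_id
    return None


# Hypothetical gmina boundaries (two adjacent squares) and locality register.
gminas = {
    "G1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "G2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
locality_ids = {("G1", "Nowa Wieś"): "L001", ("G2", "Nowa Wieś"): "L002"}

points = [
    {"locality": "Nowa Wieś", "x": 5, "y": 5},
    {"locality": "Nowa Wieś", "x": 15, "y": 5},
    {"locality": "Nowa Wieś", "x": 25, "y": 5},  # outside every polygon
]

for p in points:
    p["gmina_id"] = assign_gmina(p, gminas)
    # Pair by both gmina ID and locality name; the name alone would be ambiguous.
    p["loc_id"] = locality_ids.get((p["gmina_id"], p["locality"]))
```

The same locality name resolves to different locality IDs once the gmina ID is part of the key, which is why the spatial join comes first.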

Page 20: Survey frame geocoding using administrative data sources

EMUiA – solutions

• assign street ID to address points:
  • pairing by street name with the street feature class
  • pairing with TERYT street catalogue […]

Page 21: Survey frame geocoding using administrative data sources

Pairing w/ TERYT street catalogue

Locality ID (SIMC)

Street ID

Street name variations:
• NAZWA_1
• ULICA_1: NAZWA_2 + NAZWA_1
• ULICA_2: NAZWA_1 + NAZWA_2
• ULICA_3: CECHA + NAZWA_2 + NAZWA_1 (CECHA + NAZWA_1 if NAZWA_2 IS NULL)

Page 22: Survey frame geocoding using administrative data sources

Pairing w/ TERYT street catalogue

Locality ID (SIMC)

Street ID

Street name variations:
• NAZWA_1
• ULICA_1: NAZWA_2 + NAZWA_1
• ULICA_2: NAZWA_1 + NAZWA_2
• ULICA_3: CECHA + NAZWA_2 + NAZWA_1 (CECHA + NAZWA_1 if NAZWA_2 IS NULL)

Pairing by:
• SIMC + NAZWA_1
• SIMC + ULICA_1
• SIMC + ULICA_2
• SIMC + ULICA_3
• NAZWA_1
• ULICA_1
• ULICA_2
• ULICA_3
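The name variations and the pairing order on this slide can be sketched as follows. Several points are assumptions rather than anything the slides state: the pairing keys are tried in the listed order and the first key that produces a hit wins, name parts are joined with single spaces, and comparison is case-insensitive. The "Kolejowa" entry reuses the IDs from the EMUiA example with an assumed CECHA of "ul."; the second catalogue row is entirely hypothetical.

```python
from collections import defaultdict

VARIANT_ORDER = ["NAZWA_1", "ULICA_1", "ULICA_2", "ULICA_3"]


def name_variants(cecha, nazwa_1, nazwa_2):
    """Build the four street name variations; a NULL NAZWA_2 is simply skipped."""
    def join(*parts):
        return " ".join(p for p in parts if p)
    return {
        "NAZWA_1": nazwa_1,
        "ULICA_1": join(nazwa_2, nazwa_1),
        "ULICA_2": join(nazwa_1, nazwa_2),
        "ULICA_3": join(cecha, nazwa_2, nazwa_1),
    }


def build_lookups(catalogue):
    """Index street IDs by (SIMC, variant name) and by variant name alone."""
    with_simc = {v: defaultdict(set) for v in VARIANT_ORDER}
    name_only = {v: defaultdict(set) for v in VARIANT_ORDER}
    for row in catalogue:
        variants = name_variants(row["CECHA"], row["NAZWA_1"], row["NAZWA_2"])
        for variant, name in variants.items():
            key = name.lower()
            with_simc[variant][(row["SIMC"], key)].add(row["ULIC"])
            name_only[variant][key].add(row["ULIC"])
    return with_simc, name_only


def match_street(simc, street_name, with_simc, name_only):
    """Try SIMC-qualified keys first, then name-only keys; return (rule, street IDs)."""
    key = street_name.lower()
    if simc is not None:
        for variant in VARIANT_ORDER:
            hit = with_simc[variant].get((simc, key))
            if hit:
                return f"SIMC + {variant}", sorted(hit)
    for variant in VARIANT_ORDER:
        hit = name_only[variant].get(key)
        if hit:
            return variant, sorted(hit)
    return None, []


catalogue = [
    {"SIMC": "0980062", "ULIC": "08828", "CECHA": "ul.", "NAZWA_1": "Kolejowa", "NAZWA_2": None},
    {"SIMC": "0980062", "ULIC": "12345", "CECHA": "ul.", "NAZWA_1": "Mickiewicza", "NAZWA_2": "Adama"},
]
with_simc, name_only = build_lookups(catalogue)

print(match_street("0980062", "Kolejowa", with_simc, name_only))
print(match_street("0980062", "Adama Mickiewicza", with_simc, name_only))
print(match_street(None, "ul. Adama Mickiewicza", with_simc, name_only))
```

Name-only keys are the last resort because, without the locality ID, one street name can match entries in many localities.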

Page 23: Survey frame geocoding using administrative data sources

Pairing w/ TERYT street catalogue

• pairing addresses with street catalogue by street names (string)

• multiple matches -> multiplying address point records

• result: 13 635 270 matched address point records (initial number of address points: 7 533 868),

• 275 453 (3.6%) out of 7 533 868 address points with a street name present but no street ID assigned.
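The effect described above, where one address point matching several catalogue entries is multiplied into several records, can be reproduced in a few lines. The lookup table, street IDs other than 08828, and address points below are hypothetical and only show the mechanism.

```python
def expand_matches(address_points, street_ids_by_name):
    """Emit one output record per (address point, matching street ID) pair.

    Points that have a street name but no match are collected separately;
    points without any street name are ignored by both lists.
    """
    matched, unmatched = [], []
    for p in address_points:
        ids = street_ids_by_name.get(p["street"], [])
        if ids:
            matched.extend({**p, "street_id": sid} for sid in ids)
        elif p["street"]:
            unmatched.append(p)
    return matched, unmatched


# Hypothetical catalogue lookup: "Kolejowa" exists under two street IDs.
street_ids_by_name = {"Kolejowa": ["08828", "90001"], "Lipowa": ["90002"]}
address_points = [
    {"number": "1", "street": "Kolejowa"},
    {"number": "2", "street": "Lipowa"},
    {"number": "3", "street": "Nieznana"},  # street name present, no ID found
    {"number": "4", "street": None},        # locality without street names
]

matched, unmatched = expand_matches(address_points, street_ids_by_name)
print(len(matched), len(unmatched))
```

With the figures reported on the slide, 13 635 270 matched records against 7 533 868 initial address points works out to roughly 1.8 matched records per initial address point, so the duplicates still have to be narrowed down before a single street ID can be assigned.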

Page 24: Survey frame geocoding using administrative data sources

Survey frame geocoding

• agricultural survey frame: a bit more than half of the records qualified for pairing (identifiers present) acquired a georeference,

• other survey frames: Q4 2017, Q1 2018

Page 25: Survey frame geocoding using administrative data sources

Conclusions on source data

• hope for data quality improvement over time (the PRG dataset tested is dated 13.06.2016),

• other techniques for record matching in order to assign identifiers to more address points:
  • building an address locator for ArcGIS geocoding tools,
  • string distance analyses (e.g. stringdist Python module).
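A string distance analysis of the kind mentioned above could look like the sketch below. It does not use the stringdist module named on the slide; instead it swaps in a hand-written Levenshtein distance so that the example has no dependencies. The threshold of 2 edits and the sample names are assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_name(name, candidates, max_distance=2):
    """Return the candidate nearest to name, or None if it is too far away."""
    best = min(candidates,
               key=lambda c: levenshtein(name.lower(), c.lower()),
               default=None)
    if best is not None and levenshtein(name.lower(), best.lower()) <= max_distance:
        return best
    return None


catalogue_names = ["Kolejowa", "Lipowa"]
print(closest_name("Kolejwa", catalogue_names))        # typing error, one letter missing
print(closest_name("Zupełnie Inna", catalogue_names))  # no plausible candidate
```

A low threshold keeps false matches rare at the cost of leaving heavily misspelled names unmatched; the right value would have to be tuned on the actual register data.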

Page 26: Survey frame geocoding using administrative data sources

Mirosław Migacz

Chief GIS Specialist

Central Statistical Office of Poland

@mireslav

www.linkedin.com/in/migacz

[email protected]

www.slideshare.net/MirosawMigacz

Survey frame geocoding using administrative data sources