TRANSCRIPT
Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling
Jizhou Fu and Lee Fiorio
AAPOR 2012, Orlando
• ABS Background
• Analysis Goals
• Data and Methodology
• Results
• Discussion
• Limitations
Outline
• Address-based frames first need geographical boundaries
• Types of address-based frames
  • US Postal Service Delivery Sequence File (DSF)
    – Purchased through market research vendors
    – Updated frequently
    – Adequate replacement for field listing in urban and suburban areas
  • Dependent or Enhanced Listing
    – Provide DSF to listers for enhancement in the field
    – Reduces cost and increases accuracy of traditional listing
• Because of costs, DSF should be used where possible
• Enhanced listing should be used where DSF is inadequate
• Evaluating DSF coverage: DSF-to-Census Ratio
Address-Based Sampling (ABS) Background
• Geographic information on the DSF:
  • Address, city, county, state, zip, zip4, carrier route, walk sequence
• Geographic information not on the DSF:
  • Census block, census block group, census tract, latitude or longitude
• Geocoding
  • Appends latitude and longitude as well as census geography
  • Requires commercial software
  • PO Boxes and Rural Route addresses not easily geocoded
  • Potential for error
DSF Geography
Geocoding Error
• What are the correlates of geocoding error?
  • Logistic Model
    – Urbanicity
    – Housing unit density
    – Vacancy rates
    – Drop delivery
    – Housing unit type (single family home, apartment)
    – Home ownership
    – Adjacent to water blocks
• Does geocoding error exhibit spatial clustering?
  • Moran’s I
  • Logistic Model
    – Autocovariate
Research Questions
• NORC National Frame Listing effort
  • Fall 2011
  • Out of 1,516 segments (census tracts or block groups), 126 segments needed enhancement
  • Device-based listing
    – Latitude and longitude collected
    – Segment-level address list
    – Real-time QC in central office
• Selected 21 enhanced segments for analysis
  • Geocapture worked for at least 90% of addresses
  • Mix of urban and rural
  • Range of DSF-to-Census ratios: 0.31 to 0.81
• 8,560 DSF lines
Data and Methodology
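Geocoding error: over-coverage vs under-coverage
• DSF = confirmed DSF addresses + unconfirmed DSF addresses
• Final enhanced list = confirmed DSF addresses + addresses added in the field
• Unconfirmed DSF addresses (over-coverage): 1,056
• Confirmed DSF addresses (coverage): 7,504
• Addresses added in the field (under-coverage): 4,859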
• 12.3% of DSF lines unconfirmed in field
• Difficult to separate causes of under-coverage
• Focus on over-coverage
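The coverage figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes the three counts on the diagram read as 7,504 confirmed, 1,056 unconfirmed, and 4,859 added in the field, which is the assignment consistent with the 8,560 DSF lines and the 12.3% figure. The variable names are illustrative only.

```python
# Counts read off the slide diagram (assignment to regions is an assumption
# checked against the 8,560 DSF lines reported earlier).
confirmed_dsf = 7504     # DSF addresses confirmed in the field (coverage)
unconfirmed_dsf = 1056   # DSF addresses not confirmed in the field (over-coverage)
added_in_field = 4859    # addresses added by listers (under-coverage)

# The DSF is made up of confirmed plus unconfirmed lines.
dsf_lines = confirmed_dsf + unconfirmed_dsf

# The final enhanced list is made up of confirmed lines plus field additions.
enhanced_list = confirmed_dsf + added_in_field

# Share of DSF lines that listers could not confirm.
over_coverage_rate = unconfirmed_dsf / dsf_lines

print(f"DSF lines: {dsf_lines}")
print(f"Final enhanced list: {enhanced_list}")
print(f"Unconfirmed share of DSF: {over_coverage_rate:.1%}")
```

The unconfirmed share is the over-coverage quantity the rest of the analysis models.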
• Sample drawn of 4,000 DSF lines provided for enhancement
• Dependent variable: flag if correctly geocoded into the segment
• Independent variables:
  • Address-level (DSF)
    – Drop point flag
    – Vacant flag
    – Record type indicator (High rise, rural, single family home)
  • Block-level (census)
    – DSF-to-Census ratio, four categories (<0.9, 0.9 to 1.25, 1.25 to 2, >2)
    – TEA (Type of Enumeration Area) code flag
    – Principal city flag
    – Water adjacency flag
    – Housing unit density
    – Area
    – Percent Multi-unit
Data and Methodology (cont’d)
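As a rough illustration of the block-level DSF-to-Census ratio and its four categories, here is a minimal sketch. The function names are hypothetical, and the handling of the boundary values (0.9, 1.25, 2.0) is an assumption, since the slide does not say which category they fall into.

```python
def dsf_to_census_ratio(dsf_count: int, census_hu_count: int) -> float:
    """Ratio of DSF address count to census housing unit count for one block."""
    if census_hu_count <= 0:
        raise ValueError("census housing unit count must be positive")
    return dsf_count / census_hu_count


def ratio_category(ratio: float) -> int:
    """Map a ratio to categories 1-4: <0.9, 0.9 to 1.25, 1.25 to 2, >2.

    Assumption: exact boundary values go to the lower-numbered middle
    categories (0.9 and 1.25 to category 2, 2.0 to category 3).
    """
    if ratio < 0.9:
        return 1
    if ratio <= 1.25:
        return 2
    if ratio <= 2.0:
        return 3
    return 4


# Example: a block with 31 DSF lines and 100 census housing units.
print(ratio_category(dsf_to_census_ratio(31, 100)))
```

Category 2 brackets a ratio of 1, where the DSF count and the census count roughly agree.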
Table 1: Logistic Model Results
Parameter | Estimate (sign) | Significance
Intercept | - | ***
DSF-to-Census <0.9 | + | ***
DSF-to-Census 1.25 to 2.0 | + | **
DSF-to-Census >2.0 | + | ***
TEA1 | - | ***
In Principal City Flag | - | ***
HU Density (mean centered) | - | ***
Drop delivery | + | ***
Vacant Flag | + | *
Record Type High-rise | - | n.s.
Record Type Rural | + | ***
Pct Multi-Unit (mean centered) | - | *
Area (mean centered) | + | ***
Parameter groups called out on the slide: Ratio Categories, Urbanicity, Postal Characteristics, Geographical Considerations
Significance: * p<0.05, ** p<0.01, *** p<0.001
Table 2: A closer look at impact of DSF-to-Census Ratio
Category | Parameter | Odds Ratio | Significance
1 | DSF-to-Census <0.9 | 2.25 | ***
3 | DSF-to-Census 1.25 to 2.0 | 2.37 | **
4 | DSF-to-Census >2.0 | 4.29 | ***
• Category 2 (DSF-to-Census 0.9 to 1.25) is the reference category
• Addresses in category 1 census blocks have similar odds of being geocoded incorrectly as addresses in category 3 blocks (odds ratios 2.25 and 2.37)
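To relate the odds ratios in Table 2 to the signed estimates in Table 1, recall that in a logistic model an odds ratio is the exponential of the coefficient. The sketch below back-computes approximate coefficients from the reported odds ratios. It assumes category 2 (0.9 to 1.25) is the omitted reference category; the dictionary and variable names are illustrative.

```python
import math

# Odds ratios as reported in Table 2, relative to the assumed reference
# category (DSF-to-Census 0.9 to 1.25).
odds_ratios = {
    "DSF-to-Census <0.9": 2.25,
    "DSF-to-Census 1.25 to 2.0": 2.37,
    "DSF-to-Census >2.0": 4.29,
}

# In logistic regression, odds ratio = exp(coefficient),
# so coefficient = ln(odds ratio).
log_odds = {name: math.log(value) for name, value in odds_ratios.items()}

for name, coef in log_odds.items():
    print(f"{name}: odds ratio {odds_ratios[name]:.2f}, coefficient {coef:.3f}")
```

All three coefficients come out positive, which matches the "+" signs for the ratio categories in Table 1.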
• Does geocoding error exhibit spatial clustering?
  • Do blocks with geocoding error neighbor blocks with geocoding error?
$y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p Wy + \varepsilon$
• Where Wy is the weighted average of neighboring values, or ‘spatial lag’
Spatial Autocorrelation
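Because the outcome is a binary flag and the talk fits a logistic model, the spatial-lag equation is presumably meant on the logit scale. A hedged restatement follows; the index $i$ for the unit of analysis, the probability $p_i$, the weights $w_{ij}$, and the explicit intercept $\beta_0$ are notation introduced here, not taken from the slides:

$$
\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p (Wy)_i,
\qquad (Wy)_i = \sum_{j} w_{ij}\, y_j
$$

Under this reading, a positive $\beta_p$ means that a unit's odds of geocoding error rise with the amount of error among its neighbors.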
Example Segment (block layout)
1 2
3 4 5
Example Weight Matrix W
Block | 1 | 2 | 3 | 4 | 5
1 | 0 | 1 | 1 | 0 | 0
2 | 1 | 0 | 0 | 1 | 1
3 | 1 | 0 | 0 | 1 | 0
4 | 0 | 1 | 1 | 0 | 1
5 | 0 | 1 | 0 | 1 | 0
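One way to build a binary weight matrix like the example is from a list of neighboring block pairs. The sketch below uses the pairs implied by the example matrix W (a 1 in row i, column j marks blocks i and j as neighbors). The function name and the pair list are illustrative, not part of the original analysis.

```python
# Neighbor pairs read off the example weight matrix (block labels 1-5).
NEIGHBOR_PAIRS = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 4), (4, 5)]


def build_weight_matrix(n_blocks, pairs):
    """Return a symmetric 0/1 adjacency matrix as a list of lists."""
    w = [[0] * n_blocks for _ in range(n_blocks)]
    for a, b in pairs:
        # Block labels are 1-based; list indices are 0-based.
        w[a - 1][b - 1] = 1
        w[b - 1][a - 1] = 1
    return w


W = build_weight_matrix(5, NEIGHBOR_PAIRS)
for row in W:
    print(*row)
```

The diagonal stays at zero because a block is not counted as its own neighbor.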
Moran’s I – Measure of Spatial Autocorrelation
Example variable of interest y (geocoding error by block)
Block | Error
1 | 1
2 | 1
3 | 0
4 | 0
5 | 1
Weight Matrix W * Geocoding Error y = Weighted average of neighbors Wy
Block | W (columns 1 to 5) | y | Wy
1 | 0 1 1 0 0 | 1 | 1
2 | 1 0 0 1 1 | 1 | 2
3 | 1 0 0 1 0 | 0 | 1
4 | 0 1 1 0 1 | 0 | 2
5 | 0 1 0 1 0 | 1 | 1
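The product on this slide can be reproduced directly. The sketch below computes Wy for the example segment. Note that with the binary weights shown, Wy is a count of neighboring blocks with error; to get a true weighted average, each row of W would first be divided by its row sum. The row-standardized variant is included as an assumption about what "weighted average" means in practice, and the function names are illustrative.

```python
# Example weight matrix W and error flags y from the slides.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
y = [1, 1, 0, 0, 1]


def spatial_lag(w, values):
    """Matrix-vector product: one lagged value per block."""
    return [sum(wij * vj for wij, vj in zip(row, values)) for row in w]


def row_standardize(w):
    """Divide each row by its sum so that the weights in a row add to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


wy_count = spatial_lag(W, y)                    # neighbors with error
wy_mean = spatial_lag(row_standardize(W), y)    # share of neighbors with error

print(wy_count)
print([round(v, 3) for v in wy_mean])
```

The count version matches the Wy column on the slide: 1, 2, 1, 2, 1.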
• Degree of linear association between observed values y and a weighted average of neighboring values Wy
  • Observed Moran’s I: 0.0281
  • Very significant (p < 0.0001)
  • Positive, indicating possible spatial clustering
• Add Wy to final logistic model
$y = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p Wy + \varepsilon$
Moran’s I and Spatial Autocorrelation Model
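For readers unfamiliar with the statistic, here is a minimal sketch of the standard global Moran's I formula, I = (n / S0) * (z'Wz) / (z'z), where z holds deviations from the mean and S0 is the sum of all weights. It is applied to the five-block toy example from the earlier slides, not to the study data, so the result has no bearing on the reported 0.0281. The significance test is not shown, and the function name is illustrative.

```python
def morans_i(w, values):
    """Global Moran's I for a weight matrix w and a list of values."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # deviations from the mean
    s0 = sum(sum(row) for row in w)           # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    variance_sum = sum(zi * zi for zi in z)
    return (n / s0) * (cross / variance_sum)


# Toy example segment: binary neighbor weights and error flags.
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
y = [1, 1, 0, 0, 1]

print(round(morans_i(W, y), 4))
```

On this toy segment the statistic is slightly negative (about -0.028), because the blocks with error are not consistently next to each other. A positive value, as in the study, points toward clustering.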
Table 3: Logistic Regression with Spatial Autocovariate
Parameter | Estimate (sign) | Significance
Intercept | - | ***
DSF-to-Census <0.9 | + | **
DSF-to-Census 1.25 to 2.0 | + | **
DSF-to-Census >2.0 | + | ***
TEA1 | - | ***
In Principal City Flag | - | ***
HU Density (mean centered) | - | ***
Drop delivery | + | ***
Vacant Flag | + | *
Record Type High-rise | - | n.s.
Record Type Rural | + | ***
Pct Multi-Unit (mean centered) | - | *
Area (mean centered) | + | ***
Autocovariate (W.y) | + | *
Parameter groups called out on the slide: Ratio Categories, Urbanicity, Postal Characteristics, Geographical Considerations
Map 1: Example of Clustering
Map 2: Example of Clustering
• Urbanicity, postal characteristics, and block-level DSF-to-Census ratio are highly correlated with geocoding error
• Addresses in low DSF-to-Census ratio blocks have similar odds of geocoding error as addresses in high DSF-to-Census ratio blocks
• Geocoding error exhibits spatial clustering
  • Problematic blocks within a segment can be used as a potential flag for larger geocoding error
  • Help with address frame decisions
Discussion
• Analysis was limited to segments that already have less than acceptable DSF coverage
  • Possible that census characteristics and DSF flags behave differently above threshold
• Sample of 21 segments used in analysis not random
  • Limits the ability to generalize findings
• Definition of geocoding error limited to over-coverage error
Limitations
Thank You!
Lee Fiorio [email protected]