Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling
Jizhou Fu and Lee Fiorio
AAPOR 2012, Orlando


Page 1: Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling

Jizhou Fu and Lee Fiorio

AAPOR 2012, Orlando

Page 3: Outline

• ABS Background
• Analysis Goals
• Data and Methodology
• Results
• Discussion
• Limitations

Page 4: Address-Based Sampling (ABS) Background

• Address-based frames first need geographical boundaries
• Types of address-based frames
  • US Postal Service Delivery Sequence File (DSF)
    – Purchased through market research vendors
    – Updated frequently
    – Adequate replacement for field listing in urban and suburban areas
  • Dependent or Enhanced Listing
    – Provide DSF to listers for enhancement in the field
    – Reduces cost and increases accuracy of traditional listing
• Because of costs, DSF should be used where possible
• Enhanced listing should be used where DSF is inadequate
• Evaluating DSF coverage: DSF-to-Census Ratio

Page 5: DSF Geography

• Geographic information on the DSF:
  • Address, city, county, state, ZIP, ZIP+4, carrier route, walk sequence
• Geographic information not on the DSF:
  • Census block, census block group, census tract, latitude or longitude
• Geocoding
  • Appends latitude and longitude as well as census geography
  • Requires commercial software
  • PO Boxes and Rural Route addresses not easily geocoded
  • Potential for error
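To illustrate how a small positional error can move an address across a boundary, the sketch below assigns a longitude/latitude point to a polygon with a standard ray-casting test. It covers only the final assignment step and is not the commercial software the slide refers to; the segment boundary, the coordinates, and the function name are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex connects
    back to the first. Points exactly on an edge get no special handling.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular segment boundary (lon, lat)
segment = [(-87.70, 41.80), (-87.60, 41.80), (-87.60, 41.90), (-87.70, 41.90)]

true_location = (-87.601, 41.85)      # just inside the eastern edge
geocoded_location = (-87.599, 41.85)  # small positional error, now outside

in_segment_true = point_in_polygon(*true_location, segment)
in_segment_geocoded = point_in_polygon(*geocoded_location, segment)
print(in_segment_true, in_segment_geocoded)
```

Addresses near a boundary are the ones most exposed to this kind of misassignment, which is one way a geocoded address can end up attributed to the wrong census geography.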

Pages 6 to 11: Geocoding Error [figures]

Page 12: Research Questions

• What are the correlates of geocoding error?
  • Logistic Model
    – Urbanicity
    – Housing unit density
    – Vacancy rates
    – Drop delivery
    – Housing unit type (single family home, apartment)
    – Home ownership
    – Adjacent to water blocks
• Does geocoding error exhibit spatial clustering?
  • Moran’s I
  • Logistic Model
    – Autocovariate

Page 13: Data and Methodology

• NORC National Frame Listing effort
  • Fall 2011
  • Out of 1,516 segments (census tracts or block groups), 126 segments needed enhancement
  • Device-based listing
    – Latitude and longitude collected
    – Segment-level address list
    – Real-time QC in central office
• Selected 21 enhanced segments for analysis
  • Geocapture worked for at least 90% of addresses
  • Mix of urban and rural
  • Range of DSF-to-Census ratios: 0.31 to 0.81
  • 8,560 DSF lines

Page 14: Geocoding error: over-coverage vs under-coverage

[Diagram: overlap of the final enhanced list and the DSF]

Region                          Addresses   Coverage type
Addresses added in the field        4,859   under-coverage
Confirmed DSF addresses             7,504   coverage
Unconfirmed DSF addresses           1,056   over-coverage

• 12.3% of DSF lines unconfirmed in field
• Difficult to separate causes of under-coverage
• Focus on over-coverage
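The 12.3% figure follows from the counts on this slide: 1,056 unconfirmed lines out of 7,504 + 1,056 = 8,560 DSF lines. The sketch below shows one way the three regions could be derived by comparing a DSF list with a field-enhanced list; the function name and the toy addresses are hypothetical, and real matching would need address standardization that is not shown here.

```python
def coverage_breakdown(dsf_addresses, field_addresses):
    """Split two address lists into confirmed, unconfirmed, and added sets."""
    dsf = set(dsf_addresses)
    field = set(field_addresses)
    return {
        "confirmed": dsf & field,    # on the DSF and found in the field
        "unconfirmed": dsf - field,  # on the DSF only (over-coverage)
        "added": field - dsf,        # found in the field only (under-coverage)
    }


# Toy example with hypothetical addresses
dsf_list = ["101 Main St", "103 Main St", "105 Main St", "12 Oak Ave"]
field_list = ["101 Main St", "103 Main St", "105 Main St", "14 Oak Ave", "16 Oak Ave"]
toy = coverage_breakdown(dsf_list, field_list)

# Counts reported on the slide
confirmed, unconfirmed, added = 7504, 1056, 4859
dsf_lines = confirmed + unconfirmed
over_coverage_rate = unconfirmed / dsf_lines
print(dsf_lines, round(over_coverage_rate * 100, 1))  # 8560 12.3
```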

Page 15: Data and Methodology (cont’d)

• Sample drawn of 4,000 DSF lines provided for enhancement
• Dependent variable: flag if correctly geocoded into the segment
• Independent variables:
  • Address-level (DSF)
    – Drop point flag
    – Vacant flag
    – Record type indicator (high rise, rural, single family home)
  • Block-level (census)
    – DSF-to-Census ratio, four categories (<0.9, 0.9 to 1.25, 1.25 to 2, >2)
    – TEA code flag (Type of Enumeration Area)
    – Principal city flag
    – Water adjacency flag
    – Housing unit density
    – Area
    – Percent multi-unit
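As a rough sketch of the block-level ratio variable, the code below divides a block's DSF address count by its census housing unit count and maps the result to the four categories listed above. The slide does not say which category the boundary values 0.9, 1.25, and 2 belong to, so the inclusive upper bounds used here are an assumption, as are the function names and the example counts.

```python
def dsf_to_census_ratio(dsf_count, census_hu_count):
    """Ratio of DSF addresses to census housing units; None if undefined."""
    if census_hu_count == 0:
        return None
    return dsf_count / census_hu_count


def ratio_category(ratio):
    """Map a ratio to categories 1 to 4 (boundary handling is an assumption)."""
    if ratio is None:
        return None
    if ratio < 0.9:
        return 1
    if ratio <= 1.25:
        return 2
    if ratio <= 2.0:
        return 3
    return 4


# Hypothetical blocks: (DSF address count, census housing unit count)
blocks = {"block_a": (31, 100), "block_b": (98, 100), "block_c": (150, 100), "block_d": (60, 20)}
categories = {
    name: ratio_category(dsf_to_census_ratio(dsf, census))
    for name, (dsf, census) in blocks.items()
}
print(categories)
```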

Page 16: Table 1: Logistic Model Results

Group                        Parameter                        Estimate (sign, significance)
                             Intercept                        - ***
Ratio Categories             DSF-to-Census <0.9               + ***
Ratio Categories             DSF-to-Census 1.25 to 2.0        + **
Ratio Categories             DSF-to-Census >2.0               + ***
Urbanicity                   TEA1                             - ***
Urbanicity                   In Principal City Flag           - ***
Urbanicity                   HU Density (mean centered)       - ***
Postal Characteristics       Drop delivery                    + ***
Postal Characteristics       Vacant Flag                      + *
Postal Characteristics       Record Type High-rise            -
Postal Characteristics       Record Type Rural                + ***
Geographical Considerations  Pct Multi-Unit (mean centered)   - *
Geographical Considerations  Area (mean centered)             + ***

Significance: * p<0.05, ** p<0.01, *** p<0.001
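For readers unfamiliar with how such estimates are obtained, here is a minimal sketch of a logistic regression with a single binary predictor, fitted by Newton's method in plain Python. It is not the authors' model or software: the data are hypothetical, the outcome is coded as 1 for a geocoding error (an assumption about the coding), and only a drop-delivery flag is included.

```python
import math


def fit_logit_one_predictor(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton's method (two parameters)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system explicitly
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 100 non-drop addresses with 10 errors,
# 100 drop-delivery addresses with 40 errors.
drop_flag = [0] * 100 + [1] * 100
geocode_error = [1] * 10 + [0] * 90 + [1] * 40 + [0] * 60

intercept, slope = fit_logit_one_predictor(drop_flag, geocode_error)
odds_ratio = math.exp(slope)
print(round(intercept, 3), round(slope, 3), round(odds_ratio, 3))
```

With one binary predictor the fitted odds ratio equals the ratio of the observed odds in the two groups, here (40/60) / (10/90) = 6, so the positive slope corresponds to a "+" entry in a table like the one above.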

Page 17: Table 2: A closer look at impact of DSF-to-Census Ratio

Category  Parameter                   Odds Ratio  Significance
1         DSF-to-Census <0.9          2.25        ***
3         DSF-to-Census 1.25 to 2.0   2.37        **
4         DSF-to-Census >2.0          4.29        ***

• Addresses in category 1 census blocks have about the same odds of being geocoded incorrectly as addresses in category 3 blocks (odds ratios of 2.25 and 2.37, both relative to the omitted category 2, 0.9 to 1.25)
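To make the odds ratios concrete, the sketch below converts them to log-odds coefficients and applies them to a baseline error probability. It assumes the unlisted category 2 (0.9 to 1.25) is the reference category; the 10% baseline probability and the function name are hypothetical and chosen only for illustration.

```python
import math

# Odds ratios reported in Table 2
odds_ratios = {"<0.9": 2.25, "1.25 to 2.0": 2.37, ">2.0": 4.29}

# A logistic coefficient is the natural log of the odds ratio
log_odds = {label: math.log(value) for label, value in odds_ratios.items()}


def shifted_probability(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 0.10  # hypothetical error probability in the reference category
for label, value in odds_ratios.items():
    print(label, round(log_odds[label], 3), round(shifted_probability(baseline, value), 3))
```

Under this hypothetical baseline, the first and second categories land close together (about 0.20 and 0.21), while the highest-ratio category rises to about 0.32.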

Page 18: Spatial Autocorrelation

• Does geocoding error exhibit spatial clustering?
• Do blocks with geocoding error neighbor blocks with geocoding error?

y = β1x1 + β2x2 + … + βpWy + ε

• Where Wy is the weighted average of neighboring values, or ‘spatial lag’

Page 19: Spatial Autocorrelation

Example segment (block layout):

  1   2
3   4   5

Example weight matrix W (1 = the two blocks are neighbors):

     1  2  3  4  5
1    0  1  1  0  0
2    1  0  0  1  1
3    1  0  0  1  0
4    0  1  1  0  1
5    0  1  0  1  0


Page 21: Moran’s I – Measure of Spatial Autocorrelation

Same example segment and example weight matrix W as on Page 19.

Page 22: Spatial Autocorrelation

Example segment (block layout):

  1   2
3   4   5

Example variable of interest y:

Block  Error
1      1
2      1
3      0
4      0
5      1

Page 23: Spatial Autocorrelation

Weight matrix W * geocoding error y = weighted average of neighbors Wy

     1  2  3  4  5        y        Wy
1    0  1  1  0  0        1        1
2    1  0  0  1  1        1        2
3    1  0  0  1  0   *    0   =    1
4    0  1  1  0  1        0        2
5    0  1  0  1  0        1        1
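The product on this slide can be reproduced in a few lines. With the binary W shown, each entry of Wy is the number of neighboring blocks with a geocoding error; dividing each row of W by its row sum (row standardization, added here as an illustration and not shown on the slide) turns it into an average over neighbors. The function names are hypothetical.

```python
# Example weight matrix W and error vector y from the slides
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
y = [1, 1, 0, 0, 1]


def spatial_lag(weights, values):
    """Matrix-vector product: one weighted sum of neighbor values per block."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def row_standardize(weights):
    """Scale each row to sum to 1 so the lag becomes a neighbor average."""
    standardized = []
    for row in weights:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


Wy = spatial_lag(W, y)
Wy_average = spatial_lag(row_standardize(W), y)
print(Wy)  # [1, 2, 1, 2, 1]
print([round(v, 3) for v in Wy_average])
```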

Page 24: Moran’s I and Spatial Autocorrelation Model

• Moran’s I: degree of linear association between observed values y and a weighted average of neighboring values Wy
  • Observed: 0.0281
  • Very significant (p < 0.0001)
  • Positive, indicating possible spatial clustering
• Add Wy to final logistic model

y = β1x1 + β2x2 + … + βpWy + ε
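As a worked illustration of the statistic, the sketch below computes Moran's I with its standard formula, I = (n / S0) * Σi Σj wij (yi - ȳ)(yj - ȳ) / Σi (yi - ȳ)², where S0 is the sum of all weights. It is applied to the five-block example from the earlier slides, not to the study data, and the function name is hypothetical.

```python
def morans_i(weights, values):
    """Moran's I for a square weight matrix and a vector of observations."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]              # deviations from the mean
    s0 = sum(sum(row) for row in weights)       # sum of all weights
    cross = sum(
        weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n)
    )
    denom = sum(d * d for d in z)
    return (n / s0) * (cross / denom)


# Five-block example from the slides
W = [
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]
y = [1, 1, 0, 0, 1]

example_i = morans_i(W, y)
print(round(example_i, 4))  # -0.0278
```

The toy segment gives a value slightly below zero (-1/36), so it does not show clustering; the positive 0.0281 reported on the slide comes from the full set of analyzed blocks.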

Page 25: Table 3: Logistic Regression with Spatial Autocovariate

Group                        Parameter                        Estimate (sign, significance)
                             Intercept                        - ***
Ratio Categories             DSF-to-Census <0.9               + **
Ratio Categories             DSF-to-Census 1.25 to 2.0        + **
Ratio Categories             DSF-to-Census >2.0               + ***
Urbanicity                   TEA1                             - ***
Urbanicity                   In Principal City Flag           - ***
Urbanicity                   HU Density (mean centered)       - ***
Postal Characteristics       Drop delivery                    + ***
Postal Characteristics       Vacant Flag                      + *
Postal Characteristics       Record Type High-rise            -
Postal Characteristics       Record Type Rural                + ***
Geographical Considerations  Pct Multi-Unit (mean centered)   - *
Geographical Considerations  Area (mean centered)             + ***
                             Autocovariate (W.y)              + *

Page 26: Map 1: Example of Clustering [map]

Page 27: Map 2: Example of Clustering [map]

Page 28: Discussion

• Urbanicity, postal characteristics, and block-level DSF-to-Census ratio are highly correlated with geocoding error
• Addresses in low DSF-to-Census ratio blocks have similar odds of geocoding error as addresses in high DSF-to-Census ratio blocks
• Geocoding error exhibits spatial clustering
  • Problematic blocks within a segment can be used as a potential flag for larger geocoding error
• Help with address frame decisions

Page 29: Limitations

• Analysis was limited to segments that already have less than acceptable DSF coverage
  • Possible that census characteristics and DSF flags behave differently above threshold
• Sample of 21 segments used in analysis not random
  • Limits the ability to generalize findings
• Definition of geocoding error limited to over-coverage error

Page 30: Thank You!

Lee Fiorio [email protected]