Random Group Variance Adjustments When Hot Deck Imputation Is Used to Compensate for Nonresponse. Richard A. Moore, Company Statistics Division, US Census Bureau. Presented by Samson Adeshiyan.


Page 1

Random Group Variance Adjustments

When Hot Deck Imputation Is Used to Compensate for Nonresponse

Richard A. Moore

Company Statistics Division

US Census Bureau

Presented by

Samson Adeshiyan

Page 2

2002 Survey of Business Owners (SBO): Primary Goal

• Provide Business Ownership Statistics
  – State
  – Industry
  – Demographic Group
    • Race --- Native American, Asian, Black, Hawaiian/Pacific Islander, White, Public
    • Ethnicity --- Hispanic, Non-Hispanic
    • Gender --- Female, Equal, Male

Page 3

SBO Primary Publication Level Statistics

• Black-owned Grocery Stores in North Dakota (ND)
  – Number
  – Aggregate Sales
  – Aggregate Payroll
  – Aggregate Employment

Page 4

What Do We Have? (Econ Census and Tax Returns)

• 5.5 mil. companies with paid employees
  – Receipts, Payroll, Employment
  – Geographic Codes
  – Industry Codes

• 17.5 mil. companies without paid employees
  – Receipts
  – Industry and Geography Codes

Page 5

What Are We Missing For Each Business?

• Race of Ownership

• Ethnicity of Ownership

• Gender of Ownership

• Obtain this from a stratified sample of 2.5 million businesses

Page 6

Distribution At the US Level: 23 Million Companies

• Women --- 28%
• Hispanic --- 7%

• Black --- 5%
• Asian --- 5%
• Native American --- 1%
• Hawaiian/Pacific Islander --- 0.1%

Page 7

Problem 1: Need Sufficient Representation in the Sample

Black-Owned Groceries in ND

• 2002 Estimates
  – 78 Black-owned businesses in ND
  – 15 of these in Retail
  – Only 4 are Grocery Stores

• Can’t list groceries in ND in random order and sample systematically

Page 8

“Modeled Guess” Codes from Admin Info For Each Company

• Response from a Previous SBO
• Population Distribution by ZIP Code
• State/Industry Distribution in 1997 SBO
• Owner(s)’ Social Security Number, when Available
  – Race/Hispanic/Gender Codes on SSN Application
  – Surnames (e.g. LOPEZ or WANG)
  – Country of Birth (e.g. Korea or Cuba)
  – Decennial Census Responses

Page 9

Example

• Name …. Michelle Wie’s Pro Shop

• Modeled Guess …. Asian Female

• Likelihood-Race ……. 0.8912

• Likelihood-Hisp ……. 0.0012

• Likelihood-Female …. 0.9500

Page 10

Warning: Model is not 100% accurate

• Michelle Wie’s Pro Shop
  – Responds As White, Non-Hispanic, Male
  – Tabulated As White, Non-Hispanic, Male

• If a business’s response is inconsistent with the modeled likelihoods, tabulate by the response

• If a business does not respond, don’t directly infer responses from likelihoods

Page 11

Problem 2: Differential Response Rates Between Demographic Groups

Owner             Likelihood-Hispanic   Response
Jose Martinez     0.985                 Hispanic
John Martinez     0.940                 ???
Jose’s Sub Shop   0.123                 Non-Hispanic
Juanita Martin    0.060                 Non-Hispanic
John Martin       0.040                 Non-Hispanic

Page 12

Likelihoods Aid in Non-Response Adjustment

Case   Likelihood-Hispanic   Response       Weight
1      0.985                 Hispanic       4.0
2      0.940                 ???            4.0
3      0.123                 Non-Hispanic   4.0
4      0.060                 Non-Hispanic   4.0
5      0.040                 Non-Hispanic   4.0

Response Rate Adjusted Hispanic-owned Est = 5.0 (4.0 * 5/4)

Hot Deck Imputed Hispanic-owned Est = 8.0 (4.0 + 4.0)
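
The two estimates on this slide can be reproduced with a short Python sketch. It is illustrative only: the nearest-likelihood donor rule and every variable name are assumptions, since the slides do not say how the donor is chosen.

```python
# Five sampled cases from the slide: (likelihood_hispanic, response, weight).
# None marks the nonrespondent (case 2).
cases = [
    (0.985, "Hispanic", 4.0),
    (0.940, None, 4.0),
    (0.123, "Non-Hispanic", 4.0),
    (0.060, "Non-Hispanic", 4.0),
    (0.040, "Non-Hispanic", 4.0),
]

respondents = [c for c in cases if c[1] is not None]

# Response-rate adjustment: inflate every respondent weight by n / r.
rate_factor = len(cases) / len(respondents)  # 5 / 4
rate_adjusted = sum(
    w * rate_factor for _, resp, w in respondents if resp == "Hispanic"
)

# Hot deck (assumed donor rule): the nonrespondent borrows the response of
# the respondent whose Hispanic likelihood is closest to its own.
def impute(case):
    like, resp, w = case
    if resp is not None:
        return case
    donor = min(respondents, key=lambda d: abs(d[0] - like))
    return (like, donor[1], w)

imputed = [impute(c) for c in cases]
hot_deck = sum(w for _, resp, w in imputed if resp == "Hispanic")

print(rate_adjusted, hot_deck)  # 5.0 8.0
```

The flat response-rate adjustment treats the nonrespondent like an average case, while the likelihood-driven hot deck treats it like its nearest neighbor, which is why the two estimates differ.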

Page 13

For Variance: Random Group Replication (RG)

• Considerable number of cases where the modeled guess disagrees with the actual response
  – Cases tabulated from other strata
  – Considerable variability in the weights of the tabulated cases

Page 14

Likelihoods Aid in Non-Response Adjustment

Case   Likelihood   Response       Weight   RG   Receipts
1      0.98         Hispanic       4.0      1    10
2      0.94         ???            4.0      2    1
3      0.12         Non-Hispanic   4.0      3    5
4      0.06         Non-Hispanic   4.0      4    6
5      0.04         Non-Hispanic   4.0      5    8

Imputed Hispanic Firms Est = 8

Imputed Hispanic Receipts = 44

Page 15

For variance calculation: Wt Adjustment Method

Factors on Responding Firms

• Firms
  – Respondents Estimate = 4
  – Post Impute Estimate = 8
  – Weight Adjustment Factor = 2.0

• Receipts
  – Respondents Estimate = 40
  – Post Impute Estimate = 44
  – Weight Adjustment Factor = 1.1
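
As a rough check of the factors on this slide, the sketch below recomputes them from the five-case example. The record layout and the function name `adjustment_factor` are hypothetical; the factor is taken to be the post-impute estimate divided by the respondents-only estimate, which matches the numbers shown.

```python
# Records after hot deck imputation: (response, imputed?, weight, receipts).
records = [
    ("Hispanic", False, 4.0, 10),
    ("Hispanic", True, 4.0, 1),       # case 2, response imputed
    ("Non-Hispanic", False, 4.0, 5),
    ("Non-Hispanic", False, 4.0, 6),
    ("Non-Hispanic", False, 4.0, 8),
]

def adjustment_factor(records, domain, value):
    """Post-impute estimate divided by the respondents-only estimate."""
    in_domain = [r for r in records if r[0] == domain]
    post_impute = sum(value(r) for r in in_domain)
    respondents_only = sum(value(r) for r in in_domain if not r[1])
    return post_impute / respondents_only

# Firm counts use the weight alone; receipts use weight * receipts.
firm_factor = adjustment_factor(records, "Hispanic", lambda r: r[2])
receipts_factor = adjustment_factor(records, "Hispanic", lambda r: r[2] * r[3])

print(firm_factor, receipts_factor)  # 2.0 1.1
```

Note that each tabulated item gets its own factor, so a responding record would carry one factor per item under this method.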

Page 16

Oh-Scheuren Adjustment Factor (1983)

r = # respondents

i = # imputed cases

n = i + r = total number of cases

V1 = variance with impute treated as reported

V2 = V1 * (n/r + i/n)

Page 17

Oh-Scheuren Method: Problems with Comparison

• Research developed for Jackknife, not Random Group

• Calculate response rates for cell

• Best response for our example
  – Not Missing at Random
  – True response rate is 4 of 5
  – Response rate for Hispanics is 1 of 2
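
To show how much the choice of cell matters, the sketch below evaluates the Oh-Scheuren multiplier n/r + i/n for the two response rates named on this slide. The function name is hypothetical, and treating "1 of 2" as r = 1, i = 1 is an assumption drawn from the running example.

```python
def oh_scheuren_factor(r, i):
    """Multiplier applied to V1: n/r + i/n, with n = r + i."""
    n = r + i
    return n / r + i / n

# Whole cell: 4 respondents, 1 imputed case.
whole_cell = oh_scheuren_factor(r=4, i=1)      # 5/4 + 1/5

# Hispanic cases only: 1 respondent, 1 imputed case.
hispanic_only = oh_scheuren_factor(r=1, i=1)   # 2/1 + 1/2

print(round(whole_cell, 2), round(hispanic_only, 2))  # 1.45 2.5
```

The same imputed record inflates the variance by 45% under one cell definition and by 150% under the other, which is the comparison problem the slide raises.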

Page 18

Donor Imputation Method (RG # Also Donated)

Before imputation:

Case   Likelihood   Response   Weight   RG   Receipts
1      0.98         Hispanic   4.0      1    10
2      0.94         ???        4.0      2    1

After imputation (case 2 receives case 1’s response and RG #):

Case   Likelihood   Response   Weight   RG   Receipts
1      0.98         Hispanic   4.0      1    10
2      0.94         Hispanic   4.0      1    1

Imputed Hispanic Firms Est = 8

Imputed Hispanic Receipts = 44

Only RG #1 is non-zero.

Same Estimates. Higher Variances.
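
A minimal sketch of why donating the RG number raises the variance. It assumes the textbook random group estimator with k = 5 groups (each group total expanded by k, then the variance of the mean of the group estimates), and it assumes the baseline is an imputed record that stays in its own random group. The slides state neither the estimator nor the baseline, and the names are hypothetical.

```python
def random_group_variance(records, k):
    """records: iterable of (weight, value, rg), with rg numbered 1..k."""
    group_estimates = [0.0] * k
    for weight, value, rg in records:
        # Expand each group's weighted total to a full-sample estimate.
        group_estimates[rg - 1] += k * weight * value
    mean = sum(group_estimates) / k
    return sum((g - mean) ** 2 for g in group_estimates) / (k * (k - 1))

K = 5  # random groups 1..5, as in the five-case example

# Hispanic firm counts (value = 1 per firm) for cases 1 and 2.
keeps_own_rg = [(4.0, 1, 1), (4.0, 1, 2)]   # case 2 stays in RG 2
donated_rg = [(4.0, 1, 1), (4.0, 1, 1)]     # case 2 takes RG 1 from its donor

v_own = random_group_variance(keeps_own_rg, K)
v_donated = random_group_variance(donated_rg, K)
print(v_own, v_donated)  # 24.0 64.0
```

Both layouts give the same point estimate of 8 firms, but concentrating the weight in a single random group makes the group estimates more dispersed.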

Page 19

Advantages of Donating RG #

• No need to add multiple factors to record

• No need to calculate factors

• No problems for microdata users

Page 20

Compare the Ratios of the Variances of the Three Methods

R1 = VAR(Oh-Scheuren) / VAR (Weighted Adjustment)

R2 = VAR(Donor) / VAR (Weighted Adjustment)

Mean for R1 and R2 across publication cells

Std Dev for each of the means of R1 and R2

Null Hypothesis: Ri = 1 (90% confidence)
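
The comparison described on this slide could be sketched as follows. The slides do not name the test used, so the two-sided normal test (z = 1.645 for 90%) is an assumption, and the ratio lists are made-up placeholder values, not SBO results.

```python
import math
import statistics

def mean_ratio_test(ratios, z=1.645):
    """Return (mean, standard error of the mean, differs from 1 at 90%?)."""
    mean = statistics.mean(ratios)
    std_err = statistics.stdev(ratios) / math.sqrt(len(ratios))
    significant = abs(mean - 1.0) > z * std_err
    return mean, std_err, significant

# Hypothetical variance ratios across publication cells (placeholders only).
r1_cells = [1.10, 1.20, 1.15, 1.05, 1.25]
r2_cells = [0.90, 1.10, 1.00, 0.95, 1.05]

r1_mean, r1_se, r1_significant = mean_ratio_test(r1_cells)
r2_mean, r2_se, r2_significant = mean_ratio_test(r2_cells)
print(r1_significant, r2_significant)  # True False
```

A mean ratio flagged `False` here corresponds to an entry that would carry the asterisk in the result tables.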

Page 21

Ratio of Variances --- Firm Counts

* Not statistically significantly different from 1.00 at 90%

# Imputes     Oh-Sch/Wt   Donor/Wt
1 to 3        1.148       0.984*
4 to 5        1.176       0.963
6 to 9        1.136       0.941
10 to 19      1.087       1.069
20 to 49      1.069       1.205
50 or more    1.053       1.367

Page 22

Ratio of Variances --- Receipts

* Not statistically significantly different from 1.00 at 90%

# Imputes     Oh-Sch/Wt   Donor/Wt
1 to 3        1.230       0.958*
4 to 5        1.286       0.876
6 to 9        1.540       0.963*
10 to 19      1.541       0.914
20 to 49      1.499       0.900
50 or more    1.512       0.951

Page 23

Ratio of Variances --- Firm Counts

* Not statistically significantly different from 1.00 at 90%

Response Rate   Oh-Sch/Wt   Donor/Wt
45 to 55%       0.930       1.193
55 to 65%       1.076       1.182
65 to 75%       1.153       1.101
75 to 85%       1.130       1.043
85 to 95%       1.153       1.032*

Page 24

Ratio of Variances --- Receipts

* Not statistically significantly different from 1.00 at 90%

Response Rate   Oh-Sch/Wt   Donor/Wt
45 to 55%       1.790       0.902
55 to 65%       1.520       0.904
65 to 75%       1.465       0.940
75 to 85%       1.218       0.945
85 to 95%       1.153       0.954

Page 25

Are the differences acceptable?

Firm Count Variance Ratios Differ by 10%

Receipts Variances Differ up to 70%

=>

Firm Count Relative SEs Differ by about 5%

Receipts Relative SEs Differ by up to 30%
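
The step from variance differences to relative SE differences follows from the standard error being the square root of the variance. A quick check, with a hypothetical function name:

```python
import math

def relative_se_change(variance_ratio):
    """Relative change in the standard error implied by a variance ratio."""
    return math.sqrt(variance_ratio) - 1.0

firm_change = relative_se_change(1.10)      # variances differ by 10%
receipts_change = relative_se_change(1.70)  # variances differ by up to 70%

print(f"{firm_change:.1%} {receipts_change:.1%}")  # 4.9% 30.4%
```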

Page 26

Asian-Owned Retail Operations in New Hampshire in 2002

            Estimate   Published RSE   Max Change in RSE
Firms       210        23%             + 1%
Receipts    $70 Mil    19%             + 6%
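
The "Max Change in RSE" column is consistent with scaling each published RSE by the largest relative SE change (from variance ratios of about 1.10 for firm counts and 1.70 for receipts). That reading is an assumption; the slide does not show how the column was derived.

```python
import math

def max_rse_change(published_rse, max_variance_ratio):
    """Change in RSE, in RSE units, if the variance grows by the given ratio."""
    return published_rse * (math.sqrt(max_variance_ratio) - 1.0)

firms_delta = max_rse_change(0.23, 1.10)      # published RSE of 23%
receipts_delta = max_rse_change(0.19, 1.70)   # published RSE of 19%

# Rounded to whole percentage points, as on the slide.
print(round(firms_delta * 100), round(receipts_delta * 100))  # 1 6
```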

Page 27

Lingering Question

Is the donation of the RG Number sufficient or do we need to augment the resulting variance with a factor (similar to the Oh-Scheuren factor)?

Page 28

Any Questions?

Richard Moore

[email protected]