1

Multiple Frame Surveys

Tracy Xu, Kim Williamson
Department of Statistical Science
Southern Methodist University




2

Multiple Frame Surveys

Introduction
– What is a Multiple Frame Survey
Different estimators for the population total
Variance estimators for those estimators
Conclusion
References

3

Introduction

Hartley (1962)
Multiple frame surveys use two or more frames that together can cover a target population
Very useful for sampling rare or hard-to-reach populations
A dual frame design may result in considerable cost savings over a single frame design with comparable precision

4

Example 1 – Cost Reduction

Agriculture [Hartley 1962, 1974]
+ List frame (incomplete, names, addresses)
  - Less costly
+ Area frame (complete, insensitive to changes)
  - Expensive to sample
+ Can achieve the same precision

Linear Cost Function: $C = n_A c_A + n_B c_B$
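The linear cost function can be illustrated with a small sketch. It assumes that $n_A$, $n_B$ are the sample sizes drawn from the two frames and $c_A$, $c_B$ the per-unit costs; treating the area frame as Frame A and all numeric values are hypothetical choices made only for this illustration.

```python
def linear_cost(n_A, c_A, n_B, c_B):
    """Total survey cost C = n_A * c_A + n_B * c_B."""
    return n_A * c_A + n_B * c_B


# Hypothetical unit costs: area frame (A) is expensive, list frame (B) is cheap.
c_A, c_B = 50.0, 5.0

# Design 1: sample only from the complete but expensive area frame.
area_only = linear_cost(400, c_A, 0, c_B)

# Design 2: a smaller area sample supplemented by a large list sample.
dual_frame = linear_cost(100, c_A, 600, c_B)

print(area_only, dual_frame)
```

Whether the cheaper dual frame design reaches comparable precision depends on the variances of the estimators, which this sketch does not model.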

5

Example 2 – Rare Populations

AIDS [Kalton and Anderson 1986]
+ Using a general population frame as well as STD clinics, drug treatment centers, and hospitals
Homeless [Iachan and Dennis 1993]
+ Frames: homeless shelters, soup kitchens, and street areas
Alzheimer’s
+ Frames: general population and adult day-care centers

6

Issues to Consider

Statisticians must address the following issues
+ How should the information from the samples be combined to estimate population quantities?
+ How should variance estimates be calculated?

7

Notations

Universe $U = A \cup B = a \cup ab \cup b$
$N$ = # of elements in the population
$N_A$ = # of elements in Frame A
$N_B$ = # of elements in Frame B
$N_a$ = # of elements in Frame A, but not Frame B
$N_b$ = # of elements in Frame B, but not Frame A
$N_{ab}$ = # of elements in both Frame A and Frame B
$S_A$ = sample from Frame A, with $P\{i\text{th element is in } S_A\} = \pi_i^A$
$Y$ = population total = $Y_a + Y_b + Y_{ab}$
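The domain notation can be checked on a toy population. The sketch below uses made-up unit labels and y-values (all hypothetical) to show how the domains a, ab and b partition the universe, so that $N = N_a + N_{ab} + N_b = N_A + N_B - N_{ab}$ and $Y = Y_a + Y_b + Y_{ab}$.

```python
# Hypothetical frames: units u4 and u5 appear on both lists.
frame_A = {"u1", "u2", "u3", "u4", "u5"}
frame_B = {"u4", "u5", "u6", "u7"}

# Hypothetical y-values for every unit in the universe.
y = {"u1": 2.0, "u2": 4.0, "u3": 6.0, "u4": 8.0,
     "u5": 10.0, "u6": 12.0, "u7": 14.0}

a = frame_A - frame_B          # in Frame A only
b = frame_B - frame_A          # in Frame B only
ab = frame_A & frame_B         # in both frames
universe = frame_A | frame_B   # U = A ∪ B

N, N_A, N_B = len(universe), len(frame_A), len(frame_B)
N_a, N_b, N_ab = len(a), len(b), len(ab)


def domain_total(domain):
    """Sum of y over the units of one domain."""
    return sum(y[u] for u in domain)


Y_a, Y_b, Y_ab = domain_total(a), domain_total(b), domain_total(ab)
Y = domain_total(universe)

print(N, N_a, N_ab, N_b, Y)
```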

8

Estimators

Hartley (H)
Fuller and Burmeister (FB)
Single Frame estimators
Pseudo-Maximum Likelihood (PML)

9

Hartley & FB Estimator

Minimize the variance among the class of linear unbiased estimators of Y
Have minimum variance for a single response
Use a different set of weights for each response variable
Disadvantages: increased amount of calculation (uses covariances estimated from the data) and possible inconsistencies
Estimators are not in general linear functions of y
FB has the greatest asymptotic efficiency

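10

Hartley & FB Estimator

$$\hat{Y}_H(\theta) = \hat{Y}(0) + \theta\,\hat{\delta}_Y$$

$$\hat{\delta}_Y = \hat{Y}_{ab}^{A} - \hat{Y}_{ab}^{B}$$

$$\hat{Y}_{FB}(\boldsymbol{\beta}) = \hat{Y}(0) + \boldsymbol{\beta}^{T}\hat{\boldsymbol{\delta}}$$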
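A minimal sketch of the Hartley form may help. It assumes the usual reading $\hat{Y}(0) = \hat{Y}_a + \hat{Y}_{ab}^{B} + \hat{Y}_b$, so that $\hat{Y}_H(\theta)$ weights the two overlap estimates by $\theta$ and $1-\theta$. The `optimal_theta` helper is an assumption of this sketch: it is the value obtained by minimizing the variance of $\hat{Y}_H(\theta)$ over $\theta$ when the two samples are independent, and in practice its variance and covariance inputs would be estimated from the data. Function names and numbers are hypothetical.

```python
def hartley_estimate(Y_a, Y_ab_A, Y_ab_B, Y_b, theta):
    """Y_H(theta) = Y(0) + theta * delta_Y, with domain estimates as inputs."""
    y0 = Y_a + Y_ab_B + Y_b        # estimate at theta = 0 (overlap taken from sample B)
    delta_Y = Y_ab_A - Y_ab_B      # difference of the two overlap estimates
    return y0 + theta * delta_Y


def optimal_theta(var_ab_A, var_ab_B, cov_a_abA, cov_b_abB):
    """Variance-minimizing theta, assuming independent samples from A and B.

    Setting d/dtheta Var[Y_H(theta)] = 0 gives
    theta = (V(Y_ab^B) + Cov(Y_b, Y_ab^B) - Cov(Y_a, Y_ab^A)) / (V(Y_ab^A) + V(Y_ab^B)).
    """
    return (var_ab_B + cov_b_abB - cov_a_abA) / (var_ab_A + var_ab_B)


# Hypothetical domain estimates and (co)variances.
theta = optimal_theta(var_ab_A=4.0, var_ab_B=12.0, cov_a_abA=0.0, cov_b_abB=0.0)
estimate = hartley_estimate(Y_a=100.0, Y_ab_A=40.0, Y_ab_B=50.0, Y_b=60.0, theta=theta)
print(theta, estimate)
```

Because the optimal value depends on the variances of the particular response, a different $\theta$ (and hence a different set of weights) results for each response variable.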

11

Single Frame Estimators

Bankier (1986), Kalton & Anderson (1986) and Skinner (1991)
Treat all observations as if they had been sampled from a single frame, with modified weights for observations in the intersections of frames
Do not use any auxiliary information about the population totals
Linear in y
Other techniques may be applied: Regression Estimation and Raking Ratio Estimation
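The idea of modified weights can be sketched as follows. This assumes one common choice, in which a unit in the overlap receives weight $1/(\pi_i^A + \pi_i^B)$ and a unit in only one frame receives the usual inverse inclusion probability; it also assumes both inclusion probabilities are known for every sampled unit. The data layout and values are hypothetical.

```python
def single_frame_total(sample_A, sample_B):
    """Estimate the total by pooling both samples with combined-frame weights.

    Each sampled unit is a tuple (y, pi_A, pi_B), where pi_A and pi_B are its
    inclusion probabilities under the two designs (0.0 if it is not in that frame).
    """
    total = 0.0
    for y_i, pi_A, pi_B in sample_A + sample_B:
        # Units in one frame only get 1/pi; overlap units get 1/(pi_A + pi_B).
        total += y_i / (pi_A + pi_B)
    return total


# Hypothetical samples: the unit with y = 20 lies in the overlap domain ab
# and happens to have been selected in both samples.
sample_A = [(10.0, 0.1, 0.0), (20.0, 0.1, 0.1)]
sample_B = [(30.0, 0.0, 0.2), (20.0, 0.1, 0.1)]

print(single_frame_total(sample_A, sample_B))
```

The weights depend only on the design, not on y, which is why the same set of weights serves every response variable.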

12

Pseudo-Maximum Likelihood Estimator

Skinner and Rao (1996) derived a pseudo-ML (PML) estimator for dual frame surveys that uses the same set of weights for all items of y, similar to “single frame” estimators, and maintains efficiency.

The idea of pseudo-ML estimation is discussed in Roberts, Rao, and Kumar (1987) and Skinner, Holt, and Smith (1989), in which an ML estimator under simple random sampling is modified to achieve consistent estimation under complex designs.

13

Pseudo-Maximum Likelihood Estimator

The main advantages of the PML estimator are that it is design consistent and typically has a simple form.

The potential disadvantage is that it may not be asymptotically efficient, although it may be hoped that any loss of efficiency will tend to be small in practice.

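14

Pseudo-Maximum Likelihood Estimator

Pseudo-MLE of Y is derived as

$$\hat{Y}_{PML} = (N_A - \hat{N}_{ab,PML})\,\bar{y}'_a + (N_B - \hat{N}_{ab,PML})\,\bar{y}''_b + \hat{N}_{ab,PML}\,\bar{y}_{ab}$$

where

$$\bar{y}_{ab} = \frac{\dfrac{n_A}{N_A}\hat{N}'_{ab}\,\bar{y}'_{ab} + \dfrac{n_B}{N_B}\hat{N}''_{ab}\,\bar{y}''_{ab}}{\dfrac{n_A}{N_A}\hat{N}'_{ab} + \dfrac{n_B}{N_B}\hat{N}''_{ab}}$$

(a single prime marks an estimate from the Frame A sample, a double prime one from the Frame B sample)

and $\hat{N}_{ab,PML}$ is the smaller root of the quadratic equation

$$p x^2 - q x + r = 0$$

where

$$p = n_A + n_B, \qquad q = n_A N_B + n_B N_A + n_A \hat{N}'_{ab} + n_B \hat{N}''_{ab}, \qquad r = n_A \hat{N}'_{ab} N_B + n_B \hat{N}''_{ab} N_A$$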
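The PML computation can be sketched in a few lines. The code assumes the quadratic coefficients $p = n_A + n_B$, $q = n_A N_B + n_B N_A + n_A\hat{N}'_{ab} + n_B\hat{N}''_{ab}$ and $r = n_A\hat{N}'_{ab}N_B + n_B\hat{N}''_{ab}N_A$, and takes the estimated domain means and overlap sizes as given inputs; function names, argument names and the numbers are hypothetical.

```python
import math


def pml_overlap_size(n_A, n_B, N_A, N_B, Nab_A_hat, Nab_B_hat):
    """Smaller root of p*x**2 - q*x + r = 0 (the PML estimate of N_ab).

    Assumes the overlap estimates do not exceed the frame sizes, so that the
    discriminant is non-negative.
    """
    p = n_A + n_B
    q = n_A * N_B + n_B * N_A + n_A * Nab_A_hat + n_B * Nab_B_hat
    r = n_A * Nab_A_hat * N_B + n_B * Nab_B_hat * N_A
    disc = q * q - 4 * p * r
    return (q - math.sqrt(disc)) / (2 * p)


def pml_total(n_A, n_B, N_A, N_B, Nab_A_hat, Nab_B_hat,
              ybar_a, ybar_b, ybar_ab_A, ybar_ab_B):
    """PML estimate of Y from estimated domain means and overlap sizes."""
    # Pooled overlap mean, weighting each sample's estimate by (n/N) * N_ab_hat.
    w_A = n_A / N_A * Nab_A_hat
    w_B = n_B / N_B * Nab_B_hat
    ybar_ab = (w_A * ybar_ab_A + w_B * ybar_ab_B) / (w_A + w_B)
    N_ab = pml_overlap_size(n_A, n_B, N_A, N_B, Nab_A_hat, Nab_B_hat)
    return (N_A - N_ab) * ybar_a + (N_B - N_ab) * ybar_b + N_ab * ybar_ab


# Hypothetical inputs in which both samples agree on the overlap size and mean.
N_ab_hat = pml_overlap_size(100, 50, 1000, 500, 200.0, 200.0)
Y_hat = pml_total(100, 50, 1000, 500, 200.0, 200.0,
                  ybar_a=10.0, ybar_b=20.0, ybar_ab_A=15.0, ybar_ab_B=15.0)
print(N_ab_hat, Y_hat)
```

When the two overlap-size estimates agree, the smaller root reproduces that common value, which gives a quick sanity check on the coefficients.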

15

Comparison of All Estimators

Extensive simulation was done to evaluate the performance of all the estimators in Lohr and Rao (2006).

Findings:
In all the simulations, the PML method had either the smallest empirical mean squared error (EMSE) or an EMSE close to the minimum value. With its high efficiency and ease of computation, as well as the practical advantage of using the same set of weights for all response variables, the PML method appears to be a good choice for estimation in multiple frame surveys.

16

Comparison of All Estimators

Findings

When the number of frames Q ≥ 3, the theoretically optimal Fuller-Burmeister and Hartley methods became unstable, because they require solving systems of equations using a large estimated covariance matrix.

17

Asymptotic Variance

Under some conditions, the H, FB and PML estimators are all consistent estimators of the total.

And

$$\operatorname{avar}(\hat{Y}_{FB,opt}) \le \operatorname{avar}(\hat{Y}_{H,opt}), \qquad \operatorname{avar}(\hat{Y}_{FB,opt}) \le \operatorname{avar}(\hat{Y}_{PML})$$

But neither the H estimator nor the PML estimator is necessarily more efficient than the other.

18

Asymptotic Variance

Lohr and Rao (2006) give a general formula for the asymptotic variance of all of the above estimators, which can be used to construct optimal designs for multiple frame surveys.

$$V(\hat{Y}) = \sum_{q=1}^{Q} G_q^{T} F_q G_q$$

where $G_q^{T}$ takes a different form for the generalized Hartley estimator, the generalized Fuller-Burmeister estimator, the single-frame method, and PML.

19

Variance Estimators

Two Methods:
Skinner and Rao (1996) described a method for estimating the variance of $\hat{Y}_{PML}$ using Taylor linearization.
Lohr and Rao (2000) defined a jackknife variance estimator for estimators from dual frame surveys and showed that the jackknife variance estimator is asymptotically equivalent to the Taylor linearization variance estimator.
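The jackknife idea can be sketched for the simplest case. The code below is an illustration under strong assumptions, not the estimator as defined in the cited paper: simple random samples from each frame, no stratification or clustering, finite population corrections ignored, and a Hartley estimator with a fixed $\theta$. Each unit is deleted in turn, the remaining units of that sample are reweighted, and the squared deviations from the full-sample estimate are accumulated separately for the two samples. All names and data are hypothetical.

```python
def hartley_total(y_A, dom_A, y_B, dom_B, N_A, N_B, theta):
    """Hartley estimate with SRS weights N/n in each frame and a fixed theta."""
    # Overlap ("ab") units get factor theta in sample A and 1 - theta in sample B.
    u_A = [y * (theta if d == "ab" else 1.0) for y, d in zip(y_A, dom_A)]
    u_B = [y * ((1.0 - theta) if d == "ab" else 1.0) for y, d in zip(y_B, dom_B)]
    return N_A * sum(u_A) / len(u_A) + N_B * sum(u_B) / len(u_B)


def jackknife_variance(y_A, dom_A, y_B, dom_B, N_A, N_B, theta):
    """Delete-one jackknife, applied to each frame's sample in turn."""
    full = hartley_total(y_A, dom_A, y_B, dom_B, N_A, N_B, theta)
    n_A, n_B = len(y_A), len(y_B)
    ss_A = 0.0
    for i in range(n_A):  # drop unit i from sample A, keep sample B intact
        est = hartley_total(y_A[:i] + y_A[i + 1:], dom_A[:i] + dom_A[i + 1:],
                            y_B, dom_B, N_A, N_B, theta)
        ss_A += (est - full) ** 2
    ss_B = 0.0
    for j in range(n_B):  # drop unit j from sample B, keep sample A intact
        est = hartley_total(y_A, dom_A, y_B[:j] + y_B[j + 1:], dom_B[:j] + dom_B[j + 1:],
                            N_A, N_B, theta)
        ss_B += (est - full) ** 2
    return (n_A - 1) / n_A * ss_A + (n_B - 1) / n_B * ss_B


# Hypothetical samples with domain labels ("a", "b" or "ab") for each unit.
y_A, dom_A = [3.0, 5.0, 4.0, 6.0], ["a", "ab", "a", "ab"]
y_B, dom_B = [2.0, 7.0, 5.0], ["b", "ab", "ab"]
v_jack = jackknife_variance(y_A, dom_A, y_B, dom_B, N_A=40, N_B=30, theta=0.5)
print(v_jack)
```

With a fixed $\theta$ the estimator is linear in the observations, so in this special case the jackknife reproduces the textbook variance $N_A^2 s_A^2 / n_A + N_B^2 s_B^2 / n_B$ computed from the weighted values.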

20

Variance Estimators

Simulation results (Lohr and Rao 2000) showed that, in comparing the linearization, full jackknife and modified jackknife estimators:

1. The jackknife estimator exhibited smaller bias than the linearization estimator.
2. The relative bias of all three estimators of the variance tends to decrease as the sample size increases.
3. For the smaller sample sizes, the linearization and modified jackknife methods underestimate the EMSE.
4. Coverage probabilities, though similar for the three variance estimators, were slightly higher for the full jackknife.

21

Variance Estimators

5. The jackknife methods are less stable than the linearization estimator of the variance, as judged by the values of relative standard error.
6. For the single frame estimator, the jackknife and linearization estimates of the variance coincide.
7. For the other estimators, both the linearization and modified jackknife estimates of the variance are biased downward.

22

Conclusion

Multiple frame surveys can be extremely beneficial when sampling rare populations and when a complete frame is very expensive to sample

Different estimators of the total have been proposed. The choice of estimator will depend on survey design and complexity: FB is the most efficient; however, due to its additional calculations and complexity, PML may be preferred

23

References

H. O. Hartley (1974), “Multiple Frame Methodology and Selected Applications”, Sankhya, the Indian Journal of Statistics, Series C, 36, 99-118.

C. J. Skinner and J. N. K. Rao (1996), “Estimation in Dual Frame Surveys with Complex Designs”, Journal of the American Statistical Association, 91, 349-356.

Sharon L. Lohr and J. N. K. Rao (2000), “Inference from Dual Frame Surveys”, Journal of the American Statistical Association, 95, 271-280.

Sharon L. Lohr and J. N. K. Rao (2006), “Estimation in Multiple-Frame Surveys”, Journal of the American Statistical Association (under revision).

J. Lessler and W. Kalsbeek (1992), Non-sampling Error in Surveys, John Wiley & Sons, Inc.