small area estimation - small area methods for
TRANSCRIPT
SmallAreaEstimation1-Smallareaestimation problem2- Estimation fordomains - Directestimators –
estimation forplanned domains3– Coefficient ofVariation andMinimumlevel of
precision4- Estimation forunplanned domains and/or
where thesamplesize is not enough fortheminimumlevel ofprecision – Indirect estimators
Recap
• Targetparameters inthedomain:
• Totalofthestudy variable• Mean ofthestudy variable• Atrisk ofpoverty rate• Poverty gap
3
FromStatisticsCanada (2010).Guide totheLabour ForceSurveySAEmethods:
StatisticsCanadaapplies thefollowingguidelinesonLFSdatareliability:-ifCV≤16.5%thendirectestimates aredisseminatedwithout restrictions (norelease)-if16.5%<CV≤33%thentheestimates shouldbeaccompaniedbywarning(releasewithcaveat)-ifCV>33%thentheestimatesarenotrecommended foruse(norelease)
These thresholdmaybegoodforlabour marketvariables.
Whendirectestimatesarenotreliable?
WhatifwearedealingwithestimateswithCV>33%?
4
Ifadomainsamplecannotsupportdirectestimatesofadequateprecision,oftenindirectestimateisusedto“borrowstrength”byusingvaluesfromrelatedareasand/ortimeperiodstoincreaseeffectivesamplesize.
Thesevaluesareusedintheestimationthroughamodel(implicitlyorexplicitly)thatprovidesalinktorelatedareasand/ortimeperiodsthroughsupplementaryinformationsuchasrecentcensuscountsorcurrentadministrativerecordsrelatedtothevariableofinterest.
Not direct DomainEstimates – Indirect estimates
5
Itispossibletodividetheindirectestimatesinto:
— DomainIndirect:usesvaluesfromanotherdomainbutnotfromanothertime.— TimeIndirect:usesvaluesfromanothertimebutnotfromanotherdomain.— DomainandTimeIndirect:usesvaluesbothfromanotherdomainandtime.
Indirect Domain Estimates
SAE: Borrowing Strength from?
6
8
Wecandistinguishtheindirectmethods into:
— SyntheticEstimator: anestimator iscalledasyntheticestimator ifareliabledirectestimator foralargearea, coveringseveral smallareas, isusedtoderiveanindirectestimator forasmallareaunder theassumption thatthesmallareashavethesamecharacteristics asthelargearea.Asaconsequence, thevarianceofasyntheticestimator islowerthanthecorresponding directestimatorbutisbiasedifthemodelassumptions arenotsatisfied.— Composite Estimator: itisalinearcombinationbetween adirectestimatorandasyntheticestimator.Forthisreason itrepresents agoodcompromise intermsofefficiencybetween thecharacteristics ofthetwocomponents.
Indirect Domain Estimates
Synthetic estimator
Synthetic estimator
Broad AreaRatioEstimator(BARE)
Synthetic estimator
Broad AreaRatioEstimator(BARE)withauxiliary data
Ratioestimator, Regression estimator
Compositeestimator
:HTestimator orsimple ratioestimator
:regression estimator
Thechoice oftheweight
Thechoice oftheweight
Comparison Between Direct,SyntheticandCompositeEstimator
Empirical comparison of small area estimation methods for the Italian Labor Force Survey (LFS)
• Performance of small area estimators are studied by simulating sample selection from 1981 Population Census.
• Samples are 400samplereplicates (h),each ofidentical sizeoftheLFSsampledrawn following the LFS design (two stages with stratication)
• Design-based properties of the estimators: verification of itsBias and calculation of the CV of the estimator
Tzavidis,Salvati,Marchetti(MOOC2020)
Comparison Between Direct,SyntheticandCompositeEstimator
Empirical comparison of small area estimation methods for the Italian Labor Force Survey (LFS)
• Activepopulation,aged 15-64- annual averages - According tothedefinitions ofthe InternationalLabour Organisation (ILO)forthepurposes ofthelabour marketstatistics people areclassified as employed,unemployed andeconomicallyinactive.Theeconomically active population is thesumofemployed andunemployed persons.Inactive persons arethose who,during thereference week,were neither employed nor unemployed.
• Example - 14Health ServiceAreas (HSA)oftheFriuliVeneziaGiuliaRegion areconsidered tobesmallareas
• Yi – true mean ofthey variable inthedomain i (HSA)– annual average ofactivepopulation (known fromtheCensus)
Tzavidis,Salvati,Marchetti(MOOC2020)
Comparison Between Direct,SyntheticandCompositeEstimator
Comparison Between Direct,SyntheticandCompositeEstimator
19
Furtherwehave,dependingonthe“reference framework” forinference:
— DesignBasedApproach: Estimatorproperties areassessed withrespecttothesamplingdesign(see previousexample).Thisframework isusedforsmallareaestimation,mainlybecauseofitssimplicity.
—ModelAssistedApproach: Inpractice, thevaluesofYaretypicallydefinedbyassumingamodelforthedistributionofYgivenX.Thatis,practitionershavebeenwillingtousemodels inordertoidentifyoptimalstrategies forestimatingTY.However, theirassessment ofthesestrategies remaindesign-based (Särndal,Swensson andWretman,1992).
— ModelBasedApproach:design-unbiasedness isnolongerarequirement,thealternative propertywerequireoftheestimatorunder thisapproachisthatitbemodel-unbiased giventhesampleSandauxinfoX.
Inference framework
Undirect estimation
Synthetic estimator(BARE,ratio,regression)
Composite estimator
Calibration estimators GREG
21
Model Based Approach
Twotypesofmodels depending onthedata:arealevel modelandunit level model.
— AreaLevelModel:arealevelmodelsareappropriate ifonlyarea-levelsummarydataavailablefortheauxiliaryand/ortheresponse variables. Itispossible totakeintoaccountthesamplingweights intomodel.
— UnitLevelModel:unitlevelmodels aregenerallytopreferifunit-specificinformation isavailable.Theyusually ignorethesurveyweights.
22
Auxiliary information in Area Level Approach
Whenonlysummarydata(suchasthetraditional surveyestimator) fortheresponsevariable isavailableatthesmallarealevel:
23
Auxiliary information in Unit Level Approach
Whendataforboththeresponse variableandauxiliaryvariablesareavailableattheunit level:
• Examples ofapplication oftheSynthetic andcompositeestimatorduring theR lab