Using unequal probability sampling to limit anticipated variances of regression estimators
Anders Holmberg ICES III 07
Anders Holmberg
Department of Research & Development
Statistics Sweden
SE-701 89 Örebro
Sweden
Tel: +46 19 176905
Fax: +46 19 177084
E-mail: [email protected]
Outline
• Background
• The problem
• Some theory
• Auxiliary information
• An application in a business survey
• Comparisons and results
• Comments
Background (1)
• Prepare the sampling frame
• Derive and analyse diagnostic data
• Decide on a sampling design, sampling scheme and estimator
• Launch the survey
Background (2)
• Prerequisites
– A well-defined business population
– Several parameters of interest
– Design-based inference
– An up-to-date frame from the business register
– Admin. data available as auxiliary information
– An attempt to find the most efficient (robust) design
Background (6)
Diagram: auxiliary variables (u) at times t-2 and t-1, and study variables (y) at time t
(t-2): (1) Number of employees (u1), (2) Turnover (u2), (3) Personnel expenses (u3), (4) Investments (u4)
(t-1): (1) Number of employees (u1), (2) Turnover (u2), (3) Personnel expenses (u3), (4) Investments (u4)
(t): (1) Number of employees (y1), (2) Turnover (y2), (3) Personnel expenses (y3), (4) Investments (y4)
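Optimal design in the single variable case

The anticipated variance of the regression estimator of the total of $y_q$ is

$$ANV(\hat t_{y_q r}) = \sum_U \left(\frac{1}{\pi_k} - 1\right)\sigma_{qk}^2,$$

where $\sigma_{qk}^2$ is the model variance of unit $k$ for study variable $q$.

A design that minimizes $ANV(\hat t_{y_q r})$ is such that $\pi_k \propto \sigma_{qk}$, i.e. $\pi_{k(q)opt} = n\,\sigma_{qk} / \sum_U \sigma_{qk}$.

The minimum of $ANV(\hat t_{y_q r})$ is

$$ANV_{\min}(\hat t_{y_q r}) = \frac{1}{n}\Big(\sum_U \sigma_{qk}\Big)^2 - \sum_U \sigma_{qk}^2$$

(Brewer, Hájek, Cassel et al., Rosén)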
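A small numeric check of the single-variable result may help. This is a sketch under stated assumptions: the $\sigma_{qk}$ are treated as known (in practice they are guesstimated), no $\pi_k$ is allowed to exceed 1, and the helper names are hypothetical. It evaluates $ANV$ for a design, builds $\pi_k \propto \sigma_{qk}$, and compares the result with the closed-form minimum.

```python
def anticipated_variance(pi, sigma2):
    """ANV = sum over U of (1/pi_k - 1) * sigma2_k."""
    return sum((1.0 / p - 1.0) * s2 for p, s2 in zip(pi, sigma2))


def optimal_pi(sigma, n):
    """pi_k proportional to sigma_k, scaled so that the expected sample size is n."""
    total = sum(sigma)
    pi = [n * s / total for s in sigma]
    if max(pi) > 1.0:
        # This sketch does not handle take-all units.
        raise ValueError("some pi_k exceed 1; take those units with certainty first")
    return pi


def min_anv(sigma, n):
    """Closed-form minimum: (1/n) * (sum sigma_k)^2 - sum sigma_k^2."""
    return sum(sigma) ** 2 / n - sum(s * s for s in sigma)


# Hypothetical population of N = 8 units with known model standard deviations.
sigma = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sigma2 = [s * s for s in sigma]
n = 3

pi_opt = optimal_pi(sigma, n)
pi_equal = [n / len(sigma)] * len(sigma)  # equal probabilities, same expected n

print(anticipated_variance(pi_opt, sigma2))    # approx. 228, the minimum
print(min_anv(sigma, n))                       # 228.0
print(anticipated_variance(pi_equal, sigma2))  # approx. 340, larger
```

The equal-probability design is included only as a reference point with the same expected sample size.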
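Population plot
[Figure: population scatter plot; axis label "nypers2"]
E.g. if $\sigma_{qk}^2 \propto u_{qk}^{2\gamma}$, then $\pi_k \propto \sigma_{qk} \propto u_{qk}^{\gamma}$.
'Guesstimate' $\tilde\gamma$ to find size measures: $\pi_k \propto \tilde\sigma_{qk} = u_{qk}^{\tilde\gamma}$.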
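The 'guesstimate' step can be sketched as follows. The sketch assumes a single guessed $\tilde\gamma$ and the size measure $u_{qk}^{\tilde\gamma}$. The take-all rule for units whose probability would reach 1 is a standard πps adjustment and is not stated on the slide. The function names are hypothetical.

```python
def size_measures(u, gamma):
    """Guesstimated size measures u_k ** gamma (hypothetical helper)."""
    return [uk ** gamma for uk in u]


def pips_probabilities(size, n):
    """Inclusion probabilities proportional to size, with expected sample size n.

    Units whose probability would reach 1 are taken with certainty. The
    remaining expected sample size is then spread over the other units in
    proportion to size, repeated until no new take-all units appear.
    """
    N = len(size)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    certain = set()
    while True:
        rest = [k for k in range(N) if k not in certain]
        n_rest = n - len(certain)
        total = sum(size[k] for k in rest)
        if n_rest <= 0 or total <= 0:
            break
        new = {k for k in rest if n_rest * size[k] / total >= 1.0}
        if not new:
            break
        certain |= new
    return [
        1.0 if k in certain else (n_rest * size[k] / total if total > 0 else 0.0)
        for k in range(N)
    ]


# Hypothetical register values u_k and guesstimate gamma = 0.5.
u = [1, 4, 9, 16, 25, 100, 400, 900]
pi = pips_probabilities(size_measures(u, 0.5), n=3)
print(pi)  # the largest unit is take-all; the others get 2 * size_k / 45
```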
The multivariate case?
Diagram: auxiliary variables (u) at times t-2 and t-1, and study variables (y) at time t
(t-2): (1) Number of employees (u1), (2) Turnover (u2), (3) Personnel expenses (u3), (4) Investments (u4)
(t-1): (1) Number of employees (u1), (2) Turnover (u2), (3) Personnel expenses (u3), (4) Investments (u4)
(t): (1) Number of employees (y1), (2) Turnover (y2), (3) Personnel expenses (y3), (4) Investments (y4)
The multivariate case
The least we should do is analyse the possible effects of the various designs on different estimators before we make the design choice.
Derive inclusion probabilities as a function of standardized (univariate) size measures
Maximal Brewer selection
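The slides do not spell out the maximal Brewer selection. The sketch below assumes it works unit by unit, taking the largest of the single-variable optimal (Brewer) probabilities and capping the result at 1. Under that assumption, every variable's ANV is at or below its single-variable minimum, but the expected sample size exceeds $n$. This is consistent with the later remark that it meets the criteria only with a larger sample. The input values and helper names are hypothetical.

```python
def brewer_pi(sigma, n):
    """Single-variable optimal design: pi_k proportional to sigma_k (no capping)."""
    total = sum(sigma)
    return [n * s / total for s in sigma]


def maximal_brewer(sigmas, n):
    """ASSUMED definition: pi_k = max over q of the single-variable pi_k(q), capped at 1."""
    per_variable = [brewer_pi(s, n) for s in sigmas]
    return [min(1.0, max(col)) for col in zip(*per_variable)]


def anv(pi, sigma):
    """Anticipated variance sum_U (1/pi_k - 1) * sigma_k^2."""
    return sum((1.0 / p - 1.0) * s * s for p, s in zip(pi, sigma))


sigmas = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]  # hypothetical sigma_qk, Q = 2
n = 2
pi_max = maximal_brewer(sigmas, n)
print(pi_max, sum(pi_max))  # expected sample size is larger than n
```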
The multivariate case
There is no evident criterion of optimality, but some criteria are better than others.
Minimize
$$f\Big(g\big(ANV(\hat t_{y_1 r})\big), \ldots, g\big(ANV(\hat t_{y_Q r})\big)\Big)$$
under the restrictions
$$g\big(ANV(\hat t_{y_q r})\big) \le v_q, \quad q = 1, \ldots, Q.$$
Try to find a design that is in some sense optimal for all important parameters?
The multivariate case: some optimisation approaches
Minimizing a weighted sum of relative efficiency losses:
$$ANOREL = \sum_{q=1}^{Q} H_q \frac{ANV_p(\hat t_{y_q r})}{ANV_{\min}(\hat t_{y_q r})}$$
Scale effects are neutralized: the ratios between the $ANV_q$'s and the corresponding single-parameter minimum values (the Brewer selection) are used.
ANOREL is minimized when
$$\pi_{k,opt} \propto \left(\sum_{q=1}^{Q} H_q \frac{\tilde\sigma_{qk}^2}{\sum_U \left(1/\tilde\pi_{k(q)opt} - 1\right)\tilde\sigma_{qk}^2}\right)^{1/2}$$
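The closed-form ANOREL design can be checked numerically. This is a minimal sketch that assumes known (guesstimated) $\tilde\sigma_{qk}$ and ignores the $\pi_k \le 1$ caps, so the single-variable minima take the uncapped closed form. The population and helper names are hypothetical. By the Cauchy-Schwarz inequality, $\sum_U c_k/\pi_k$ subject to $\sum_U \pi_k = n$ is minimized at $\pi_k \propto \sqrt{c_k}$, and this is what the weights implement.

```python
import math


def single_minima(sigmas, n):
    """ANV_min for each q: (sum_k sigma_qk)^2 / n - sum_k sigma_qk^2 (no capping)."""
    return [sum(s) ** 2 / n - sum(x * x for x in s) for s in sigmas]


def anorel(pi, sigmas, H, minima):
    """Weighted sum over q of ANV_p(q) / ANV_min(q)."""
    total = 0.0
    for s, h, a in zip(sigmas, H, minima):
        total += h * sum((1.0 / p - 1.0) * x * x for p, x in zip(pi, s)) / a
    return total


def anorel_optimal_pi(sigmas, H, n):
    """pi_k proportional to sqrt(sum_q H_q * sigma_qk^2 / ANV_min(q)), scaled to sum n."""
    minima = single_minima(sigmas, n)
    N = len(sigmas[0])
    w = [math.sqrt(sum(h * s[k] ** 2 / a for s, h, a in zip(sigmas, H, minima)))
         for k in range(N)]
    total = sum(w)
    return [n * wk / total for wk in w]


sigmas = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]]  # hypothetical
H = [0.5, 0.5]
n = 2
minima = single_minima(sigmas, n)
pi_opt = anorel_optimal_pi(sigmas, H, n)
pi_single = [[n * x / sum(s) for x in s] for s in sigmas]  # Brewer design for each q

print(anorel(pi_opt, sigmas, H, minima))                  # smallest value
print([anorel(p, sigmas, H, minima) for p in pi_single])  # larger values
```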
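The multivariate case: some optimisation approaches
If we want to put restrictions on certain parameters, e.g.
$$\frac{ANV_p(\hat t_{y_q r})}{ANV_{\min}(\hat t_{y_q r})} \le v_q, \quad q = 1, \ldots, Q,$$
then a design that minimizes ANOREL can be obtained through non-linear programming.
Optimization model:
$$\min_{\boldsymbol\pi} f(\boldsymbol\pi) = \sum_{q=1}^{Q} H_q \frac{\sum_U \left(1/\pi_k - 1\right)\tilde\sigma_{qk}^2}{\sum_U \left(1/\tilde\pi_{k(q)opt} - 1\right)\tilde\sigma_{qk}^2}$$
subject to
$$0 < \pi_k \le 1, \quad k = 1, \ldots, N,$$
$$g_0(\boldsymbol\pi) = \sum_U \pi_k - n = 0,$$
$$g_q(\boldsymbol\pi) = \frac{\sum_U \left(1/\pi_k - 1\right)\tilde\sigma_{qk}^2}{\sum_U \left(1/\tilde\pi_{k(q)opt} - 1\right)\tilde\sigma_{qk}^2} \le v_q, \quad q = 1, \ldots, Q.$$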
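The non-linear programming step can be sketched with SciPy's SLSQP solver (`scipy.optimize.minimize` with `method="SLSQP"`). The solver, the starting point (the unrestricted ANOREL design), the lower bound 1e-6 and the example values are assumptions for illustration; they are not the author's implementation. The single-variable minima use the uncapped closed form, so the sketch assumes no single-variable $\pi_{k(q)opt}$ exceeds 1.

```python
import numpy as np
from scipy.optimize import minimize


def restricted_anorel_design(sigma, H, n, v):
    """Minimize ANOREL subject to sum(pi) = n, 0 < pi_k <= 1 and ANV ratios <= v_q.

    sigma: array-like of shape (Q, N) with guesstimated sigma_qk (hypothetical input).
    """
    sigma = np.asarray(sigma, dtype=float)
    H = np.asarray(H, dtype=float)
    v = np.asarray(v, dtype=float)
    s2 = sigma ** 2
    N = sigma.shape[1]
    A = sigma.sum(axis=1) ** 2 / n - s2.sum(axis=1)  # ANV_min for each q

    def ratios(pi):
        # ANV_p(q) / ANV_min(q) for every q
        return ((1.0 / pi - 1.0) * s2).sum(axis=1) / A

    def objective(pi):
        return float(H @ ratios(pi))

    # Start from the unrestricted ANOREL optimum, which already satisfies sum(pi) = n.
    w = np.sqrt((H[:, None] * s2 / A[:, None]).sum(axis=0))
    x0 = n * w / w.sum()

    constraints = [
        {"type": "eq", "fun": lambda pi: pi.sum() - n},
        {"type": "ineq", "fun": lambda pi: v - ratios(pi)},  # SLSQP requires fun >= 0
    ]
    res = minimize(objective, x0, method="SLSQP",
                   bounds=[(1e-6, 1.0)] * N, constraints=constraints,
                   options={"maxiter": 500, "ftol": 1e-9})
    return res.x, ratios(res.x), res


sigma = [[1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]]  # hypothetical sigma_qk, Q = 2
pi, r, res = restricted_anorel_design(sigma, H=[0.9, 0.1], n=2, v=[1.5, 1.5])
print(res.success, pi.sum(), r)  # the restriction on q = 2 is binding here
```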
An Application
The 4 variables studied for three branches (strata):
SNI25: Manufacturers of food products & beverages, N = 749
SNI28: Manufacturers of metal goods (except machines and devices), N = 2292
SNI33: Manufacturers of optical instruments, N = 323
Expected sample sizes in the three strata: $E(n_s) = 290$, $112$ and $64$.
Analysis and comparisons were made on admin data from previous reference times: plots, estimated correlations and gamma coefficients.
An Application
• A common ratio model describes the relationships reasonably well if the corresponding older variable is used as the regressor variable (it gave the strongest pairwise correlation across branches and time, although doubts exist for the investment variable).
• Estimates of the gamma coefficient are sensitive.
• Estimates ranged between 0.2 and 0.9 and sometimes deteriorated!?
• For investments there is very weak or no heteroscedasticity.
• For the other three variables, $\tilde\gamma = 0.5$ "cannot be ruled out" and is simple as a guesstimate.
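The slides do not say how the gamma coefficients were estimated. One common diagnostic, used here as an assumption, regresses $\log e_k^2$ on $\log u_k$, where $e_k$ are the residuals from the ratio fit. If $V(\varepsilon_k) \propto u_k^{2\gamma}$, the slope estimates $2\gamma$. The function name and test data are hypothetical.

```python
import math


def estimate_gamma(u, resid):
    """Estimate gamma in V(eps_k) proportional to u_k^(2*gamma).

    Fits log(e_k^2) = a + b * log(u_k) by least squares and returns b / 2.
    Units with u_k <= 0 or e_k == 0 are skipped because the log is undefined.
    """
    pts = [(math.log(uk), math.log(ek * ek))
           for uk, ek in zip(u, resid) if uk > 0 and ek != 0]
    if len(pts) < 2:
        raise ValueError("need at least two usable units")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("all u_k are equal; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return 0.5 * sxy / sxx


# Hypothetical residuals with |e_k| = 3 * sqrt(u_k), i.e. gamma = 0.5.
u = [1.0, 4.0, 9.0, 16.0, 25.0]
resid = [3.0 * (-1) ** k * uk ** 0.5 for k, uk in enumerate(u)]
print(estimate_gamma(u, resid))  # close to 0.5
```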
An Application
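Guesstimated $\tilde\gamma$ by stratum, with size measures $\tilde\pi_{k(q)opt} \propto u_{qk}^{\tilde\gamma}$:

| Study variable | Auxiliary/size variable | Food | Metal | Optic |
|---|---|---|---|---|
| Employees | $u_{1k}^{\tilde\gamma}$ | 0.5 | 0.5 | 0.5 |
| Turnover | $u_{2k}^{\tilde\gamma}$ | 0.5 | 0.5 | 0.5 |
| P-costs | $u_{3k}^{\tilde\gamma}$ | 0.5 | 0.5 | 0.5 |
| Investment | $u_{4k}^{\tilde\gamma}$ | 0.2 | 0 | 0.2 |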
An Application
• Computation of inclusion probabilities and anticipated variances using the Brewer selection (maximal Brewer selection)
• Computation of the optimisation-based approaches, with the extra condition that
$$\frac{ANV_p(\hat t_{y_q r})}{ANV_{\min}(\hat t_{y_q r})} \le 1.15$$
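Food & Beverages: $100\left[ANV_p(\hat t_{y_q r}) / ANV_{\min}(\hat t_{y_q r}) - 1\right]$ by considered design (rows) and study variable (columns)

| Design | Considered design | Employees | Turnover | P-cost | Invest | Mean |
|---|---|---|---|---|---|---|
| $p_1$ | Opt. on Empl | 0 | 24.3 | 3.5 | 24.4 | 13.0 |
| $p_2$ | Opt. on Turn | 24.5 | 0 | 19.1 | 74.4 | 29.5 |
| $p_3$ | Opt. on P-cost | 3.3 | 16.4 | 0 | 43.0 | 13.0 |
| $p_4$ | Opt. on Invest | 34.4 | 91.7 | 45.9 | 0 | 43.0 |
| $p_5$ | Minimizing Anorel | 2.8 | 13.9 | 2.9 | 19.5 | 9.8 |
| $p_6$ | Minimizing Anorel with restrictions | 5.7 | 15.0 | 6.5 | 15.0 | 10.6 |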
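The tabulated quantity and the 1.15 restriction can be reproduced with a few lines. The helper names are hypothetical. The Mean column is assumed to be the plain average over the four study variables, which matches the row used here. A ratio of at most 1.15 corresponds to a loss of at most 15 in the table.

```python
def relative_efficiency_loss(anv_design, anv_min):
    """100 * (ANV_p / ANV_min - 1): percentage increase over the single-variable minimum."""
    return 100.0 * (anv_design / anv_min - 1.0)


def satisfies_restriction(anv_design, anv_min, v=1.15):
    """Restriction ANV_p / ANV_min <= v, i.e. a loss of at most 100 * (v - 1) percent."""
    return anv_design / anv_min <= v


def mean_loss(row):
    """Plain average of the losses over the study variables (assumed Mean column)."""
    return sum(row) / len(row)


# Row "Minimizing Anorel" for Food & Beverages: Employees, Turnover, P-cost, Invest.
food_anorel = [2.8, 13.9, 2.9, 19.5]
print(round(mean_loss(food_anorel), 1))   # 9.8
print([x <= 15.0 for x in food_anorel])   # [True, True, True, False]
```

The last line shows why the restricted design differs from the unrestricted ANOREL design in this stratum: only the investment loss exceeds 15.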
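Optical Instruments: $100\left[ANV_p(\hat t_{y_q r}) / ANV_{\min}(\hat t_{y_q r}) - 1\right]$ by considered design (rows) and study variable (columns)

| Design | Considered design | Employees | Turnover | P-cost | Invest | Mean |
|---|---|---|---|---|---|---|
| $p_1$ | Opt. on Empl | 0 | 13.0 | 3.4 | 30.6 | 11.8 |
| $p_2$ | Opt. on Turn | 11.0 | 0 | 8.2 | 51.6 | 17.7 |
| $p_3$ | Opt. on P-cost | 3.0 | 8.3 | 0 | 37.1 | 12.1 |
| $p_4$ | Opt. on Invest | 44.7 | 73.3 | 53.4 | 0 | 42.8 |
| $p_5$ | Minimizing Anorel | 2.9 | 7.7 | 3.1 | 20.1 | 8.5 |
| $p_6$ | Minimizing Anorel with restrictions | 4.5 | 10.9 | 5.3 | 15.0 | 8.9 |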
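Metal goods: $100\left[ANV_p(\hat t_{y_q r}) / ANV_{\min}(\hat t_{y_q r}) - 1\right]$ by considered design (rows) and study variable (columns)

| Design | Considered design | Employees | Turnover | P-cost | Invest | Mean |
|---|---|---|---|---|---|---|
| $p_1$ | Opt. on Empl | 0 | 7.3 | 2.0 | 21.0 | 7.6 |
| $p_2$ | Opt. on Turn | 6.1 | 0 | 4.3 | 33.0 | 10.9 |
| $p_3$ | Opt. on P-cost | 1.8 | 5.0 | 0 | 24.4 | 7.8 |
| $p_4$ | Opt. on Invest | 31.6 | 51.2 | 36.1 | 0 | 29.7 |
| $p_5$ | Minimizing Anorel | 1.7 | 4.9 | 1.9 | 14.0 | 5.6 |
| $p_6$ | Minimizing Anorel with restrictions | 3.4 | 7.0 | 4.0 | 10.0 | 6.1 |

Maximal Brewer selection satisfies the criteria, but with a 25% larger sample: $E(n_s) = 364.3$.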
Does it work on the estimator variances?
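• In most cases we will never know.
• However, for these variables we can check against admin data (available 1.5 years later).
• Using
$$100\left(\frac{V^{(T)}_{PO}(\hat t_{y_q r})_{p_i}}{V^{*}_{q}(\hat t_{y_q r})} - 1\right),$$
where $V^{(T)}_{PO}(\hat t_{y_q r})$ is the Taylor-expanded variance of the ratio estimator under Poisson sampling,
$$V^{(T)}_{PO}(\hat t_{y_q r}) = \sum_U \left(\frac{1}{\pi_k} - 1\right) E_{qk}^2,$$
with $E_{qk}$ the population residuals from the ratio fit, and $V^{*}_{q}(\hat t_{y_q r})$ the smallest of these variances for estimator $q$ over the considered designs.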
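A sketch of this check follows. It assumes the ratio estimator's population slope is $B_q = \sum_U y_{qk} / \sum_U u_{qk}$, so that $E_{qk} = y_{qk} - B_q u_{qk}$. It uses a hypothetical four-unit population and two hypothetical designs, and the helper names are illustrative. The loss for each design is measured against the smallest variance, as in the table that follows.

```python
def taylor_variance_poisson_ratio(y, u, pi):
    """sum_U (1/pi_k - 1) * E_k^2 with E_k = y_k - B * u_k and B = sum(y) / sum(u)."""
    B = sum(y) / sum(u)
    return sum((1.0 / p - 1.0) * (yk - B * uk) ** 2 for yk, uk, p in zip(y, u, pi))


def losses_vs_smallest(variances):
    """100 * (V_p / min_p V_p - 1) for each design p."""
    v_star = min(variances)
    return [100.0 * (v / v_star - 1.0) for v in variances]


# Hypothetical population: study variable y and previous-year register value u.
y = [2.0, 4.0, 7.0, 8.0]
u = [1.0, 2.0, 3.0, 4.0]
designs = {
    "equal": [0.5, 0.5, 0.5, 0.5],
    "prop_u": [0.2, 0.4, 0.6, 0.8],  # pi_k proportional to u_k, expected n = 2
}
variances = [taylor_variance_poisson_ratio(y, u, pi) for pi in designs.values()]
print(losses_vs_smallest(variances))  # approx. [50.0, 0.0]
```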
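Metal goods: ratios of the Taylor-expanded variances to the smallest variance of each estimator (%), by considered design (rows) and study variable (columns)

| Design | Considered design | Employees | Turnover | P-cost | Invest | Mean loss |
|---|---|---|---|---|---|---|
| $p_1$ | Opt. on Empl | 0 | 19 | 9 | 72 | 25 |
| $p_2$ | Opt. on Turn | 9 | 0 | 36 | 67 | 28 |
| $p_3$ | Opt. on P-cost | 8 | 6 | 0 | 68 | 21 |
| $p_4$ | Opt. on Invest | 146 | 146 | 222 | 24 | 134 |
| $p_5$ | Minimizing Anorel | 2 | 2 | 31 | 0 | 9 |
| $p_6$ | Minimizing Anorel with restrictions | 10 | 8 | 45 | 8 | 18 |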
Summary
• Carefully choosing appropriate size measures ($\pi_k \propto \tilde\sigma_{qk}$) limits the anticipated variances of regression estimators, and Brewer's results can be extended to a multivariate situation.
• If there is a multivariate issue and you intend to use auxiliary information in the design, diagnostic computations are important.
• With an optimization approach we know what we are aiming to minimize, and with the non-linear programming approach some practical problems in designing a pps sample are avoided.