SJS SDI_17 1 Design of Statistical Investigations Stephen Senn Random Sampling 2

Post on 28-Mar-2015


Stratified Random Sampling

A stratified random sample is one obtained by separating the population elements into nonoverlapping groups, called strata, and then selecting a simple random sample from each stratum.

Scheaffer, Mendenhall and Ott, Elementary Survey Sampling, Fourth Edition

Why?

• Stratification can be efficient as regards estimation
  – Lower variances

• Consequently it may be cost-effective

• It may be desired to make statements about subgroups

General Model

$L$ = number of strata

$N_i$ = number of sampling units in stratum $i$

$N$ = number of sampling units in population = $N_1 + N_2 + \cdots + N_L$

$n_i$ = number in sample from stratum $i$

etc.

Basic idea of estimation: for any stratum we can estimate the stratum total by multiplying the sample mean by the number of units in the population in that stratum. We then estimate the population total by summing over all strata, and so forth.

Estimation

$$\bar{y}_{st} = \frac{1}{N}\left(N_1\bar{y}_1 + N_2\bar{y}_2 + \cdots + N_L\bar{y}_L\right) = \frac{1}{N}\sum_{i=1}^{L} N_i\bar{y}_i$$

$$V(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\,V(\bar{y}_i) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\,\frac{\sigma_i^2}{n_i}$$

NB Ignoring FPCF (the finite population correction factor)
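For reference, a sketch of what the variance looks like when the finite population correction is retained. It is written with the sample variances $s_i^2$ in place of $\sigma_i^2$; this is the standard textbook form of the estimator, not something shown on the slide.

```latex
\hat{V}(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2
  \left(\frac{N_i - n_i}{N_i}\right)\frac{s_i^2}{n_i}
```

When every sampling fraction $n_i/N_i$ is small, each correction term is close to 1 and the expression reduces to the uncorrected formula.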

Example Surv_3

• Advertising firm surveying three areas for mean weekly hours television viewing
  – Town A, 1550 households
  – Town B, 620 households
  – Rural area, 930 households

• Samples are taken at random within these three strata.

• Results on next slide

Sample observations (weekly hours):

Town A: 35, 28, 26, 41, 43, 29, 32, 37, 36, 25, 29, 31, 39, 38, 40, 45, 28, 27, 35, 34

Town B: 27, 4, 49, 10, 15, 41, 25, 30

Rural: 8, 15, 21, 7, 14, 30, 20, 11, 12, 32, 34, 24

Summary:

            Town A      Town B      Rural       Overall
N           1550        620         930         3100
n           20          8           12
mean        33.9        25.125      19
var         35.35789    232.4107    87.63636
Total       52545       15577.5     17670       85792.5
var contr   4247367     11167335    6316391     21731093

(Total = N × mean; var contr = N² × var / n)

Mean   27.7
Var    2.26
SE     1.5
Bound  3.0
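The Surv_3 summary figures can be reproduced with a short script. This is a minimal sketch, assuming the observations are as read from the results slide and that the bound is taken as two standard errors; the function name `stratified_mean` is illustrative only.

```python
import math
import statistics

# Stratum sizes N_i and sample observations, as read from the Surv_3 results.
strata = {
    "Town A": (1550, [35, 28, 26, 41, 43, 29, 32, 37, 36, 25,
                      29, 31, 39, 38, 40, 45, 28, 27, 35, 34]),
    "Town B": (620, [27, 4, 49, 10, 15, 41, 25, 30]),
    "Rural": (930, [8, 15, 21, 7, 14, 30, 20, 11, 12, 32, 34, 24]),
}


def stratified_mean(strata):
    """Return (estimate, variance) of the stratified mean, ignoring the FPCF."""
    N = sum(N_i for N_i, _ in strata.values())
    total = 0.0    # running sum of estimated stratum totals N_i * ybar_i
    var_sum = 0.0  # running sum of N_i^2 * s_i^2 / n_i
    for N_i, y in strata.values():
        n_i = len(y)
        total += N_i * statistics.mean(y)
        var_sum += N_i ** 2 * statistics.variance(y) / n_i
    return total / N, var_sum / N ** 2


mean_st, var_st = stratified_mean(strata)
se = math.sqrt(var_st)
bound = 2 * se  # assumption: bound = two standard errors
print(f"mean={mean_st:.1f} var={var_st:.2f} SE={se:.1f} bound={bound:.1f}")
# prints: mean=27.7 var=2.26 SE=1.5 bound=3.0
```

The sample variance (divisor n − 1) is used within each stratum, which matches the "var" row of the summary.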

Sample Size Case 1 Equal Allocation

$$V(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\,\frac{\sigma_i^2}{n_i}$$

Suppose $n_1 = n_2 = \cdots = n_L = n/L$. Then

$$V(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\,\frac{\sigma_i^2}{n/L} = \frac{L}{N^2 n}\sum_{i=1}^{L} N_i^2\sigma_i^2$$

For a bound $B$ on the error of estimation (two standard errors):

$$2\sqrt{\frac{L}{N^2 n}\sum_{i=1}^{L} N_i^2\sigma_i^2} = B, \qquad \frac{4L\sum_{i=1}^{L} N_i^2\sigma_i^2}{N^2 n} = B^2, \qquad n = \frac{4L\sum_{i=1}^{L} N_i^2\sigma_i^2}{N^2 B^2}$$

Suppose that in planning Surv_3 we had suspected the following:

$$\sigma_A^2 = 25, \qquad \sigma_B^2 = 225, \qquad \sigma_R^2 = 100$$

Analysis of sample size determination for Surv_3

Use subscripts 1 for A, 2 for B, 3 for R.

$$L = 3, \qquad N_1 = 1550, \qquad N_2 = 620, \qquad N_3 = 930$$

$$\sigma_1^2 = 25, \qquad \sigma_2^2 = 225, \qquad \sigma_3^2 = 100, \qquad B = 2$$

$$N_T = \sum_{i=1}^{L} N_i = 3100$$

Equal allocation solution

$$n_T = \frac{4L\sum_{i=1}^{L} N_i^2\sigma_i^2}{N_T^2 B^2} = 72.75, \qquad \left\lceil \frac{n_T}{L} \right\rceil = 25$$
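The equal allocation calculation can be checked numerically. This is a minimal sketch; the bound `B = 2` hours is an assumption, chosen because it is the value that reproduces the worksheet's total of 72.75, and the function name is illustrative.

```python
import math

N = [1550, 620, 930]     # stratum sizes: Town A, Town B, Rural
sigma2 = [25, 225, 100]  # anticipated stratum variances
B = 2                    # assumed bound on the error of estimation (hours)


def equal_allocation_total(N, sigma2, B):
    """Total sample size giving 2*sqrt(V) = B under equal allocation (no FPCF)."""
    L = len(N)
    N_T = sum(N)
    weighted = sum(N_i ** 2 * s2 for N_i, s2 in zip(N, sigma2))
    return 4 * L * weighted / (N_T ** 2 * B ** 2)


n_T = equal_allocation_total(N, sigma2, B)
per_stratum = math.ceil(n_T / len(N))  # round up within each stratum
print(n_T, per_stratum)
# prints: 72.75 25
```

Rounding up in each stratum gives 25 households per area, 75 in total.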

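Sample Size Case 2 Equal Proportions

$$V(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\,\frac{\sigma_i^2}{n_i}$$

Suppose $n_i = rN_i,\ i = 1, \ldots, L$. Then

$$V(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} \frac{N_i^2\sigma_i^2}{rN_i} = \frac{1}{N^2 r}\sum_{i=1}^{L} N_i\sigma_i^2$$

For a bound $B$ on the error of estimation (two standard errors):

$$2\sqrt{\frac{1}{N^2 r}\sum_{i=1}^{L} N_i\sigma_i^2} = B, \qquad \frac{4}{N^2 r}\sum_{i=1}^{L} N_i\sigma_i^2 = B^2, \qquad r = \frac{4\sum_{i=1}^{L} N_i\sigma_i^2}{N^2 B^2}$$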

$$n_T = 72.75 \quad \text{(equal allocation)}$$

$$r = \frac{4\sum_{i=1}^{L} N_i\sigma_i^2}{N_T^2 B^2} = 0.028$$

$$n_i = \lceil rN_i \rceil,\ i = 1, \ldots, 3: \qquad n = (44,\ 18,\ 27)$$
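A short numerical check of the equal proportions case, as a sketch only. As before, the bound `B = 2` hours is an assumption consistent with the worksheet's results, and the function name is illustrative.

```python
import math

N = [1550, 620, 930]     # stratum sizes: Town A, Town B, Rural
sigma2 = [25, 225, 100]  # anticipated stratum variances
B = 2                    # assumed bound on the error of estimation (hours)


def sampling_fraction(N, sigma2, B):
    """Common sampling fraction r giving 2*sqrt(V) = B when n_i = r*N_i (no FPCF)."""
    N_T = sum(N)
    weighted = sum(N_i * s2 for N_i, s2 in zip(N, sigma2))
    return 4 * weighted / (N_T ** 2 * B ** 2)


r = sampling_fraction(N, sigma2, B)
n = [math.ceil(r * N_i) for N_i in N]  # round up within each stratum
print(round(r, 3), n, sum(n))
# prints: 0.028 [44, 18, 27] 89
```

The total of 89 exceeds the 75 needed under equal allocation, because proportional allocation puts fewer observations into the highly variable Town B stratum.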

Sample Size Case 3 Optimal allocation

Approximate allocation that minimises cost for a given variance or minimises variance for a given cost ($c_i$ is the cost per observation sampled in stratum $i$):

$$n_i = n\,\frac{N_i\sigma_i/\sqrt{c_i}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}}$$

This is set as an exercise to prove in the coursework

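$$V(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\,\frac{\sigma_i^2}{n_i}, \qquad n_i = n\,\frac{N_i\sigma_i/\sqrt{c_i}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}}$$

$$V(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\sigma_i^2\,\frac{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}}{n\,N_i\sigma_i/\sqrt{c_i}} = \frac{1}{N^2 n}\left(\sum_{i=1}^{L} N_i\sigma_i\sqrt{c_i}\right)\left(\sum_{i=1}^{L} N_i\sigma_i/\sqrt{c_i}\right)$$

For a bound $B$ on the error of estimation (two standard errors):

$$\frac{4\left(\sum_{i=1}^{L} N_i\sigma_i\sqrt{c_i}\right)\left(\sum_{i=1}^{L} N_i\sigma_i/\sqrt{c_i}\right)}{N^2 n} = B^2, \qquad n = \frac{4\left(\sum_{i=1}^{L} N_i\sigma_i\sqrt{c_i}\right)\left(\sum_{i=1}^{L} N_i\sigma_i/\sqrt{c_i}\right)}{N^2 B^2}$$

(Again this is ignoring FPCF)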

Now consider information on costs

$$c = (9,\ 9,\ 16)$$

$$n_T = \frac{4\left(\sum_{i=1}^{L} N_i\sigma_i\sqrt{c_i}\right)\left(\sum_{i=1}^{L} N_i\sigma_i/\sqrt{c_i}\right)}{N_T^2 B^2}$$

$$n_i = \left\lceil n_T\,\frac{N_i\sigma_i/\sqrt{c_i}}{\sum_{k=1}^{L} N_k\sigma_k/\sqrt{c_k}} \right\rceil: \qquad n = (24,\ 29,\ 22)$$

$$n_T = \sum_{i=1}^{L} n_i = 75$$
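The cost-weighted allocation can be verified in the same way. This is a sketch under the same assumptions as the earlier cases: the bound `B = 2` hours is inferred from the worksheet's results, and the function name is illustrative.

```python
import math

N = [1550, 620, 930]     # stratum sizes: Town A, Town B, Rural
sigma2 = [25, 225, 100]  # anticipated stratum variances
c = [9, 9, 16]           # cost per observation in each stratum
B = 2                    # assumed bound on the error of estimation (hours)


def optimal_allocation(N, sigma2, c, B):
    """Return (unrounded total, per-stratum sizes) for cost-optimal allocation."""
    N_T = sum(N)
    sigma = [math.sqrt(s2) for s2 in sigma2]
    up = [N_i * s * math.sqrt(c_i) for N_i, s, c_i in zip(N, sigma, c)]
    down = [N_i * s / math.sqrt(c_i) for N_i, s, c_i in zip(N, sigma, c)]
    n_total = 4 * sum(up) * sum(down) / (N_T ** 2 * B ** 2)
    # Share the total in proportion to N_i * sigma_i / sqrt(c_i), rounding up.
    sizes = [math.ceil(n_total * d / sum(down)) for d in down]
    return n_total, sizes


n_total, n = optimal_allocation(N, sigma2, c, B)
print(round(n_total, 3), n, sum(n))
# prints: 73.625 [24, 29, 22] 75
```

Town B receives the largest sample despite being the smallest stratum, because its anticipated variance is the highest, while the costlier rural stratum is sampled less heavily.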

Cluster Sampling

A cluster sample is a probability sample in which each sampling unit is a collection, or cluster, of elements.

Scheaffer, Mendenhall and Ott

Example. We wish to obtain an impression of reading skills amongst year 8 children in the UK. We select a simple random sample of schools and test the reading skills of every year 8 child in the schools chosen.

Cluster Sampling: Why and Why Not?

• Why: Less costly than simple or stratified sampling per sampled unit
  – It may be costly to establish sample frame of individuals
  – It may be cheaper to sample units close together

• Why not: For a given number of sampled units, the variance will be higher

A Model for Cluster Sampling

N = number of clusters in population

n = number of clusters selected in a simple random sample of clusters

$m_i$ = number of elements in cluster $i$, $i = 1, \ldots, N$

$$M = \sum_{i=1}^{N} m_i, \qquad \bar{M} = M/N, \qquad \bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i$$

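Minimum Variance Estimation: General Theory

Suppose that we have a series of unbiased estimators of a given parameter with known but different variances. What is the linear combination of the estimators with the minimum variance?

$$E(\hat{\theta}_1) = E(\hat{\theta}_2) = \cdots = E(\hat{\theta}_k) = \theta$$

$$V(\hat{\theta}_i) = \sigma_i^2$$

$$\hat{\theta}_w = \sum_{i=1}^{k} w_i\hat{\theta}_i, \qquad \sum_{i=1}^{k} w_i = 1$$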

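For independent estimators,

$$V(\hat{\theta}_w) = \sum_{i=1}^{k} w_i^2\sigma_i^2$$

Minimise subject to the constraint, using a Lagrange multiplier $\lambda$:

$$f(w_1, \ldots, w_k, \lambda) = \sum_{i=1}^{k} w_i^2\sigma_i^2 + \lambda\left(1 - \sum_{i=1}^{k} w_i\right)$$

$$\frac{df(w_1, \ldots, w_k, \lambda)}{d\lambda} = 1 - \sum_{i=1}^{k} w_i$$

Setting = 0 yields

$$\sum_{i=1}^{k} w_i = 1$$

$$\frac{df(w_1, \ldots, w_k, \lambda)}{dw_i} = 2w_i\sigma_i^2 - \lambda$$

Setting = 0 yields

$$w_i = \frac{\lambda}{2\sigma_i^2}$$

Substituting into the constraint gives $\frac{\lambda}{2}\sum_{i=1}^{k} 1/\sigma_i^2 = 1$, so

$$w_i = \frac{1/\sigma_i^2}{\sum_{j=1}^{k} 1/\sigma_j^2}$$

$$V(\hat{\theta}_w) = \sum_{i=1}^{k} w_i^2\sigma_i^2 = \frac{\sum_{i=1}^{k} 1/\sigma_i^2}{\left(\sum_{j=1}^{k} 1/\sigma_j^2\right)^2} = \frac{1}{\sum_{i=1}^{k} 1/\sigma_i^2}$$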
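The result of this derivation, weights proportional to the reciprocals of the variances, can be illustrated with a small function. This is a minimal sketch assuming independent estimators; the function name and the example variances are hypothetical.

```python
def minimum_variance_weights(variances):
    """Return (weights, combined variance) for independent unbiased estimators.

    Each weight is proportional to 1 / variance and the weights sum to 1.
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    return weights, 1.0 / total


# Hypothetical example: two estimators with variances 1 and 4.
weights, combined_variance = minimum_variance_weights([1.0, 4.0])
print(weights, combined_variance)
```

Here the weights are 0.8 and 0.2, and the combined variance of 0.8 is smaller than either individual variance.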

Now suppose that the true cluster means have a variance but that the variance within clusters is constant.

$\sigma_b^2$: between-cluster variance

$\sigma_w^2$: within-cluster variance

$$V(\bar{y}_i) = \sigma_b^2 + \frac{\sigma_w^2}{m_i}$$

$$w_i \propto \frac{1}{\sigma_b^2 + \sigma_w^2/m_i}$$

If $\sigma_b^2 = 0$, $w_i \propto m_i$.

As $\sigma_b^2 \to \infty$, $w_i \to 1/n$.
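The two limiting cases can be seen numerically. This is a sketch with hypothetical cluster sizes and variance values, assuming the weights are proportional to the reciprocal of the variance of each cluster mean; the function name is illustrative.

```python
def cluster_weights(m, sigma2_b, sigma2_w):
    """Weights proportional to 1 / (sigma2_b + sigma2_w / m_i), scaled to sum to 1."""
    raw = [1.0 / (sigma2_b + sigma2_w / m_i) for m_i in m]
    total = sum(raw)
    return [r / total for r in raw]


m = [10, 20, 70]  # hypothetical cluster sizes for n = 3 sampled clusters

# No between-cluster variance: weights proportional to cluster size.
w_size = cluster_weights(m, sigma2_b=0.0, sigma2_w=4.0)

# Very large between-cluster variance: weights approach 1/n for every cluster.
w_equal = cluster_weights(m, sigma2_b=1e9, sigma2_w=4.0)

print(w_size)
print(w_equal)
```

In the first case the weights are 0.1, 0.2 and 0.7, so every element counts equally; in the second each cluster gets a weight of very nearly one third, so every cluster counts equally.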

Questions

• In the design and analysis of experiments variance estimates are often based on pooled variances. In sampling theory they generally are not. Why the difference in practice?

• For a given total number of observations how do simple, stratified and cluster sampling compare in terms of variance?