
Source: iasri.res.in/ebook/tefcpi_sampling/fundamentals of survey sampling.pdf

1: Fundamentals of Survey Sampling


FUNDAMENTALS OF SURVEY SAMPLING

V.K. Gupta, B.N. Mandal and Rajender Parsad

Indian Agricultural Statistics Research Institute, New Delhi-110012

Introduction

Sampling theory develops methods of sample selection from a finite population and of estimation that provide estimates of the unknown population parameters, generally the population total or the population mean, which are precise enough for our purpose. Survey samples can broadly be

categorized into two types: probability samples and non-probability samples. Surveys based

on probability samples are capable of providing mathematically sound statistical inferences

about a larger target population. Inferences from probability-based surveys may, however,

suffer from many types of bias. Surveys that are not based on probability sampling have no

way of measuring their bias or sampling error. Surveys based on non-probability samples are

not externally valid. They can only be said to be representative of the people that have

actually completed the survey. Henceforth, a sample survey will always mean a survey based on probability sampling, unless otherwise stated.

Sample survey methods, based on probability sampling, have more or less replaced complete

survey (or census) methods on account of several well known advantages. It is well

recognized that the introduction of probability sampling approach has played an important

role in the development of survey techniques. The concept of representativeness through

probability sampling techniques introduced by Neyman (1934) provided a sound base to the

survey approach of data collection. One of the salient features of probability sampling is that

besides providing an estimate of the population parameter, it also provides an idea about the

precision of the estimate (sampling error). Throughout this lecture, attention is restricted to sample surveys rather than complete surveys. For a detailed exposition of the concepts of sample surveys, reference may be made to the texts of Cochran (1977), Des Raj and Chandhok (1998), Murthy (1977), Sukhatme et al. (1984) and Mukhopadhyay (1998).

Population, sample, estimator

A finite population is a collection of a known number N of distinct and identifiable sampling units. If the U's denote the sampling units, the population of size N may be represented by the set $U = \{U_1, U_2, \ldots, U_i, \ldots, U_N\}$. The study variable is denoted by y, having value $Y_i$ on unit i; $i = 1, 2, \ldots, N$. We may represent by $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_i, \ldots, Y_N)'$ an N-component vector of the values of the study variable y for the N population units. The vector $\mathbf{Y}$ is unknown.

Sometimes auxiliary information is also available on some other characteristic x related to the study variable y. We may represent by $\mathbf{X} = (X_1, X_2, \ldots, X_i, \ldots, X_N)'$ an N-component vector of the values of the auxiliary variable x for the N population units. The total $X = X_1 + X_2 + \cdots + X_i + \cdots + X_N$ is generally known.

A list of all the sampling units in the population, along with their identity, is known as the sampling frame. The sampling frame is a basic requirement for sampling from finite populations. It is assumed that the sampling frame is available and is perfect in the sense that it is free from under-coverage, over-coverage and duplication.


The probability selection procedure selects the units from U with probability $P_i$, $i \in U$. We shall denote by $\mathbf{P} = (P_1, P_2, \ldots, P_i, \ldots, P_N)'$ an N-component vector of the initial selection probabilities of the units such that $\mathbf{P}'\mathbf{1} = 1$. Generally $\mathbf{P} \sim g(n, N, \mathbf{X})$; e.g., $P_i = 1/N$ for all $i \in U$; or $P_i = n/N$, $i = 1, 2, \ldots, k$, $k = N/n$; or $P_i = X_i/(X_1 + X_2 + \cdots + X_N)$, $i \in U$.

A nonempty set $s$, $s \subset U$, obtained by using the probability selection procedure P, is called an unordered sample. The cardinality of s is n, which is also known as the (fixed) sample size. There will, however, be occasions on which we discuss equal probability with replacement and unequal probability with replacement sampling; in such cases the number of distinct units in the sample is not fixed. Barring these exceptions, we shall generally assume a fixed sample size n throughout the book. The set of all possible samples is called the sample space S. While using probability selection procedures, the sample s may be drawn either with or without replacement of units. In with replacement sampling, the cardinality of S is $\eta = N^n$, and there is a positive probability that the selected sample contains a unit more than once. In without replacement sampling, the cardinality of S is $\upsilon = \binom{N}{n}$, and the probability that the selected sample contains a unit more than once is zero. Throughout, it is assumed that the probability sampling is without replacement of units unless specified otherwise.

Given a probability selection procedure P which describes the probability of selection of units one by one, we define the probability of selection of a sample s as $p(s) = g(P_i : i \in s)$, $s \in S$. We also denote by $\mathbf{p} = (p(1), p(2), \ldots, p(s), \ldots, p(\upsilon))'$ a υ-component vector of selection probabilities of the samples. Obviously, $p(s) \ge 0$ and $\mathbf{p}'\mathbf{1} = 1$. It is well known that, given a unit-by-unit selection procedure, there exists a unique mass selection procedure; the converse is also true.

After the sample is selected, data are collected from the sampled units. Let $y_i$ be the value of the study variable on the ith unit selected in the sample s, $i \in s$ and $s \in S$. We shall denote by $\mathbf{y} = (y_1, y_2, \ldots, y_i, \ldots, y_n)'$ an n-component vector of the sampled observations. It is assumed here that the observation vector $\mathbf{y}$ is measured without error and its elements are the true values of the sampled units.

The problem in sample surveys is to estimate some unknown population parameter $\theta = f(\mathbf{Y})$ or $\theta = f_1(\mathbf{Y}, \mathbf{X})$. We shall focus on the estimation of the population total, $\theta = \mathbf{Y}'\mathbf{1} = \sum_{i \in U} Y_i$, or the population mean, $\theta = \bar{Y}_N = N^{-1}\mathbf{Y}'\mathbf{1} = N^{-1}\sum_{i \in U} Y_i$. An estimator e for a given sample s is a function such that its value depends on $y_i$, $i \in s$. In general $e = h(\mathbf{y}, \mathbf{X})$, and the functional form h(. , .) would also depend upon the functional form of θ, besides being a function of the sampling design. We can also write $e = h\{\mathbf{y}, p(s)\}$.

A sampling design is defined as

$$d = \{(s, p(s)) : s \in S\}. \tag{1}$$

Further,

$$\sum_{s \in S} p(s) = 1. \tag{2}$$


We shall denote by D = {d} a class of sampling designs.

S is also called the support of the sampling plan and $\upsilon = \binom{N}{n}$ is called the support size. A sampling plan is said to be a fixed-size sampling plan if, whenever $p(s) > 0$, the corresponding subsets of units are composed of the same number of units. We shall restrict the discussion to fixed-size samples only, and sample size will always mean fixed sample size.

The triplet (S, p, e) is called the sampling strategy.

A familiarity with the expectation and variance operators is assumed in the sequel. An estimator e is said to be unbiased for estimation of the population parameter θ if $E_d(e) = \theta$ with respect to a sampling design d, where $E_d$ denotes the expectation operator. The bias of an estimator e for estimating θ, with respect to a sampling design d, is $B_d(e) = E_d(e) - \theta$. The variance of an unbiased estimator e for θ, with respect to sampling design d, is $V_d(e) = E_d(e - \theta)^2 = E_d(e^2) - \theta^2$. The mean square error of a biased estimator e for θ is given by

$$MSE_d(e) = E_d(e - \theta)^2 = E_d\{e - E_d(e) + E_d(e) - \theta\}^2 = V_d(e) + \{B_d(e)\}^2.$$

Implementation of Sampling Plans

Consider again a finite population of N distinct and identifiable units. The problem is to

estimate some population parameter θ using a sample of size n drawn without replacement of

units using some pre-defined sampling plan $p = \{p(s); s \in S\}$. There are υ samples in the

sample space S. The sampling distribution is given in the Table below:

| S. No. | Sample (s) | Probability of selection p(s) | Cumulative sum of p(s) |
|---|---|---|---|
| 1 | s1 | p(s1) | p(s1) |
| 2 | s2 | p(s2) | p(s1) + p(s2) |
| 3 | s3 | p(s3) | p(s1) + p(s2) + p(s3) |
| ⋮ | ⋮ | ⋮ | ⋮ |
| k – 1 | sk–1 | p(sk–1) | p(s1) + p(s2) + p(s3) + … + p(sk–1) |
| k | sk | p(sk) | p(s1) + p(s2) + p(s3) + … + p(sk) |
| k + 1 | sk+1 | p(sk+1) | p(s1) + p(s2) + p(s3) + … + p(sk+1) |
| ⋮ | ⋮ | ⋮ | ⋮ |
| υ | sυ | p(sυ) | p(s1) + p(s2) + p(s3) + … + p(sυ) = 1 |

Now, draw a random number between 0 and 1 (i.e., a uniform variate). Let the random number drawn be R. Then the kth sample is selected if p(s1) + p(s2) + … + p(sk–1) < R ≤ p(s1) + p(s2) + … + p(sk).
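The cumulative-sum rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_sample` and the small SRSWOR design used to exercise it are hypothetical, not part of the original notes.

```python
import random
from itertools import accumulate, combinations

def select_sample(samples, probs, R=None):
    """Pick one sample from a design {(s, p(s))} by the cumulative-sum rule."""
    if R is None:
        R = 1.0 - random.random()          # uniform variate on (0, 1]
    cumulative = list(accumulate(probs))   # p(s1), p(s1)+p(s2), ...
    for s, c in zip(samples, cumulative):
        if R <= c:                         # first k with R <= cumulative sum
            return s
    return samples[-1]                     # guards against round-off in the last sum

# Hypothetical design: all samples of n = 2 units from N = 4, each with p(s) = 1/6.
samples = list(combinations(range(1, 5), 2))
probs = [1 / len(samples)] * len(samples)
print(select_sample(samples, probs, R=0.40))   # third cumulative sum is 0.5
```

Passing `R` explicitly makes the selection reproducible; leaving it out draws a fresh uniform variate.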

Selection probability and Inclusion probability

Suppose a sample of fixed size n is selected without replacement of units from a population containing N distinct and identifiable units by drawing units one by one in a total of n draws. Let $P_i^{(r)}$ denote the probability of selecting the ith unit at the rth draw, $i = 1, 2, \ldots, N$; $r = 1, 2, \ldots, n$. $P_i^{(r)}$ satisfies the following conditions:

i) $\sum_{i=1}^{N} P_i^{(r)} = 1$ and ii) $0 \le P_i^{(r)} \le 1$ for any $1 \le r \le n$.

The first-order inclusion probability for unit i is the probability that unit i is included in a sample of size n and is given by

$$\pi_i = \sum_{s \ni i} p(s). \tag{3}$$

The second-order inclusion probability for units i and j is defined as the probability that the two units i and j are included in a sample of size n and is given by

$$\pi_{ij} = \sum_{s \ni i, j} p(s). \tag{4}$$

For any sampling design d, the inclusion probabilities satisfy the following conditions:

i) $\sum_{i=1}^{N} \pi_i = n$;

ii) $\sum_{j \ne i}^{N} \pi_{ij} = (n-1)\pi_i$, $i = 1, 2, \ldots, N$;

iii) $\sum_{i=1}^{N} \sum_{j \ne i}^{N} \pi_{ij} = n(n-1)$.
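Definitions (3) and (4) and the three conditions can be checked numerically on a small design. The sketch below is an illustration only: the helper name `inclusion_probabilities`, the 0-based unit labels, and the equal-probability design of size n = 3 from N = 5 are assumptions made for the example.

```python
from itertools import combinations

def inclusion_probabilities(design, N):
    """Return (pi, pij) for a design given as [(sample, p(s)), ...].

    Units are labelled 0, ..., N-1 here (an indexing choice of this sketch).
    """
    pi = [0.0] * N
    pij = [[0.0] * N for _ in range(N)]
    for s, p in design:
        for i in s:
            pi[i] += p                 # equation (3): sum of p(s) over samples containing i
            for j in s:
                if j != i:
                    pij[i][j] += p     # equation (4): samples containing both i and j
    return pi, pij

# Hypothetical fixed-size design: every 3-subset of 5 units is equally likely.
N, n = 5, 3
samples = list(combinations(range(N), n))
design = [(s, 1 / len(samples)) for s in samples]
pi, pij = inclusion_probabilities(design, N)

print(sum(pi))                                   # condition (i): close to n
print([sum(row) for row in pij])                 # condition (ii): close to (n-1)*pi[i]
print(sum(map(sum, pij)))                        # condition (iii): close to n(n-1)
```

The same function works for any fixed-size design, equal probability or not, as long as the p(s) sum to one.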

Horvitz-Thompson estimator

For a given population of N units, define the population total, $Y = \sum_{i=1}^{N} Y_i$, the population variance, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \bar{Y})^2$, and the population mean,

$$\bar{Y} = N^{-1}\sum_{i=1}^{N} Y_i. \tag{5}$$

For an observed sample s, obtained using a sampling design $\{(s, p(s)); s \in S\}$, the Horvitz-Thompson estimator for the population total is defined as

$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i}. \tag{6}$$

Provided that all first-order inclusion probabilities of units in the population are different from zero, (6) is an unbiased estimator of the population total.

The Sen-Yates-Grundy (1953) form of variance of the Horvitz-Thompson estimator of the population total is

$$V(\hat{Y}_{HT}) = \sum_{i=1}^{N} \sum_{j > i}^{N} (\pi_i \pi_j - \pi_{ij}) \left( \frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j} \right)^2. \tag{7}$$

Provided that the second-order inclusion probabilities for all pairs of units in the population are non-zero, an unbiased estimator of variance is given by

$$\hat{V}(\hat{Y}_{HT}) = \sum_{i \in s} \sum_{j > i,\, j \in s} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2. \tag{8}$$

Clearly, a sufficient condition for non-negativity of the variance estimator (8) is that $\pi_i \pi_j \ge \pi_{ij}$, $i \ne j$.
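As an illustration of (6) and (8), the following Python sketch computes the Horvitz-Thompson estimate and the Sen-Yates-Grundy variance estimate from sampled values and their inclusion probabilities. The function names and the numerical example (an equal-probability sample of n = 3 from N = 10) are assumptions introduced here.

```python
def ht_total(y, pi):
    """Horvitz-Thompson estimate (6): sum of y_i / pi_i over the sampled units."""
    return sum(yi / p for yi, p in zip(y, pi))

def syg_variance_estimate(y, pi, pij):
    """Sen-Yates-Grundy variance estimate (8).

    y, pi : values and first-order inclusion probabilities of the n sampled units
    pij   : n x n table of second-order inclusion probabilities for those units
    """
    n = len(y)
    v = 0.0
    for i in range(n):
        for j in range(i + 1, n):          # each unordered pair once
            weight = (pi[i] * pi[j] - pij[i][j]) / pij[i][j]
            v += weight * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v

# Hypothetical sample: n = 3 units from N = 10 with pi_i = n/N and
# pi_ij = n(n-1)/(N(N-1)), i.e. the equal-probability without-replacement case.
N, n = 10, 3
y = [4.0, 7.0, 10.0]
pi = [n / N] * n
pij = [[n * (n - 1) / (N * (N - 1))] * n for _ in range(n)]
print(ht_total(y, pi), syg_variance_estimate(y, pi, pij))
```

For this equal-probability case the variance estimate agrees, up to rounding, with $N(N-n)s^2/n$ computed from the same three values.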

Equal and unequal probability sampling

Probability sampling procedures can be classified into three broad categories, viz., (i) simple random sampling, (ii) systematic sampling and (iii) unequal probability sampling.

Simple random sampling

Simple random sampling (SRS) is the simplest method of selecting a sample of n units out of

N units by drawing units one by one with or without replacement.

Simple random sampling with replacement

Simple random sampling with replacement (SRSWR) is a method of drawing n units from N units in the population such that at every draw, each unit in the population has the same chance of being selected and draws are made with replacement of selected units. For the SRSWR scheme, $P_i = \frac{1}{N}$, $i = 1, 2, \ldots, N$. Further, $P_i^{(r)} = P_i^{(1)} = \frac{1}{N}$; $i \in U$; $r = 1, 2, \ldots, n$; $s \in S^*$. The sample space S* under SRSWR consists of $N^n$ possible samples. The probability of selecting a sample of size n under SRSWR is

$$p(s) = \frac{1}{N} \cdot \frac{1}{N} \cdots \frac{1}{N} = \frac{1}{N^n}.$$

The first and second order inclusion probabilities under SRSWR are given by

$$\pi_i = 1 - \left(1 - \frac{1}{N}\right)^n, \quad i = 1, 2, \ldots, N, \qquad \pi_{ij} = 1 - 2\left(1 - \frac{1}{N}\right)^n + \left(1 - \frac{2}{N}\right)^n, \quad i \ne j = 1, 2, \ldots, N.$$

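An unbiased estimator of population total Y under SRSWR is given by

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n} y_i. \tag{9}$$

The variance of the estimator (9) under SRSWR is

$$V(\hat{Y}) = \frac{N^2 \sigma^2}{n}, \tag{10}$$

where $\sigma^2$ is as defined in (5).

An unbiased estimator of (10) under SRSWR is given by

$$\hat{V}(\hat{Y}) = \frac{N^2 s^2}{n}, \tag{11}$$

where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.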
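Estimators (9) and (11) are simple enough to sketch directly. In the Python illustration below, the function name `srswr_estimate` and the small population are hypothetical; `random.choices` is used because it draws with replacement.

```python
import random
from statistics import mean, variance

def srswr_estimate(y_sample, N):
    """Return (Y_hat, V_hat) from equations (9) and (11) for an SRSWR sample."""
    n = len(y_sample)
    total_hat = N * mean(y_sample)               # (N/n) * sum of y_i
    var_hat = N ** 2 * variance(y_sample) / n    # variance() uses the n-1 divisor
    return total_hat, var_hat

# Hypothetical population of N = 8 values; draw n = 4 units with replacement.
population = [12, 15, 9, 20, 14, 11, 18, 16]
sample = random.choices(population, k=4)
print(sample, srswr_estimate(sample, len(population)))
```

Because draws are with replacement, the same value may appear more than once in `sample`; the estimator treats each draw as a separate observation.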

Simple random sampling without replacement

SRSWR has the drawback that one or more sampling units may occur more than once in a sample. Repeated units provide no extra information for estimation of parameters. Simple random sampling without replacement (SRSWOR) is free from such loss of information and is thus preferred over SRSWR. SRSWOR is a method of drawing n units from N units in the population such that at every draw, each unit not yet selected has the same chance of being selected and draws are made without replacement of selected units. The sample space S under SRSWOR consists of $\binom{N}{n}$ possible samples.

The selection probability of a unit at any given draw in case of SRSWOR is $\frac{1}{N}$, i.e.,

$$P_i^{(r)} = P_i^{(1)} = \frac{1}{N}; \quad i \in U; \quad r = 1, 2, \ldots, n; \quad s \in S.$$

The probability of selecting a sample of size n under SRSWOR is

$$p(s) = \frac{1}{\binom{N}{n}}.$$

Under SRSWOR, $\pi_i = \frac{n}{N}$, $i = 1, 2, \ldots, N$ and $\pi_{ij} = \frac{n(n-1)}{N(N-1)}$, $i \ne j = 1, 2, \ldots, N$.

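An unbiased estimator of population total Y under SRSWOR is

$$\hat{Y} = \frac{N}{n}\sum_{i=1}^{n} y_i. \tag{12}$$

The variance of the estimator (12) under SRSWOR is given by

$$V(\hat{Y}) = \frac{N^2 (N-n)}{n(N-1)}\sigma^2 = \frac{N(N-n)}{n} S^2, \tag{13}$$

where $\sigma^2$ is as defined in (5) and $N\sigma^2 = (N-1)S^2$.

An unbiased estimator of (13) under SRSWOR is given by

$$\hat{V}(\hat{Y}) = \frac{N(N-n)}{n} s^2, \tag{14}$$

where $(n-1)s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.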

It may be seen that the Horvitz-Thompson estimator of population total Y under SRSWOR is identical with the estimator in (12).

The variance of the Horvitz-Thompson estimator of population total Y under SRSWOR is

$$V(\hat{Y}_{HT}) = \frac{N^2 (N-n)}{n(N-1)}\sigma^2, \tag{15}$$

which is the same as (13).

An unbiased estimator of the Sen-Yates-Grundy form of variance of the Horvitz-Thompson estimator of population total Y under SRSWOR is

$$\hat{V}(\hat{Y}_{HT}) = \sum_{i \in s} \sum_{j > i,\, j \in s} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 = \frac{N(N-n)}{n} s^2, \tag{16}$$

which is the same as (14).
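The unbiasedness claims for (12) and (14), and the variance formula (13), can be verified by brute force on a tiny population, since under SRSWOR every one of the $\binom{N}{n}$ samples is equally likely. The population values and the function name below are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

def srswor_estimate(y_sample, N):
    """Return (Y_hat, V_hat) from equations (12) and (14) for an SRSWOR sample."""
    n = len(y_sample)
    return N * mean(y_sample), N * (N - n) * variance(y_sample) / n

# Hypothetical population; enumerate all equally likely samples of size n = 2.
Y_pop = [3, 8, 5, 12, 7]
N, n = len(Y_pop), 2
estimates = [srswor_estimate(s, N) for s in combinations(Y_pop, n)]

expected_total = mean(t for t, _ in estimates)                   # E(Y_hat)
true_variance = mean((t - sum(Y_pop)) ** 2 for t, _ in estimates)  # V(Y_hat)
expected_var_hat = mean(v for _, v in estimates)                 # E(V_hat)
formula_variance = N ** 2 * (N - n) * pvariance(Y_pop) / (n * (N - 1))  # (13)

print(expected_total, sum(Y_pop))
print(true_variance, formula_variance, expected_var_hat)
```

Each printed pair or triple should agree up to floating-point rounding, which is exactly what unbiasedness of (12) and (14) and formula (13) assert.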

Systematic sampling


Systematic sampling is an operationally very convenient sampling technique in which only the first unit is selected randomly and the rest of the units get selected automatically according to some pre-specified pattern. Suppose the N units of the population are labeled 1, 2, …, N in some order, and let N = nt, where n is the sample size and t is a positive integer. Then, under systematic sampling, a random number u less than or equal to t is selected. The sample consists of unit u and every tth unit thereafter, i.e., the sample is {u, u + t, …, u + (n – 1)t}. It may be seen that there are t systematic samples and each sample has probability of selection 1/t. Also, $\pi_i = \frac{1}{t} = \frac{n}{N}$ for all i, and $\pi_{ij} = \frac{1}{t} = \frac{n}{N}$ for $j = i + t, i + 2t, \ldots, i + (n-1)t$ and $\pi_{ij} = 0$ for other j's.

For $N \ne nt$, Lahiri (1954) suggested circular systematic sampling in which t is taken as the integer nearest to $\frac{N}{n}$. A random number u is chosen from 1 to N and the uth unit is selected, and thereafter every tth unit is chosen in a circular manner till a sample of n units is selected.
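Both selection rules can be sketched as short Python functions that return unit labels 1, …, N. The function names are hypothetical, and Python's `round` (which rounds exact halves to the nearest even integer) is used here as one possible reading of "integer nearest to N/n".

```python
import random

def linear_systematic(N, n, u=None):
    """Units {u, u+t, ..., u+(n-1)t}; assumes N = n*t exactly."""
    t = N // n
    if u is None:
        u = random.randint(1, t)      # random start between 1 and t
    return [u + k * t for k in range(n)]

def circular_systematic(N, n, u=None):
    """Circular systematic sample: start anywhere in 1..N and wrap around."""
    t = round(N / n)                  # assumption: ties handled by Python's round()
    if u is None:
        u = random.randint(1, N)      # random start between 1 and N
    return [(u - 1 + k * t) % N + 1 for k in range(n)]

print(linear_systematic(12, 4, u=2))      # N = 12 = 4 * 3
print(circular_systematic(11, 3, u=10))   # N = 11 is not a multiple of n = 3
```

With the explicit starts shown, the first call lists units 2, 5, 8, 11 and the second wraps past unit 11 back to the beginning of the list.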

When N = nt, an unbiased estimator of the population total is given by

$$\hat{Y}_{sys} = \frac{N}{n}\sum_{j=1}^{n} y_{ij}, \tag{17}$$

where $y_{ij}$ denotes the observation on the jth $(j = 1, 2, \ldots, n)$ unit in the ith $(i = 1, 2, \ldots, t)$ systematic sample.

However, the estimator (17) is not unbiased when $N \ne nt$. In general, an unbiased estimator of population total Y is

$$\hat{Y}^{*}_{sys} = \frac{N}{n}\sum_{j=1}^{n} y_{ij}, \tag{18}$$

where n is the size of the selected sample.

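The variance of the estimator (17) is given by

$$V(\hat{Y}_{sys}) = \frac{1}{t}\sum_{i=1}^{t} (\hat{Y}_{sys} - Y)^2, \tag{19}$$

where the sum runs over the t possible systematic samples and $\hat{Y}_{sys}$ is evaluated on the ith sample.

Unbiased estimation of the variance at (19) is not possible. To see this, note that

$$V(\hat{Y}_{sys}) = E(\hat{Y}_{sys} - Y)^2 = E(\hat{Y}_{sys}^2) - \sum_{u=1}^{N} Y_u^2 - 2\sum_{u < u'}^{N} Y_u Y_{u'}.$$

Since not all possible pairs $(u, u')$ appear in the sample space, the third term cannot be estimated unbiasedly. However, an approximate variance estimator due to Cochran (1946) is

$$\hat{V}(\hat{Y}_{sys}) = \frac{N(N-n)}{n(n-1)}\left[\sum_{j=1}^{n} y_{ij}^2 - \frac{n\hat{Y}_{sys}^2}{N^2}\right]. \tag{20}$$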

Another approximate estimate of variance due to Yates (1948) is given by

$$\hat{V}(\hat{Y}_{sys}) = \frac{N(N-n)}{2n(n-1)}\sum_{j=1}^{n-1} (y_{i,j+1} - y_{ij})^2. \tag{21}$$

Both the estimators (20) and (21) are biased and hence should be used with caution.
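A minimal sketch of the two approximate estimators follows, assuming the forms given in (20) and (21): the first treats the systematic sample as if it were a simple random sample, the second uses squared differences of successive observations. The function names and the three-unit example are hypothetical.

```python
def cochran_variance(y, N):
    """Approximation (20): N(N-n)/(n(n-1)) * [sum y^2 - n * ybar^2]."""
    n = len(y)
    ybar = sum(y) / n                      # equals Y_hat_sys / N
    ss = sum(v * v for v in y) - n * ybar ** 2
    return N * (N - n) * ss / (n * (n - 1))

def yates_variance(y, N):
    """Approximation (21): based on differences of successive sample units."""
    n = len(y)
    d2 = sum((y[j + 1] - y[j]) ** 2 for j in range(n - 1))
    return N * (N - n) * d2 / (2 * n * (n - 1))

# Hypothetical systematic sample of n = 3 from N = 12, listed in selection order.
y = [4.0, 7.0, 10.0]
print(cochran_variance(y, 12), yates_variance(y, 12))
```

The order of `y` matters for `yates_variance` but not for `cochran_variance`; with a steady trend, as in this example, the successive-difference form gives the smaller value.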

Unequal probability sampling

In sampling from a finite population, the value of some auxiliary variable, closely related to the main characteristic of interest, is often available for all the units of the population. The normalized value of the auxiliary variable may be taken as a measure of the size of a unit. For example, in agricultural surveys, area under crop may be taken as a size measure of farms for estimating the yield of crops. In such situations, sampling the units with probability proportional to the size measure, with or without replacement, may be used in place of SRSWR or SRSWOR. Since units with larger sizes are expected to have larger values of Y, probability proportional to size sampling procedures are expected to be more efficient than SRSWOR or SRSWR. Note that the sampling units then have unequal probabilities of selection. In the sequel we discuss some common unequal probability sampling procedures which we shall be referring to in later Chapters.

Probability proportional to size with replacement

In probability proportional to size with replacement (PPSWR) sampling, the units are selected with probabilities proportional to some measure of their size and with replacement of units. Let $P_i = X_i \big/ \sum_{i=1}^{N} X_i$ denote the normalized size measure of the ith $(i = 1, 2, \ldots, N)$ unit, where $X_i$ denotes the size measure of the ith unit. In PPSWR, the ith unit is selected with probability $P_i$. Note that $\sum_{i=1}^{N} P_i = 1$. The first and second order inclusion probabilities are given by

$$\pi_i = 1 - (1 - P_i)^n, \quad i = 1, 2, \ldots, N,$$

$$\pi_{ij} = 1 - (1 - P_i)^n - (1 - P_j)^n + (1 - P_i - P_j)^n, \quad i \ne j = 1, 2, \ldots, N.$$

Selection of a PPSWR sample may be done in two ways, namely the cumulative total method and Lahiri's method. In the cumulative total method, cumulative totals of the sizes of the units are formed. Let $T_i = \sum_{j=1}^{i} X_j$. A random number between 1 and $T_N = \sum_{j=1}^{N} X_j$ is drawn. Let the number drawn be r. If $T_{i-1} < r \le T_i$, then unit i is selected. This is repeated n times for selecting a sample of size n. Lahiri's method is used when the number of units is very large. In this method, draw two random numbers i and r: the first random number i between 1 and N and the second random number r between 1 and M, where $M = \max(X_i; i \in U)$. Select the unit i if $r \le X_i$. Otherwise reject the pair of random numbers i and r and draw a fresh pair of random numbers. Repeat the trial till a unit is selected. This whole process is repeated n times to select a sample of size n.
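The two selection methods can be sketched as follows, assuming positive integer size measures so that the random numbers r can be drawn as integers. The function names and the `rng` parameter (any object offering `randint` and `randrange`, such as the `random` module or a `random.Random` instance) are choices made for this illustration; units are indexed from 0.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def cumulative_total_draw(X, rng=random):
    """One draw by the cumulative total method; returns a 0-based unit index."""
    T = list(accumulate(X))               # T_1, T_2, ..., T_N
    r = rng.randint(1, T[-1])             # random number between 1 and T_N
    return bisect_left(T, r)              # index i with T_{i-1} < r <= T_i

def lahiri_draw(X, rng=random):
    """One draw by Lahiri's method; no cumulative totals are needed."""
    N, M = len(X), max(X)
    while True:
        i = rng.randrange(N)              # candidate unit, equally likely
        r = rng.randint(1, M)             # random number between 1 and M
        if r <= X[i]:                     # accept unit i, otherwise redraw the pair
            return i

def ppswr_sample(X, n, draw=cumulative_total_draw, rng=random):
    """Repeat the chosen single-draw method n times (with replacement)."""
    return [draw(X, rng) for _ in range(n)]

X = [20, 35, 10, 60, 25]                  # hypothetical size measures
print(ppswr_sample(X, 3))
print(ppswr_sample(X, 3, draw=lahiri_draw))
```

Lahiri's method trades the one-time cost of forming cumulative totals for a random number of rejected trials per draw, which is why it suits very long lists.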

An unbiased estimator of population total Y under PPSWR sampling is

$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n} \frac{y_i}{P_i}. \tag{22}$$

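Under PPSWR sampling, the variance of the estimator (22) of population total Y is given by

$$V(\hat{Y}) = \frac{1}{n}\sum_{i=1}^{N} P_i \left( \frac{Y_i}{P_i} - Y \right)^2. \tag{23}$$

An unbiased estimator of (23) is

$$\hat{V}(\hat{Y}) = \frac{1}{n(n-1)}\sum_{i=1}^{n} \left( \frac{y_i}{P_i} - \hat{Y} \right)^2. \tag{24}$$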
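Estimators (22) and (24) can be sketched as a single function taking the sampled values and the selection probabilities of the sampled units. The function name and the numbers are hypothetical.

```python
def ppswr_estimate(y, P):
    """Return (Y_hat, V_hat) from equations (22) and (24).

    y : observed values for the n draws (repeats allowed)
    P : selection probabilities P_i of the corresponding units
    """
    n = len(y)
    z = [yi / pi for yi, pi in zip(y, P)]            # y_i / P_i for each draw
    total_hat = sum(z) / n
    var_hat = sum((zi - total_hat) ** 2 for zi in z) / (n * (n - 1))
    return total_hat, var_hat

# Hypothetical sample of n = 3 draws.
print(ppswr_estimate([10.0, 30.0, 24.0], [0.10, 0.25, 0.20]))
```

When y is exactly proportional to the size measure, every ratio $y_i/P_i$ equals Y, so the estimate is exact and the variance estimate is zero; this is the motivation for choosing a size measure closely related to the study variable.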

Unequal probability sampling without replacement

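Most of the unequal probability sampling without replacement designs/schemes available in the literature are based on the Horvitz-Thompson estimator. Some desirable properties of a sampling design based on the Horvitz-Thompson estimator, listed by Hanurav (1967), are

i) $\pi_i = nP_i \;\; \forall\, i \in U$

ii) $\pi_i > 0 \;\; \forall\, i \in U$; $\pi_{ij} > 0 \;\; \forall\, i \ne j \in U$

iii) $\pi_{ij} \le \pi_i \pi_j \;\; \forall\, i \ne j = 1, 2, \ldots, N$

iv) $\phi_{ij} = \dfrac{\pi_{ij}}{\pi_i \pi_j}$ is not too close to zero.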

Sampling plans satisfying property (i) are called Inclusion Probability Proportional to Size

(IPPS) sampling plans. If the choice of the auxiliary variable is such that the study variable is

proportional to the auxiliary variable, then the variance of the Horvitz-Thompson estimator of

the population total will be zero. However, a near proportionality will ensure that the variance

of the Horvitz-Thompson estimator of the population total is very small. For this reason the IPPS schemes (or πPS sampling schemes) have been studied extensively in the literature.

Property (ii) is required for existence of an unbiased estimator of the population total and an

unbiased variance estimator of the population total. Property (iii) ensures non-negativity of

the variance estimator while property (iv) is for stability of the variance estimator.

In the sequel we give a description of some of the IPPS sampling schemes.

Inclusion probability proportional to size sampling designs

Inclusion probability proportional to size (IPPS) sampling is a sampling procedure in which units are selected without replacement and for which $\pi_i$, the probability of including the ith unit in a sample of size n, is $nP_i$, where $P_i$ is the initial probability of selecting the ith unit in the population. An estimator which is commonly used to estimate the population total with IPPS sampling procedures is the well known Horvitz-Thompson (1952) estimator. Narain (1951) showed that a necessary condition for the Horvitz-Thompson estimator of the population total under IPPS to be better than estimator (22) under PPSWR is that

$$\phi_{ij} = \frac{\pi_{ij}}{\pi_i \pi_j} \le \frac{2(n-1)}{n} \quad \forall\, i \ne j \in U.$$

A sufficient condition for the Horvitz-Thompson estimator of the population total under IPPS to be better than estimator (22) under PPSWR is that

$$\phi_{ij} = \frac{\pi_{ij}}{\pi_i \pi_j} \ge \frac{n-1}{n} \quad \forall\, i \ne j \in U.$$

There are a number of IPPS sampling designs available in the literature. Initially the focus was on obtaining IPPS sampling schemes for sample size n = 2. For a general sample size n > 2, not many IPPS schemes are available in the literature. Some important IPPS schemes for n = 2 are the modified Midzuno-Sen (1952) strategy, the Yates-Grundy (1953) sampling design, Brewer's (1963) sampling design, Durbin's (1967) ungrouped procedure and Rao's (1965) rejective procedure. Sampling plans for n ≥ 2 are Fellegi's (1963) scheme, Sampford's (1967) scheme, Tillé's method, Midzuno-Sen (1952), Rao-Hartley-Cochran (1962), and Gupta et al. (1982, 1984).

We now describe briefly some important sampling plans for sample size n ≥ 2.

Sampford’s (1967) IPPS scheme

The most important sampling scheme available in the literature for obtaining an IPPS sample of size n ≥ 2 is due to Sampford (1967). This sampling scheme provides a non-negative variance estimator, is better than the PPSWR scheme and provides a stable variance estimator. We describe the scheme below:

Let $\gamma_i = \dfrac{P_i}{1 - nP_i}$, $S(m) = \{(i_1, i_2, \ldots, i_m) : i_j\text{'s are all distinct},\; j = 1, 2, \ldots, m\}$,

$$L_m = \sum_{S(m)} \gamma_{i_1}\gamma_{i_2}\cdots\gamma_{i_m}, \quad 1 \le m \le N, \quad \text{and} \quad L_0 = 1.$$

A sample $s = (i_1, i_2, \ldots, i_n)$ is selected with probability

$$p(s) = nK_n\,\gamma_{i_1}\gamma_{i_2}\cdots\gamma_{i_n}\left(1 - \sum_{u=1}^{n} P_{i_u}\right) = K_n\sum_{u=1}^{n} P_{i_u}\prod_{v \ne u}^{n} \gamma_{i_v}, \tag{25}$$

where

$$K_n = \left(\sum_{t=1}^{n} \frac{t\,L_{n-t}}{n^t}\right)^{-1}.$$

The above probability of selection can be realized by drawing units one by one.

The above sampling plan ensures $\pi_i = nP_i$ for $i = 1, 2, \ldots, N$. For proofs, a reference may be made to Sampford (1967).
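Besides the unit-by-unit realization mentioned above, the design (25) is commonly implemented by a rejective procedure: draw the first unit with probabilities $P_i$, draw the remaining n – 1 units with replacement with probabilities proportional to $\gamma_i = P_i/(1 - nP_i)$, and discard the whole attempt if any unit repeats. The Python sketch below follows that rejective procedure rather than the one-by-one method; the function name and the `rng` parameter are assumptions of this illustration, and it requires $nP_i < 1$ for every unit.

```python
import random

def sampford_sample(P, n, rng=random):
    """Rejective sketch of Sampford's design; returns sorted 0-based unit indices."""
    if any(n * p >= 1 for p in P):
        raise ValueError("this sketch assumes n * P_i < 1 for every unit")
    units = range(len(P))
    gamma = [p / (1 - n * p) for p in P]
    while True:
        first = rng.choices(units, weights=P, k=1)          # first draw: P_i
        rest = rng.choices(units, weights=gamma, k=n - 1)   # later draws: prop. to gamma_i
        s = first + rest
        if len(set(s)) == n:          # accept only if all n units are distinct
            return sorted(s)

P = [0.1, 0.2, 0.3, 0.4]              # hypothetical initial probabilities
print(sampford_sample(P, 2))
```

Repeating the call many times and tabulating how often each unit appears should give relative frequencies close to $nP_i$, in line with the IPPS property stated above.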

Sampford's IPPS sampling plan ensures $\pi_{ij} > 0 \;\; \forall\, i \ne j = 1, 2, \ldots, N$ and $\pi_{ij} \le \pi_i \pi_j \;\; \forall\, i \ne j = 1, 2, \ldots, N$ (sufficient condition for non-negativity of the variance estimator).

Midzuno-Sen (1952) sampling strategy

The Midzuno-Sen (1952) sampling strategy is one of the simplest sampling designs. Under this sampling design, the first unit is chosen with probability $P_i$ and the remaining (n – 1) units are selected from the remaining (N – 1) population units by SRSWOR.

Under Midzuno-Sen (1952) sampling,

$$\pi_i = \frac{N-n}{N-1}P_i + \frac{n-1}{N-1} \quad \forall\, i \in U,$$

$$\pi_{ij} = \frac{(N-n)(n-1)}{(N-1)(N-2)}(P_i + P_j) + \frac{(n-1)(n-2)}{(N-1)(N-2)} \quad \forall\, i \ne j \in U. \tag{26}$$

It is easy to see that if the first unit is selected with the revised probability of selection $P_i^{*} = \dfrac{N-1}{N-n}\,nP_i - \dfrac{n-1}{N-n}$ and the remaining (n – 1) units are selected by SRSWOR from the remaining (N – 1) population units, then the plan reduces to an IPPS sampling plan. Since the probability of selection of every population unit should be different from zero, achieving an IPPS sampling plan with such revised probabilities requires the condition $\dfrac{n-1}{n(N-1)} < P_i \le \dfrac{1}{n}$ to be satisfied.
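The selection rule, the first-order inclusion probabilities in (26) and the revised probabilities $P_i^{*}$ can be sketched as below. The function names and the numerical probabilities are hypothetical; units are indexed from 0.

```python
import random

def midzuno_sen_sample(P, n, rng=random):
    """First unit with probability P_i, remaining n-1 units by SRSWOR."""
    N = len(P)
    first = rng.choices(range(N), weights=P, k=1)[0]
    others = [i for i in range(N) if i != first]
    return sorted([first] + rng.sample(others, n - 1))

def midzuno_sen_pi(P, n):
    """First-order inclusion probabilities from (26)."""
    N = len(P)
    return [(N - n) / (N - 1) * p + (n - 1) / (N - 1) for p in P]

def revised_probabilities(P, n):
    """Revised first-draw probabilities P_i* that make the plan IPPS."""
    N = len(P)
    lower = (n - 1) / (n * (N - 1))
    if any(not (lower < p <= 1 / n) for p in P):
        raise ValueError("need (n-1)/(n(N-1)) < P_i <= 1/n for every unit")
    return [(N - 1) / (N - n) * n * p - (n - 1) / (N - n) for p in P]

P = [0.20, 0.20, 0.25, 0.35]          # hypothetical initial probabilities, N = 4
n = 2
P_star = revised_probabilities(P, n)
print(P_star, midzuno_sen_pi(P_star, n))   # second list should equal n * P_i
print(midzuno_sen_sample(P_star, n))
```

Substituting $P_i^{*}$ into the first formula of (26) gives $\pi_i = nP_i$, which is what the printed lists illustrate.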

Now, we describe another very important strategy due to Rao-Hartley-Cochran (1962).

Rao-Hartley-Cochran (RHC) sampling strategy

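Rao, Hartley and Cochran (1962) proposed a very simple sampling strategy. First the population is randomly divided into n groups $G_1, G_2, \ldots, G_n$ of sizes $N_1, N_2, \ldots, N_n$ units such that $N_1 + N_2 + \cdots + N_n = N$. Let $P_{tg}$ denote the initial selection probability of unit t from the gth group, $t = 1, 2, \ldots, N_g$ and $g = 1, 2, \ldots, n$. Further, let $\pi_g = \sum_{t=1}^{N_g} P_{tg}$, so that $\sum_{g=1}^{n} \pi_g = 1$ (here $\pi_g$ is a group total of selection probabilities, not an inclusion probability). From the gth group, the tth unit is selected with probability $\dfrac{P_{tg}}{\pi_g}$, $g = 1, 2, \ldots, n$.

An unbiased estimator of population total Y under RHC sampling is

$$\hat{Y}_{RHC} = \sum_{g=1}^{n} \frac{y_{tg}}{P_{tg}}\,\pi_g. \tag{27}$$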

The variance of the estimator (27) of population total Y under RHC sampling is

$$V(\hat{Y}_{RHC}) = \frac{n\left(\sum_{g=1}^{n} N_g^2 - N\right)}{N(N-1)}\left(\sum_{t=1}^{N} \frac{Y_t^2}{nP_t} - \frac{Y^2}{n}\right). \tag{28}$$

In case of equal group sizes, i.e., $N_1 = N_2 = \cdots = N_n = \frac{N}{n}$,

$$V(\hat{Y}_{RHC}) = \left(1 - \frac{n-1}{N-1}\right)\left(\sum_{t=1}^{N} \frac{Y_t^2}{nP_t} - \frac{Y^2}{n}\right).$$

An unbiased estimator of (28) is given by

$$\hat{V}(\hat{Y}_{RHC}) = \frac{\sum_{g=1}^{n} N_g^2 - N}{N^2 - \sum_{g=1}^{n} N_g^2}\,\sum_{g=1}^{n} \pi_g\left(\frac{y_{tg}}{P_{tg}} - \hat{Y}_{RHC}\right)^2.$$
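The RHC strategy can be sketched end to end: random grouping, one unit per group, then the estimator (27) and the variance estimator above. The function names, the way groups are formed after shuffling (group sizes differing by at most one) and the test population are assumptions of this illustration; units are indexed from 0, and `y` is passed as a full list only for convenience, since only the sampled entries are used.

```python
import random

def rhc_sample(P, n, rng=random):
    """Return [(unit, pi_g, N_g)] with one selected unit per random group."""
    units = list(range(len(P)))
    rng.shuffle(units)                            # random division into groups
    groups = [units[g::n] for g in range(n)]      # sizes differ by at most one
    chosen = []
    for G in groups:
        pi_g = sum(P[t] for t in G)               # group total of P_t
        t = rng.choices(G, weights=[P[t] for t in G], k=1)[0]   # prob. P_t / pi_g
        chosen.append((t, pi_g, len(G)))
    return chosen

def rhc_estimate(y, P, chosen):
    """Return (Y_hat_RHC, V_hat) for a sample produced by rhc_sample."""
    N = len(P)
    z = [y[t] / P[t] for t, _, _ in chosen]       # y_tg / P_tg
    total_hat = sum(zg * pi_g for zg, (_, pi_g, _) in zip(z, chosen))
    sum_Ng2 = sum(Ng ** 2 for _, _, Ng in chosen)
    c = (sum_Ng2 - N) / (N ** 2 - sum_Ng2)
    var_hat = c * sum(pi_g * (zg - total_hat) ** 2
                      for zg, (_, pi_g, _) in zip(z, chosen))
    return total_hat, var_hat

P = [0.10, 0.15, 0.20, 0.25, 0.30]                # hypothetical probabilities
y = [12.0, 14.0, 22.0, 24.0, 33.0]                # hypothetical study values
chosen = rhc_sample(P, 2)
print(chosen, rhc_estimate(y, P, chosen))
```

If `y` were exactly proportional to `P`, every ratio $y_{tg}/P_{tg}$ would equal Y, so the estimate would be exact and the variance estimate zero whatever grouping is drawn.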

REFERENCES

Brewer, K.R.W. (1963). A model of systematic sampling with unequal probabilities.

Australian Journal of Statistics 5, 5-13.

Cochran, W.G. (1946). Relative accuracy of systematic and stratified random samples for a

certain class of populations. Annals of Mathematical Statistics 17, 164-177.

Cochran, W.G. (1977). Sampling techniques (3rd Edition). John Wiley and Sons.

Durbin, J. (1967). Estimation of sampling error in multi-stage survey. Applied Statistics 16,

152-164.

Fellegi, I.P. (1963). Sampling with varying probabilities without replacement: Rotating and

non-rotating samples. Journal of American Statistical Association 58, 183-201.

Gupta, V.K., Nigam, A.K. and Kumar, Pranesh (1982). On a family of sampling schemes with inclusion probability proportional to size. Biometrika 69, 191-196.

Gupta, V.K. (1984). Use of combinatorial properties of block designs in unequal probability

sampling. Unpublished Ph.D. thesis, IARI, New Delhi.

Hanurav, T.V. (1967). Optimum utilization of auxiliary information: πps sampling of two

units from a stratum. Journal of Royal Statistical Society B31, 192-194.

Hartley, H.O. and Rao, J.N.K. (1962). Sampling with unequal probabilities and without replacement. Annals of Mathematical Statistics 33, 350-374.

Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement

from a finite universe. Journal of American Statistical Association 47, 663-685.

Lahiri, D.B. (1951). A method for sample selection providing unbiased ratio estimates.

Bulletin of International Statistical Institute 33(2), 133-140.

Lahiri, D.B. (1954). On the question of bias of systematic sampling. Proceedings of world

population conference 6, 349-362.

Midzuno, H. (1952). On the sampling system with probability proportional to sum of sizes.

Annals of Institute of Statistics and Mathematics 3, 99-107.


Mukhopadhyay, P. (1998). Theory and methods of survey sampling. Prentice Hall of India

Private Limited, New Delhi.

Murthy, M.N. (1977). Sampling: theory and methods. Statistical Publishing Society, Calcutta.

Murthy, M.N. and Rao, T.J. (1988). Systematic sampling with illustrative examples. In

Handbook of Statistics 6, P.R. Krishnaiah and C.R. Rao edition, Elsevier, Amsterdam,

147-185.

Narain, R.D. (1951). On sampling with varying probabilities without replacement. Journal of

Indian Society of Agricultural Statistics 3, 169-175.

Neyman, J. (1934). On the two different aspects of the representative method. Journal of the

Royal Statistical Society 97, 558-606.

Raj, Des and Chandhok, P. (1998). Sample survey theory. Narosa Publishing House.

Rao, J.N.K. (1965). On two simple procedures of unequal probability sampling without

replacement. Journal of Indian Society of Agricultural Statistics 3, 173-180.

Rao, J.N.K., Hartley, H.O. and Cochran, W.G. (1962). A simple procedure of unequal

probability sampling without replacement. Journal of the Royal Statistical Society B24,

482-491.

Sampford, M.R. (1967). On sampling without replacement with unequal probability of

selection. Biometrika 54, 499-513.

Sen, A.R. (1952). Present status of probability sampling and its use in estimation of farm

characteristics (in abstract). Econometrica 20, 130.

Sen, A.R. (1953). On the estimate of variance in sampling with varying probabilities. Journal

of the Indian Society of Agricultural Statistics 5, 119-127.

Sukhatme, P.V., Sukhatme, B.V., Sukhatme, Shashikala and Asok, C. (1984). Theory of sample surveys with applications. ISAS, IASRI, New Delhi.

Yates, F. (1948). Systematic sampling. Philosophical Transactions of The Royal Society A20,

345-377.

Yates, F. and Grundy, P.M. (1953). Selection without replacement from within strata with

probability proportional to size. Journal of the Royal Statistical Society B15, 253-261.