59th wsc-isi 2013 paper gennari 11
7/28/2019 59th WSC-IsI 2013 Paper Gennari 11
2. Informative context
Let $U_j$ be the unknown $j$-th target population of interest at time $t$, with $j=1$ for rural households, $j=2$ for farms and $j=3$ for land parcels. For a generic set $B$, let $N_B = \#(B)$ denote the number of its elements; in this way $N_{U_j}$ indicates the size of the population $U_j$.
Let $k_j$ denote the generic unit of population $U_j$, with $k_j = 1, \ldots, N_{U_j}$, where $k_1$, $k_2$ and $k_3$ indicate respectively a household, a farm and a land parcel. The unit $k_j$ may be viewed as a cluster, $U_{k_j}$, of $N_{U_{k_j}}$ elemental units $i_{k_j}$ ($i_{k_j} = 1, \ldots, N_{U_{k_j}}$).
The type and the nature of the elemental units have to be thoroughly defined taking
into account the specificities of the adopted sampling process (see section 3).
For the sake of simplicity, we assume below that each variable of interest is related to only one of the target populations: e.g., employment status is linked to the units of $U_1$, while maize production is related to those of $U_2$.
Each complex unit has a single reporting unit which can provide the information for the whole unit. In our context the reporting units are respectively: the head of the household, the farm holder, and the holder of the farm in which the land parcel is located.
The variable of interest $y_j$, related to the population $U_j$, may be collected from the reporting units of the same population; but, in some situations, it is possible to collect the information from the reporting units of an alternative population. For instance, the employment status, related to $U_1$, may be measured by asking the heads of households how many household members have worked in the reference week; or, alternatively, the information can be collected by asking the farm holders how many people have worked in the farm during the reference week. Obviously, the latter estimate may be affected by a larger measurement error as, for instance, some people could have worked in more than one farm. In this example, the target information can be collected either from the elemental units of the household or of the farm.
Let $y_{j,k_j}$ be the value of the variable of interest $y_j$ (related to the population $U_j$) measured on the unit $k_j$ of the same population and let
$$Y_j = \sum_{k_j \in U_j} y_{j,k_j} = \sum_{k_j \in U_j} \sum_{i_{k_j} \in U_{k_j}} y_{j,i_{k_j}} \qquad (1)$$
be the parameter of interest.
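As a minimal illustration of (1), the sketch below computes the total both unit by unit and over the elemental units. It assumes a hypothetical toy population of three households whose elemental values are 0/1 employment indicators; the data and the function names are illustrative and do not come from the paper.

```python
# Hypothetical toy population U_1: three households (clusters), each holding
# the elemental values y_{1,i} (1 = person worked in the reference week).
households = {
    "k1": [1, 0, 1],
    "k2": [1],
    "k3": [0, 0],
}


def cluster_totals(clusters):
    """y_{j,k_j}: sum of the elemental values within each unit k_j."""
    return {k: sum(values) for k, values in clusters.items()}


def population_total(clusters):
    """Y_j: sum of the unit-level values over the whole population U_j."""
    return sum(cluster_totals(clusters).values())


y_k = cluster_totals(households)
Y = population_total(households)
# The double sum over units and elemental units gives the same total.
Y_elemental = sum(v for values in households.values() for v in values)
print(y_k, Y, Y_elemental)
```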
Let $A_j$ be the sampling frame of the population $U_j$ and let $k'_j$ denote its generic unit. The frames are usually built at the time $t_0$ of the previous Census, with
$t_0$
situations the MSF is built only on a sample basis. Let $A_{S(1,2,3)}$ indicate the MSF built on a sample basis, and let $w_{k_j}$ be a weight, available in this frame, such that $E_p\big(\sum_{k_j \in A_{S(1,2,3)}} w_{k_j}\big) = N_{A_j}$, in which $E_p(\cdot)$ denotes the expectation over repeated sampling.
3. Sampling
Alternative sampling schemes may be proposed for producing consistent and unbiased inference on the parameters of interest. Each scheme may be viewed as an ordered chain of samples. The chain starts by selecting a sample from $A_j$ ($j=1$, 2 or 3) and then obtaining a sample $S_j$ of the unknown population $U_j$. The sample $S_j$ may be selected either by direct or indirect sampling. The first sample can be drawn from the MSF $A_{1,2,3}$ (or $A_{S(1,2,3)}$). For illustrative purposes, let us consider below the case in which the selection starts from the households frame $A_1$.
A typical direct sampling design foresees the following steps: (1) selection of a sample of EAs; and (2) complete enumeration of all existing households in the sample of EAs. This scheme is symbolized as: $A_1 \xrightarrow[\text{dir}]{\text{EA}} S_1$.
With an indirect sampling approach, the sample design can be illustrated as follows: (1) selection of a direct sample of households $S'_1$, registered in $A_1$; (2) longitudinal identification of the sample of new households $S_1$, to which each person of the original sample of households $S'_1$ now belongs; and (3) complete enumeration of all household members of $S_1$, irrespective of whether they were included in the original sample of individuals. This sampling procedure can be denoted as: $A_1 \xrightarrow[\text{dir}]{\text{EA}} S'_1 \xrightarrow[\text{indir}]{\text{person}} S_1$. We can thus define the sample indicator variables $t_{i_{k'_1}}$ and the linking variables $l_{i_{k'_1},k_1}$, with $t_{i_{k'_1}} = 1$ if $i_{k'_1} \in S'_1$ and $t_{i_{k'_1}} = 0$ otherwise; $l_{i_{k'_1},k_1} = 1$ if $i_{k'_1} \in S'_1$ belongs to the new household $k_1$ (in the sample $S_1$), and $l_{i_{k'_1},k_1} = 0$ otherwise. An example of this process is given in figure 1.A, where the ovals represent the households and the circles the individuals; the original sample $S'_1$ is formed by two households which generate three new households. All six individuals of the three new households are surveyed. In figure 1.A the individuals with non-zero links are A, B, D and E. The relationship $S'_1 \xrightarrow[\text{indir}]{\text{person}} S_1$ identifies longitudinal links, since the units in $S'_1$ are surveyed at time $t_0$ while those in $S_1$ are surveyed at time $t$.
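The longitudinal step can be sketched in a few lines. The household memberships below are hypothetical (figure 1.A is not reproduced here); they are only chosen to agree with the description in the text: two original households, three new households, six surveyed individuals, and non-zero links for A, B, D and E. All variable names are illustrative.

```python
# Hypothetical original sample at time t0: two households and their members.
original_households = {"h1_old": ["A", "B"], "h2_old": ["D", "E"]}
# Hypothetical households observed at time t and their current members.
current_households = {"h1": ["A", "C"], "h2": ["B", "D"], "h3": ["E", "F"]}

# Persons belonging to the original sample.
sampled_persons = {i for members in original_households.values() for i in members}

# Sample indicator t_i: 1 if the person was in the original sample, else 0.
t = {
    i: int(i in sampled_persons)
    for members in current_households.values()
    for i in members
}

# Linking variable l_{i,k}: 1 if sampled person i now lives in household k.
links = {
    (i, k): 1
    for k, members in current_households.items()
    for i in members
    if i in sampled_persons
}

# New sample: every household reached by at least one link, and all its
# members, whether or not they were in the original sample.
S1 = sorted({k for (_, k) in links})
surveyed = sorted(i for k in S1 for i in current_households[k])
linked_persons = sorted({i for (i, _) in links})
print(S1, surveyed, linked_persons)
```

Persons C and F are surveyed because they live in a linked household, but they carry no link of their own.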
The following sample in the chain, $S_2$, selecting the farms, is also drawn by indirect sampling. This segment of the sample chain, represented as $S_1 \xrightarrow[\text{indir}]{\text{person}} S_2 \rightarrow S_3$, may be summarized as follows. All farms employing household members of $S_1$ as workers (either as employees or farm operators) are surveyed. In this way an indirect sample $S_2$ (of unknown size $n_2 = \#(S_2)$) of the current population of farms, $U_2$, is observed. We can thus define the linking variables $l_{i_{k_1},k_2}$, where $l_{i_{k_1},k_2} = 1$ if the individual $i_{k_1}$ works in the farm $k_2$ in $S_2$, and $l_{i_{k_1},k_2} = 0$ otherwise. An example of this process is given in figure 1.B, where the original sample $S_1$ of two households is linked with three farms and the individual A works in two farms.
In the last sample of the chain, all land parcels of the farms in $S_2$ are surveyed. In this way an indirect sample $S_3$ (of unknown size $n_3 = \#(S_3)$) of the current population, $U_3$, of the land parcels is observed. In this case, the linking variables $l_{i_{k_1},k_3}$ assume value 1 if the individual $i_{k_1}$ works in the farm in which the sample land parcel $k_3$ (of $S_3$) is located.
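The way the person-to-parcel links follow from the person-to-farm links can be sketched as a simple composition. The persons, farms and parcels below are hypothetical, as are the function names; the sketch only assumes that each parcel is located in exactly one farm.

```python
# Hypothetical data: farms in which each sampled person works, and the farm
# in which each land parcel is located.
works_in = {"A": {"f1", "f2"}, "B": {"f2"}, "C": set()}
parcel_farm = {"p1": "f1", "p2": "f2", "p3": "f2", "p4": "f3"}


def person_farm_links(works_in):
    """l_{i,k2} = 1 if person i works in farm k2."""
    return {(i, f): 1 for i, farms in works_in.items() for f in farms}


def person_parcel_links(works_in, parcel_farm):
    """l_{i,k3} = 1 if person i works in the farm where parcel k3 is located."""
    return {
        (i, p): 1
        for i, farms in works_in.items()
        for p, f in parcel_farm.items()
        if f in farms
    }


l_12 = person_farm_links(works_in)
l_13 = person_parcel_links(works_in, parcel_farm)

# Indirect samples: every farm or parcel reached by at least one link.
S2 = sorted({f for (_, f) in l_12})
S3 = sorted({p for (_, p) in l_13})
print(S2, S3)
```

Farm f3 and its parcel p4 are never reached because no sampled person works there, which is why the sizes of $S_2$ and $S_3$ are unknown before fieldwork.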
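Starting from an indirect sampling for the identification of $S_1$, the whole chain of samples may be represented as follows: $A_1 \xrightarrow[\text{dir}]{\text{EA}} S'_1 \xrightarrow[\text{indir}]{\text{person}} S_1 \xrightarrow[\text{indir}]{\text{person}} S_2 \rightarrow S_3$.
The alternative chain, starting with a direct sampling, is: $A_1 \xrightarrow[\text{dir}]{\text{EA}} S_1 \xrightarrow[\text{indir}]{\text{person}} S_2 \rightarrow S_3$.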
The quality of the overall process is strictly related to a good identification of existing links ($j=1,2,3$). Starting the sample chain with the MSF $A_{1,2,3}$ (or $A_{S(1,2,3)}$) greatly improves the quality of the sample design, since the enumerator can be supported by the knowledge of the previous links recorded in the database.
Alternatively, the sample design can start from the land parcels or the farms. In the case of the land parcels, a possible sample chain is: (1) selection of a sample of EAs; (2) complete enumeration of all existing land parcels in the sample of EAs, thus obtaining the sample $S_3$; (3) creation of an indirect sample of farms, $S_2$, by considering only the farms which own the selected sample of land parcels; and then (4) selection of a sample of households whose members are employed in the sample of farms $S_2$. This chain may be represented as $A_3 \xrightarrow[\text{dir}]{\text{EA}} S_3 \xrightarrow[\text{indir}]{\text{land prop.}} S_2 \xrightarrow[\text{indir}]{\text{worker}} S_1$.
Note that the indirect sampling scheme $S_3 \xrightarrow[\text{indir}]{\text{land prop.}} S_2$ is that proposed in Area Frame sampling for the Closed Segment Estimator (Faulkenberry and Garoui, 1991).
An alternative chain may be defined if all the farms which have their headquarters located in the area of the selected sample of land parcels $S_3$ are surveyed. In this case, the indirect sampling scheme $S_3 \xrightarrow[\text{indir}]{\text{farm headquarter}} S_2$ corresponds to the classical design proposed for the Open Segment Estimator. The latter chain may be symbolized as $A_3 \xrightarrow[\text{dir}]{\text{EA}} S_3 \xrightarrow[\text{indir}]{\text{farm headquarter}} S_2 \xrightarrow[\text{indir}]{\text{worker}} S_1$.
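In case of a process starting from the farms, the longitudinal links between the old and the new farm may be identified by the workers, as in the following chain: $A_2 \xrightarrow[\text{dir}]{\text{EA}} S'_2 \xrightarrow[\text{indir}]{\text{worker}} S_2 \xrightarrow[\text{indir}]{\text{worker}} S_1$ and $S_2 \rightarrow S_3$.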
Conceptually it is possible to start the selection from either the households, the farms or the land parcels. Each choice has pros and cons that must be thoroughly evaluated taking into account the type of information available at country level. The proposed approach is rather flexible and may be adapted to the specific country context.
Moreover, it represents a general methodological solution which covers, as particular cases, different methods proposed in the literature for dealing with imperfect frames and rare populations (e.g., snowball sampling and adaptive sampling; Chaudhuri, 2010). Furthermore, as shown in Lavallée and Rivest (2012), indirect sampling could represent a generalization of the traditional Petersen estimator, used in the context of coverage errors and multiple frames.
4. Estimation
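Unbiased estimates of the totals $Y_j$ may be obtained using the Direct Generalized Weight Share Method (DGWSM) estimator:
$$\hat{Y}_j = \sum_{k_j \in S_j} w_{k_j}\, y_{j,k_j} \qquad (2)$$
where the weight $w_{k_j}$ is defined taking into account the specific sample chain:
$$w_{k_j} = \begin{cases} d_{k_j} = 1/\pi_{k_j} & \text{for } j = j^* \\[6pt] \dfrac{1}{L_{k_j}} \displaystyle\sum_{i_{k^*_{j^*}} \in A_{j^*}} \dfrac{t_{i_{k^*_{j^*}}}\; l_{i_{k^*_{j^*}},k_j}}{\pi_{i_{k^*_{j^*}}}} & \text{for } j \neq j^* \end{cases} \qquad (3)$$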
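In (3), $j^*$ denotes the type of sampling units from which the selection starts, with $i_{k^*_{j^*}} \equiv i_{k_{j^*}}$ if the initial step of the sample chain is obtained through direct sampling (e.g. $A_{j^*} \xrightarrow[\text{dir}]{\text{EA}} S_{j^*}$, with $i_{k_{j^*}} \in S_{j^*}$) or $i_{k^*_{j^*}} \equiv i_{k'_{j^*}}$ if the initial chain adopts an indirect sampling approach (e.g. $A_{j^*} \xrightarrow[\text{dir}]{\text{EA}} S'_{j^*} \xrightarrow[\text{indir}]{\text{element}} S_{j^*}$, with $i_{k'_{j^*}} \in S'_{j^*}$); $\pi_{i_{k^*_{j^*}}}$ denotes the inclusion probability and $L_{k_j} = \sum_{i_{k^*_{j^*}} \in A_{j^*}} l_{i_{k^*_{j^*}},k_j}$.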
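A minimal sketch of the weight-share computation follows. It assumes the standard weight-share form, in which every sampled initial element contributes its design weight $1/\pi$ to each target unit it is linked to, divided by the total number of links $L$ of that unit. The function name `gwsm_weights`, the frame of four persons and the three farms are hypothetical.

```python
def gwsm_weights(pi, sampled, links):
    """Weight-share weights for the target units reached by the sample.

    pi      -- inclusion probability of every element of the initial frame
    sampled -- set of initial elements actually selected (t = 1)
    links   -- set of (initial element, target unit) pairs with l = 1
    """
    # L_k: total number of links pointing to target unit k, over the frame.
    L = {}
    for _, k in links:
        L[k] = L.get(k, 0) + 1
    # w_k: sum over the sampled linked elements of 1 / (pi_i * L_k).
    w = {}
    for i, k in links:
        if i in sampled:
            w[k] = w.get(k, 0.0) + 1.0 / (pi[i] * L[k])
    return w


# Hypothetical frame of four persons, each selected with probability 0.5,
# and the farms in which they work.
pi = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
links = {("a", "f1"), ("b", "f1"), ("b", "f2"), ("c", "f2"), ("d", "f3")}
weights = gwsm_weights(pi, sampled={"a", "b"}, links=links)
print(weights)
```

Farm f1 is reached through both of its linked persons, while f2 is reached through only one of its two, so its weight is halved; f3 is not reached and receives no weight.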
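An alternative formulation of the estimate $\hat{Y}_j$ as a function of the units selected in the initial sample $S^*_{j^*}$, with $S^*_{j^*} \equiv S_{j^*}$ (with an initial direct sampling) or $S^*_{j^*} \equiv S'_{j^*}$ (with an initial indirect sampling), is:
$$\hat{Y}_j = \sum_{k^*_{j^*} \in S^*_{j^*}} d_{k^*_{j^*}}\, z_{k^*_{j^*}} \qquad (4)$$
where
$$z_{k^*_{j^*}} = \begin{cases} y_{j,k_j} & \text{for } j = j^* \\[6pt] \displaystyle\sum_{k_j \in U_j} \big(L_{k^*_{j^*},k_j} / L_{k_j}\big)\, y_{j,k_j} & \text{for } j \neq j^* \end{cases} \qquad (5)$$
with $L_{k^*_{j^*},k_j} = \sum_{i \in U_{k^*_{j^*}}} l_{i_{k^*_{j^*}},k_j}$.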
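The agreement between the weight form (2) and the initial-sample form (4) can be checked numerically. The sketch below assumes a hypothetical frame of three households selected with probability 1/2 each, persons linked to farms through work, and a farm-level variable `y`; none of these values come from the paper. Exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical initial units (households) with their members and their
# inclusion probabilities.
households = {"h1": ["a", "b"], "h2": ["c"], "h3": ["d"]}
pi = {"h1": Fraction(1, 2), "h2": Fraction(1, 2), "h3": Fraction(1, 2)}
# Person-to-farm links (l = 1) and the farm-level variable of interest.
links = {("a", "f1"), ("b", "f1"), ("b", "f2"), ("c", "f2"), ("d", "f3")}
y = {"f1": 10, "f2": 20, "f3": 30}
sample = ["h1", "h2"]

# L_k: total number of links pointing to each farm over the whole frame.
L = {k: sum(1 for (_, f) in links if f == k) for k in y}


def estimate_via_weights(sample):
    """Form (2): weight each reached farm, then sum w_k * y_k."""
    w = {}
    for h in sample:
        for i in households[h]:
            for p, f in links:
                if p == i:
                    w[f] = w.get(f, 0) + (1 / pi[h]) / L[f]
    return sum(w[f] * y[f] for f in w)


def z_value(h):
    """Form (5): share of each linked farm's y attributed to household h."""
    return sum(
        Fraction(1, L[f]) * y[f]
        for i in households[h]
        for p, f in links
        if p == i
    )


def estimate_via_z(sample):
    """Form (4): design-weighted sum of z over the initial sample."""
    return sum((1 / pi[h]) * z_value(h) for h in sample)


Y_hat_w = estimate_via_weights(sample)
Y_hat_z = estimate_via_z(sample)
print(Y_hat_w, Y_hat_z)
```

Form (4) is convenient in practice because it expresses the estimator as an ordinary design-weighted sum over the initial sample, so standard variance and calibration tools apply to the transformed variable $z$.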
5. Consistency
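The availability of an MSF, where the links among the target units are recorded, allows the use of a greg estimator for calculating the target totals:
$$\hat{Y}_{greg,j} = \sum_{k^*_{j^*} \in S^*_{j^*}} d_{cal,k^*_{j^*}}\, z_{k^*_{j^*}} \qquad (6)$$
in which
$$d_{cal,k^*_{j^*}} = d_{k^*_{j^*}} \Big[ 1 + (X_j - \tilde{X}_j)' \Big( \sum_{h \in S^*_{j^*}} \tilde{x}_{j,h}\, \tilde{x}'_{j,h} / \pi_h \Big)^{-1} \tilde{x}_{j,k^*_{j^*}} \Big],$$
where $X_j = \sum_{k_j \in A_j} x_{k_j}$, $\tilde{X}_j = \sum_{k^*_{j^*} \in S^*_{j^*}} \tilde{x}_{j,k^*_{j^*}} / \pi_{k^*_{j^*}}$,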
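$$\tilde{x}_{j,k^*_{j^*}} = \begin{cases} x_{k_j} & \text{for } j = j^* \\[6pt] \displaystyle\sum_{k_j \in A_j} \big(L_{k^*_{j^*},k_j} / L_{k_j}\big)\, x_{k_j} & \text{for } j \neq j^* \end{cases}$$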
The greg estimator is calibrated, as it ensures that for each of the target populations (households, farms and land parcels) the estimated totals of the auxiliary variables $\tilde{X}_j$ are benchmarked to the total $X_j$ known from the frame.
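The benchmarking property can be verified on a small example. The sketch below assumes the usual greg weight adjustment $d_{cal} = d\,[1 + (X - \tilde{X})' T^{-1} \tilde{x}]$ with $T = \sum d\, \tilde{x} \tilde{x}'$, restricted to two auxiliary variables so that the 2x2 system can be solved by Cramer's rule. The design weights, auxiliary values and known totals are hypothetical, and the values in `x` stand for the already transformed auxiliaries.

```python
from fractions import Fraction


def greg_weights(d, x, X):
    """Calibrated weights for two auxiliary variables.

    d -- design weights of the sampled units
    x -- auxiliary vectors (2-tuples) of the sampled units
    X -- known totals of the two auxiliary variables
    """
    # Design-weighted estimates of the auxiliary totals.
    X_hat = [sum(dk * xk[c] for dk, xk in zip(d, x)) for c in range(2)]
    # T = sum of d * x x' (symmetric 2x2 matrix).
    T = [
        [sum(dk * xk[r] * xk[c] for dk, xk in zip(d, x)) for c in range(2)]
        for r in range(2)
    ]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    diff = [X[c] - X_hat[c] for c in range(2)]
    # lam = T^{-1} (X - X_hat), by Cramer's rule.
    lam = [
        (diff[0] * T[1][1] - T[0][1] * diff[1]) / det,
        (T[0][0] * diff[1] - T[1][0] * diff[0]) / det,
    ]
    return [dk * (1 + lam[0] * xk[0] + lam[1] * xk[1]) for dk, xk in zip(d, x)]


# Hypothetical sample of three units: a constant and one size variable.
d = [Fraction(2), Fraction(2), Fraction(2)]
x = [(1, 3), (1, 5), (1, 4)]
X = (7, 30)  # hypothetical totals known from the frame

d_cal = greg_weights(d, x, X)
calibrated_totals = [sum(w * xk[c] for w, xk in zip(d_cal, x)) for c in range(2)]
print(d_cal, calibrated_totals)
```

The design-weighted totals (6 and 24) miss the known totals, while the calibrated weights reproduce them exactly.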
It is worthwhile to highlight that this important result, ensuring the consistency of the statistics related to the different target populations, is made possible by the availability of the MSF. In case the MSF is built only on a sample basis, the greg estimator ensures consistent estimates with respect to the known population sizes $N_{A_j}$, by incorporating the weights $w_{k_j}$ available in this frame. Finally, it should be noted that, in case of independently selected samples, the consistency for a target variable may be obtained by adopting a calibrated estimator similar to (6), in which one of the benchmark totals is represented by a convex combination of the different direct estimates of the total of interest (Knottnerus and van Duin, 2006).
6. Conclusions
In this paper we have demonstrated that indirect sampling may represent a unified approach for ensuring the consistency of integrated agricultural statistics and for dealing with frame imperfections. The approach is general enough to include, as particular cases, different methods proposed in the literature for dealing with imperfect frames (e.g., snowball sampling and adaptive sampling). Also, traditional estimators for Area Frame in agricultural statistics, such as the Open Segment Estimator and the Closed Segment Estimator, may be viewed as particular expressions of the DGWSM estimator. The approach is also rather flexible as it can be tailored to the specific informative context of each country by using different sampling chain designs.
This paper supports the recommendations of the GS on the importance of the MSF for integrated agricultural statistics. In particular, the MSF greatly improves the quality of the sampling design, by establishing effective links between the different statistical units involved, i.e., land parcels, households and farms. Furthermore, the MSF allows the construction of greg estimators which, for each target population, ensure the calculation of consistent estimates with respect to the known auxiliary variables.
References
Chaudhuri A. (2010), Estimation with inadequate frames, in Benedetti R., Bee M., Espa G., Piersimoni F., Agricultural Survey Methods, Wiley, N.Y.
FAO (2011), The Global Strategy to Improve Agricultural and Rural Statistics.
FAO (2012), Guidelines for Linking Population and Housing Censuses with Agricultural Censuses.
Faulkenberry G.D. and Garoui A. (1991), Estimating a Population Total Using an Area Frame, Journal of the American Statistical Association, Vol. 86, No. 414.
Knottnerus P. and van Duin C. (2006), Variances in Repeated Weighting with an Application to the Dutch Labour Force Survey, Journal of Official Statistics, 22, 3, pp. 565-584.
Lavallée P. and Rivest L.P. (2012), Capture-recapture sampling and indirect sampling, Journal of Official Statistics, 28, 1-27.
Lavallée P. (2007), Indirect Sampling, Springer.
Särndal C.E., Swensson B., Wretman J. (1992), Model Assisted Survey Sampling, Springer-Verlag.