59th wsc-isi 2013 paper gennari 11

Upload: pgennari

Post on 03-Apr-2018


  • 7/28/2019 59th WSC-IsI 2013 Paper Gennari 11


    2. Informative context

    Let $U_j$ be the unknown $j$-th target population of interest at time $t$, with $j=1$ for rural households, $j=2$ for farms and $j=3$ for land parcels. For a generic set $B$, let $N_B = \#(B)$ denote the number of its elements; thus $N_{U_j}$ indicates the size of the population $U_j$.

    Let $k_j$ denote the generic unit of population $U_j$, with $k_j = 1, \ldots, N_{U_j}$, where $k_1$, $k_2$ and $k_3$ indicate respectively a household, a farm and a land parcel. The unit $k_j$ may be viewed as a cluster, $U_{k_j}$, of $N_{U_{k_j}}$ elemental units $k_j i$ ($i = 1, \ldots, N_{U_{k_j}}$).

    The type and nature of the elemental units have to be thoroughly defined, taking into account the specificities of the adopted sampling process (see section 3). For the sake of simplicity, we assume below that each variable of interest is related to only one of the target populations: e.g., employment status is linked to the units of $U_1$, while maize production is related to those of $U_2$.

    Each complex unit has a single reporting unit which can provide the information for the whole unit. In our context the reporting units are, respectively, the head of the household, the farm holder, and the holder of the farm in which the land parcel is located.

    The variable of interest $y_j$, related to the population $U_j$, may be collected from the reporting units of the same population; in some situations, however, it is possible to collect the information from the reporting units of an alternative population. For instance, the employment status, related to $U_1$, may be measured by asking the heads of households how many household members worked in the reference week; alternatively, the information can be collected by asking the farm holders how many people worked on the farm during the reference week. The latter estimate may be affected by a larger measurement error since, for instance, some people could have worked on more than one farm. In this example, the target information can be collected either from the elemental units of the household or from those of the farm.

    Let $y_{j,k_j}$ be the value of the variable of interest $y_j$ (related to the population $U_j$) measured on the unit $k_j$ of the same population, and let

    $$Y_j = \sum_{k_j \in U_j} y_{j,k_j} = \sum_{k_j \in U_j} \sum_{i \in U_{k_j}} y_{j,k_j i} \tag{1}$$

    be the parameter of interest.

    Let $A_j$ be the sampling frame of the population $U_j$ and let $k'_j$ denote its generic unit. The frames are usually built at the time $t_0$ of the previous Census, with $t_0$


    situations the MSF is built only on a sample basis. Let $A^S_{(1,2,3)}$ indicate the MSF built on a sample basis, and let $w_{k'_j}$ be a weight, available in this frame, such that $E_p\big(\sum_{k'_j \in A^S_{(1,2,3)}} w_{k'_j}\big) = N_{A_j}$, in which $E_p(\cdot)$ denotes the expectation over repeated sampling.
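
    The expectation property of the frame weights can be checked exactly on a toy frame. The sketch below is only an illustration under stated assumptions: it assumes Poisson sampling and inverse-inclusion-probability weights, neither of which is specified in the text, and the probabilities and the name `expected_weight_sum` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical inclusion probabilities for a toy frame of N_A = 4 units.
pi = [Fraction(1, 2), Fraction(1, 4), Fraction(3, 4), Fraction(1, 3)]
# Assumed design weights w_k = 1 / pi_k (an assumption, not from the paper).
weights = [1 / p for p in pi]


def expected_weight_sum(pi, weights):
    """Exact E_p(sum of sample weights) under Poisson sampling.

    Enumerates all 2^N possible samples, each with probability
    prod(pi_k if selected else 1 - pi_k).
    """
    total = Fraction(0)
    for sample in product([0, 1], repeat=len(pi)):
        prob = Fraction(1)
        for selected, p in zip(sample, pi):
            prob *= p if selected else 1 - p
        total += prob * sum(w for selected, w in zip(sample, weights) if selected)
    return total


expected = expected_weight_sum(pi, weights)
print(expected == len(pi))
```

    Exact rational arithmetic is used so that the equality with the frame size holds without rounding error.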

    3. Sampling

    Alternative sampling schemes may be proposed for producing consistent and unbiased inference on the parameters of interest. Each scheme may be viewed as an ordered chain of samples. The chain starts by selecting a sample from $A_j$ ($j=1$, 2 or 3) and continues by obtaining a sample $S_j$ of the unknown population $U_j$. The sample $S_j$ may be selected either by direct or by indirect sampling. The first sample can be drawn from the MSF $A_{(1,2,3)}$ (or $A^S_{(1,2,3)}$). For illustrative purposes, let us consider below the case in which the selection starts from the households frame $A_1$.

    A typical direct sampling design foresees the following steps: (1) selection of a sample of EAs; and (2) complete enumeration of all existing households in the sample of EAs. This scheme is symbolized as: $A_1 \xrightarrow[\text{dir}]{\text{EA}} S_1$.

    With an indirect sampling approach, the sample design can be illustrated as follows: (1) selection of a direct sample of households $S'_1$, registered in $A_1$; (2) longitudinal identification of the sample of new households $S_1$, to which each person of the original sample of households $S'_1$ now belongs; and (3) complete enumeration of all household members of $S_1$, irrespective of whether they were included in the original sample of individuals. This sampling procedure can be denoted as: $A_1 \xrightarrow[\text{dir}]{\text{EA}} S'_1 \xrightarrow[\text{indir}]{\text{person}} S_1$. We can thus define the sample indicator variables $t_{k'_1 i}$ and the linking variables $l_{k'_1 i,\,k_1}$, where $t_{k'_1 i} = 1$ if $k'_1 i \in S'_1$ and $t_{k'_1 i} = 0$ otherwise; $l_{k'_1 i,\,k_1} = 1$ if $k'_1 i \in S'_1$ belongs to the new household $k_1$ (in the sample $S_1$), and $l_{k'_1 i,\,k_1} = 0$ otherwise. An example of this process is given in figure 1.A, where the ovals represent the households and the circles the individuals; the original sample $S'_1$ is formed by two households which generate three new households. All six individuals of the three new households are surveyed. In figure 1.A the individuals with non-zero links are A, B, D and E. The relationship $S'_1 \xrightarrow[\text{indir}]{\text{person}} S_1$ identifies longitudinal links, since the units in $S'_1$ are surveyed at time $t_0$ while those in $S_1$ are surveyed at time $t$.
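
    The construction of the indicator and linking variables can be sketched on a small example in the spirit of figure 1.A. The household memberships below are hypothetical (the figure itself is not reproduced here), as are all variable names; the extra person G stands for a frame unit outside the original sample.

```python
# Hypothetical person -> household membership at time t0 (frame A_1).
old_household = {"A": "h1", "B": "h1", "D": "h2", "E": "h2", "G": "h3"}
# Hypothetical person -> household membership at time t.
# C and F were not in the frame at t0, so they carry no link.
new_household = {
    "A": "n1", "C": "n1",
    "B": "n2", "D": "n2",
    "E": "n3", "F": "n3",
    "G": "n4",
}
original_sample = {"h1", "h2"}  # households of the original sample S'_1

# Sample indicator t: 1 if the person belongs to a household of S'_1.
t = {p: int(h in original_sample) for p, h in old_household.items()}

# Linking variables: (person, new household) -> 1 for persons known at t0.
links = {(p, new_household[p]): 1 for p in old_household if p in new_household}

# New households reached through sampled persons form the sample S_1.
new_sample = sorted({k for (p, k) in links if t[p] == 1})
# Complete enumeration: every current member of a household in S_1.
surveyed = sorted(p for p, k in new_household.items() if k in new_sample)
# Sampled persons carrying a non-zero link.
linked = sorted(p for (p, k) in links if t[p] == 1)

print(new_sample, surveyed, linked)
```

    In this configuration two original households generate three new ones, six individuals are surveyed, and only A, B, D and E carry non-zero links, matching the description above.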

    The following sample in the chain, $S_2$, selecting the farms, is also drawn by indirect sampling. This segment of the sample chain, represented as $S_1 \xrightarrow[\text{indir}]{\text{person}} S_2 \rightarrow S_3$, may be summarized as follows. All farms employing household members of $S_1$ as workers (either as employees or farm operators) are surveyed. In this way an indirect sample $S_2$ (of unknown dimension $n_2 = \#(S_2)$) of the current population of farms, $U_2$, is observed. We can thus define the linking variables $l_{k_1 i,\,k_2}$, where $l_{k_1 i,\,k_2} = 1$ if the individual $k_1 i$ works in the farm $k_2$ in $S_2$, and $l_{k_1 i,\,k_2} = 0$ otherwise. An example of this process is given in figure 1.B, where the original sample $S_1$ of two households is linked with three farms and the individual A works in two farms.


    In the last sample of the chain, all land parcels of the farms in $S_2$ are surveyed. In this way an indirect sample $S_3$ (of unknown dimension $n_3 = \#(S_3)$) of the current population, $U_3$, of land parcels is observed. In this case, the linking variables $l_{k_1 i,\,k_3}$ assume value 1 if the individual $k_1 i$ works in the farm in which the sample land parcel $k_3$ (of $S_3$) is located, and 0 otherwise.
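
    The person-to-parcel links are obtained by composing the person-to-farm links with the farm-to-parcel relation. A minimal sketch, assuming hypothetical identifiers and a hypothetical helper `person_parcel_links` (none of which come from the paper):

```python
# Hypothetical person -> farms worked in (l_{k1 i, k2} = 1 for each pair).
works_in = {"A": ["f1", "f2"], "B": ["f2"], "D": ["f3"], "E": []}
# Hypothetical farm -> land parcels located in that farm.
parcels_of = {"f1": ["p1"], "f2": ["p2", "p3"], "f3": ["p4"]}


def person_parcel_links(works_in, parcels_of):
    """Return the set of (person, parcel) pairs with link value 1.

    A person is linked to a parcel when the person works in the farm
    in which the parcel is located.
    """
    links = set()
    for person, farms in works_in.items():
        for farm in farms:
            for parcel in parcels_of.get(farm, []):
                links.add((person, parcel))
    return links


links_13 = person_parcel_links(works_in, parcels_of)
# Indirect samples reached through the household members.
S2 = sorted({farm for farms in works_in.values() for farm in farms})
S3 = sorted({parcel for farm in S2 for parcel in parcels_of[farm]})

print(S2, S3, sorted(links_13))
```

    As in figure 1.B, individual A works in two farms; the sizes of `S2` and `S3` are known only after the links have been followed.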

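    Starting from an indirect sampling for the identification of $S_1$, the whole chain of samples may be represented as follows: $A_1 \xrightarrow[\text{dir}]{\text{EA}} S'_1 \xrightarrow[\text{indir}]{\text{person}} S_1 \xrightarrow[\text{indir}]{\text{person}} S_2 \rightarrow S_3$.

    The alternative chain, starting with a direct sampling, is: $A_1 \xrightarrow[\text{dir}]{\text{EA}} S_1 \xrightarrow[\text{indir}]{\text{person}} S_2 \rightarrow S_3$.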

    The quality of the overall process is strictly related to a good identification of the existing links $l_{k_1 i,\,k_j}$ ($j=1,2,3$). Starting the sample chain with the MSF $A_{(1,2,3)}$ (or $A^S_{(1,2,3)}$) greatly improves the quality of the sample design, since the enumerator can be supported by the knowledge of the previous links recorded in the database.

    Alternatively, the sample design can start from the land parcels or the farms. In the case of the land parcels, a possible sample chain is: (1) selection of a sample of EAs; (2) complete enumeration of all existing land parcels in the sample of EAs, thus obtaining the sample $S_3$; (3) creation of an indirect sample of farms, $S_2$, by considering only the farms which own the selected sample of land parcels; and then (4) selection of a sample of households whose members are employed in the sample of farms $S_2$. This chain may be represented as $A_3 \xrightarrow[\text{dir}]{\text{EA}} S_3 \xrightarrow[\text{indir}]{\text{land prop.}} S_2 \xrightarrow[\text{indir}]{\text{worker}} S_1$.

    Note that the indirect sampling scheme $S_3 \xrightarrow[\text{indir}]{\text{land prop.}} S_2$ is that proposed in Area Frame sampling for the Closed Segment Estimator (Faulkenberry and Garoui, 1991).

    An alternative chain may be defined if all the farms which have their headquarters located in the area of the selected sample of land parcels $S_3$ are surveyed. In this case, the indirect sampling scheme $S_3 \xrightarrow[\text{indir}]{\text{farm headquarter}} S_2$ corresponds to the classical design proposed for the Open Segment Estimator. The latter chain may be symbolized as $A_3 \xrightarrow[\text{dir}]{\text{EA}} S_3 \xrightarrow[\text{indir}]{\text{farm headquarter}} S_2 \xrightarrow[\text{indir}]{\text{worker}} S_1$.

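    In the case of a process starting from the farms, the longitudinal links between the old and the new farm may be identified by the workers, as in the following chain: $A_2 \xrightarrow[\text{dir}]{\text{EA}} S'_2 \xrightarrow[\text{indir}]{\text{worker}} S_2 \xrightarrow[\text{indir}]{\text{worker}} S_1$ and $S_2 \rightarrow S_3$.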

    Conceptually, it is possible to start the selection from the households, the farms or the land parcels. Each choice has pros and cons that must be thoroughly evaluated, taking into account the type of information available at country level. The proposed approach is rather flexible and may be adapted to the specific country context.


    Moreover, it represents a general methodological solution which covers, as particular cases, different methods proposed in the literature for dealing with imperfect frames and rare populations (e.g., snowball sampling and adaptive sampling; Chaudhuri, 2010). Furthermore, as shown in Lavallée and Rivest (2012), indirect sampling could represent a generalization of the traditional Petersen estimator, used in the context of coverage errors and multiple frames.

    4. Estimation

    Unbiased estimates of the totals $Y_j$ may be obtained using the Direct Generalized Weight Share Method (DGWSM) estimator:

    $$\hat{Y}_j = \sum_{k_j \in S_j} w_{k_j} \, y_{j,k_j} \tag{2}$$

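    where the weight $w_{k_j}$ is defined taking into account the specific sample chain:

    $$w_{k_j} = \begin{cases} d_{k_j} = 1/\pi_{k_j} & \text{for } j = j^* \\[4pt] \displaystyle\sum_{\bar{k}_{j^*} i \in A_{j^*}} \frac{t_{\bar{k}_{j^*} i}}{\pi_{\bar{k}_{j^*} i}} \, \frac{l_{\bar{k}_{j^*} i,\,k_j}}{L_{k_j}} & \text{for } j \neq j^* \end{cases} \tag{3}$$

    In (3), $j^*$ denotes the type of sampling units from which the selection starts, with $\bar{k}_{j^*} i = k_{j^*} i$ if the initial step of the sample chain is obtained through direct sampling (e.g. $A_{j^*} \xrightarrow[\text{dir}]{\text{EA}} S_{j^*}$, with $k_{j^*} i \in S_{j^*}$) or $\bar{k}_{j^*} i = k'_{j^*} i$ if the initial chain adopts an indirect sampling approach (e.g. $A_{j^*} \xrightarrow[\text{dir}]{\text{EA}} S'_{j^*} \xrightarrow[\text{indir}]{\text{element}} S_{j^*}$, with $k'_{j^*} i \in S'_{j^*}$); $\pi_{\bar{k}_{j^*} i}$ denotes the inclusion probability and $L_{k_j} = \sum_{\bar{k}_{j^*} i \in A_{j^*}} l_{\bar{k}_{j^*} i,\,k_j}$.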
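
    The weight-sharing rule can be sketched numerically: each target unit receives the average, over its links to the frame, of the design weights of the linked units that were actually sampled. The function name `gwsm_weights`, the array layout and the numbers are assumptions made for illustration only.

```python
import numpy as np


def gwsm_weights(t, pi, links):
    """Weight share: w_k = sum_i (t_i / pi_i) * l_{i,k} / L_k.

    t     : 0/1 sample indicators of the frame elements (length n_frame)
    pi    : inclusion probabilities of the frame elements (length n_frame)
    links : 0/1 matrix (n_frame x n_target) with l_{i,k}
    L_k is the total number of links into target unit k over the whole frame.
    """
    t = np.asarray(t, dtype=float)
    pi = np.asarray(pi, dtype=float)
    links = np.asarray(links, dtype=float)
    L = links.sum(axis=0)
    if np.any(L == 0):
        raise ValueError("every target unit needs at least one link to the frame")
    return (t / pi) @ links / L


# Hypothetical frame persons A, B, D, E, G (rows) and new households n1..n4 (columns).
links = np.array([
    [1, 0, 0, 0],  # A -> n1
    [0, 1, 0, 0],  # B -> n2
    [0, 1, 0, 0],  # D -> n2
    [0, 0, 1, 0],  # E -> n3
    [0, 0, 0, 1],  # G -> n4 (G was not sampled)
])
t = [1, 1, 1, 1, 0]
pi = [0.5, 0.5, 0.25, 0.25, 0.2]

w = gwsm_weights(t, pi, links)
print(w)
```

    Household n2 is reached through two persons with design weights 2 and 4 and therefore receives their average; n4 is linked only to an unsampled person, so its weight is zero and it does not enter the sample.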

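    An alternative formulation of the estimate $\hat{Y}_j$, as a function of the units selected in the initial sample $\bar{S}_{j^*}$, with $\bar{S}_{j^*} = S_{j^*}$ (initial direct sampling) or $\bar{S}_{j^*} = S'_{j^*}$ (initial indirect sampling), is:

    $$\hat{Y}_j = \sum_{\bar{k}_{j^*} \in \bar{S}_{j^*}} d_{\bar{k}_{j^*}} \, z_{j,\bar{k}_{j^*}} \tag{4}$$

    where

    $$z_{j,\bar{k}_{j^*}} = \begin{cases} y_{j,k_j} & \text{for } j = j^* \\[4pt] \displaystyle\sum_{k_j \in U_j} \frac{L_{\bar{k}_{j^*},\,k_j}}{L_{k_j}} \, y_{j,k_j} & \text{for } j \neq j^* \end{cases} \tag{5}$$

    with $L_{\bar{k}_{j^*},\,k_j} = \sum_{i \in U_{\bar{k}_{j^*}}} l_{\bar{k}_{j^*} i,\,k_j}$.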
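
    That the two formulations give the same estimate can be verified on a toy configuration. The sketch below assumes that households are selected as whole clusters, so that every person inherits the inclusion probability of the household; all data and variable names are hypothetical.

```python
import numpy as np

# Rows: frame persons; columns: farms f1, f2, f3 (hypothetical links).
links = np.array([
    [1, 1, 0],  # A (household h1) works in f1 and f2
    [0, 1, 0],  # B (h1) works in f2
    [0, 0, 1],  # D (h2) works in f3
    [0, 0, 0],  # E (h2) works in no farm
    [0, 0, 1],  # G (h3) works in f3
], dtype=float)
household = np.array([0, 0, 1, 1, 2])      # household index of each person
pi_household = np.array([0.5, 0.25, 0.2])  # household inclusion probabilities
sampled = np.array([1, 1, 0])              # h1 and h2 are in the initial sample
y = np.array([10.0, 20.0, 30.0])           # variable of interest on the farms

L = links.sum(axis=0)                      # total links into each farm

# Target-level form: sum over farms of shared weight times y.
t_person = sampled[household]
pi_person = pi_household[household]
w = (t_person / pi_person) @ links / L
Y_target_form = float(w @ y)

# Initial-sample form: sum over sampled households of d times z.
L_cluster = np.zeros((pi_household.size, y.size))
np.add.at(L_cluster, household, links)     # links aggregated by household
z = (L_cluster / L) @ y
d = 1.0 / pi_household
Y_initial_form = float(np.sum(sampled * d * z))

print(Y_target_form, Y_initial_form)
```

    The second form is convenient for variance estimation, because it writes the estimator as an ordinary weighted sum over the initial sample, whose design is known.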

    5. Consistency

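    The availability of an MSF, where the links among the target units are recorded, allows the use of a greg estimator for calculating the target totals:

    $$\hat{Y}_{greg,j} = \sum_{\bar{k}_{j^*} \in \bar{S}_{j^*}} d_{cal,\bar{k}_{j^*}} \, z_{j,\bar{k}_{j^*}} \tag{6}$$

    in which

    $$d_{cal,\bar{k}_{j^*}} = d_{\bar{k}_{j^*}} \Big[ 1 + (\mathbf{X}_j - \tilde{\mathbf{X}}_j)' \Big( \sum_{\bar{k}_{j^*} \in \bar{S}_{j^*}} \tilde{\mathbf{x}}_{j,\bar{k}_{j^*}} \tilde{\mathbf{x}}'_{j,\bar{k}_{j^*}} / \pi_{\bar{k}_{j^*}} \Big)^{-1} \tilde{\mathbf{x}}_{j,\bar{k}_{j^*}} \Big]$$

    where $\mathbf{X}_j = \sum_{k_j \in A_j} \mathbf{x}_{j,k_j}$, $\tilde{\mathbf{X}}_j = \sum_{\bar{k}_{j^*} \in \bar{S}_{j^*}} \tilde{\mathbf{x}}_{j,\bar{k}_{j^*}} / \pi_{\bar{k}_{j^*}}$,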


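    $$\tilde{\mathbf{x}}_{j,\bar{k}_{j^*}} = \begin{cases} \mathbf{x}_{j,k_j} & \text{for } j = j^* \\[4pt] \displaystyle\sum_{k_j \in A_j} \frac{L_{\bar{k}_{j^*},\,k_j}}{L_{k_j}} \, \mathbf{x}_{j,k_j} & \text{for } j \neq j^* \end{cases}$$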

    The greg estimator is calibrated, as it ensures that for each of the target populations (households, farms and land parcels) the estimated totals of the auxiliary variables $\tilde{\mathbf{X}}_j$ are benchmarked to the totals $\mathbf{X}_j$ known from the frame.
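
    The benchmarking property can be illustrated with a short numerical sketch of the standard greg weight adjustment. The function name `greg_weights`, the auxiliary values and the frame totals are hypothetical; the columns of `x` play the role of the transformed auxiliary variables for households, farms and land parcels.

```python
import numpy as np


def greg_weights(d, x, X):
    """Calibrated weights d_k * (1 + (X - X_hat)' T^{-1} x_k).

    d : design weights of the initial sample units (length n)
    x : auxiliary variables of the sampled units (n x p)
    X : known frame totals of the auxiliary variables (length p)
    X_hat = sum_k d_k x_k and T = sum_k d_k x_k x_k'.
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.asarray(X, dtype=float)
    X_hat = d @ x
    T = (x * d[:, None]).T @ x
    lam = np.linalg.solve(T, X - X_hat)
    return d * (1.0 + x @ lam)


# Hypothetical initial sample of four households.
d = np.array([2.0, 4.0, 5.0, 3.0])
x = np.array([
    [1.0, 1.0, 2.0],   # household count, farm share, parcel share
    [1.0, 0.5, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.5, 3.5],
])
X = np.array([15.0, 9.0, 20.0])  # assumed known frame totals

d_cal = greg_weights(d, x, X)
print(d_cal @ x)
```

    A single set of calibrated weights reproduces all three frame totals at once, which is what makes the estimates for the different target populations mutually consistent.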

    It is worthwhile to highlight that this important result, ensuring the consistency of the statistics related to the different target populations, is made possible by the availability of the MSF. In case the MSF is built only on a sample basis, the greg estimator ensures consistent estimates with respect to the known population sizes $N_{A_j}$, by incorporating the weights $w_{k'_j}$ available in this frame. Finally, it should be noted that, in the case of independently selected samples, the consistency for a target variable may be obtained by adopting a calibrated estimator similar to (6), in which one of the benchmark totals is represented by a convex combination of the different direct estimates of the total of interest (Knottnerus and van Duin, 2006).

    6. Conclusions

    In this paper we have demonstrated that indirect sampling may represent a unified approach for ensuring the consistency of integrated agricultural statistics and for dealing with frame imperfections. The approach is general enough to include, as particular cases, different methods proposed in the literature for dealing with imperfect frames (e.g., snowball sampling and adaptive sampling). Also, traditional estimators for Area Frame sampling in agricultural statistics, such as the Open Segment Estimator and the Closed Segment Estimator, may be viewed as particular expressions of the DGWSM estimator. The approach is also rather flexible, as it can be tailored to the specific informative context of each country by using different sampling chain designs.

    This paper supports the recommendations of the Global Strategy (GS) on the importance of the MSF for integrated agricultural statistics. In particular, the MSF greatly improves the quality of the sampling design by establishing effective links between the different statistical units involved, i.e., land parcels, households and farms. Furthermore, the MSF allows the construction of greg estimators which, for each target population, ensure the calculation of consistent estimates with respect to the known auxiliary variables.

    References

    Chaudhuri A. (2010), Estimation with inadequate frames, in Benedetti R., Bee M., Espa G., Piersimoni F., Agricultural Survey Methods, Wiley, N.Y.

    FAO (2011), The Global Strategy to Improve Agricultural and Rural Statistics.

    FAO (2012), Guidelines for Linking Population and Housing Censuses with Agricultural Censuses.

    Faulkenberry and Garoui A. (1991), Estimating a Population Total Using an Area Frame, Journal of the American Statistical Association, Vol. 86, No. 414.

    Knottnerus P. and van Duin C. (2006), Variances in Repeated Weighting with an Application to the Dutch Labour Force Survey, Journal of Official Statistics, 22, 3, pp. 565-584.

    Lavallée P. and Rivest L.P. (2012), Capture-recapture sampling and indirect sampling, Journal of Official Statistics, 28, 1-27.

    Lavallée P. (2007), Indirect Sampling, Springer.

    Särndal C.E., Swensson B., Wretman J. (1992), Model Assisted Survey Sampling, Springer-Verlag.