measuring povert in india - www.princeton

Upload: prasanta-pradhan

Post on 06-Apr-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/2/2019 Measuring Povert in India - Www.princeton

    1/51

    Estimating Comparable Poverty Counts from Incomparable

    Surveys: Measuring Poverty in India

    Alessandro Tarozzi

    Princeton University

    First draft September 2001 - This version, May 2002

    Abstract

    We develop a procedure to estimate poverty counts in India from the 55th

    Round of the Na-tional Sample Survey (NSS), a large household survey run in 1999-2000. The evidence suggests

    that a change in the survey design caused the reports on household expenditure to change to

    an extent that it is impossible, without adjustments, to compare poverty estimates from this

    survey with those obtained from previous NSS Rounds. More generally, the paper addresses the

    problem of comparing the distribution of a variable across differently designed surveys, when the

    different design causes the respondents reports about the variable to be incomparable across

    the surveys. The proposed procedure requires only the existence of a set of auxiliary variables

    whose reports are not affected by the different survey design, and whose relation with the main

    variable of interest is stable across the surveys. The estimator, instead, does not require spe-

    cic functional form assumptions on the relation between the main variable of interest and the

    auxiliary variable. In the context of NSS data, we identify a set of variables whose reports are

    not systematically affected by the changes implemented in the survey design, and we provide

    evidence of the stability over time of the distribution of per capita total expenditure conditional

    on these variables. We describe an experiment to evaluate the performance of the estimator,

    showing that it provides satisfactory results, both in the estimation of poverty counts and in the

    estimation of the density of per capita expenditure. Finally, we use our estimator to calculate

    adjusted estimates for poverty in India using data from the 1999-2000 NSS Survey. The results

    show a sharp reduction in poverty in the nineties, even if in rural areas the reduction is not as

    large as that implied by the unadjusted gures.

    I am grateful to Angus Deaton and Bo Honor for comments, support, and conversations that led to this project.

    I would also like to thank Elie Tamer for his help, and for comments on an earlier draft, and Dwayne Benjamin,

    Debopam Bhattacharya, Phil Cross, Peter Lanjouw, Aprajit Mahajan, Robert McMillan, Jonathan Morduch, and

    Barbara Rossi for comments and useful conversations, as well as seminar participants at Duke, Georgetown, Princeton,

    Toronto, and the World Bank. All errors are mine.

  • 8/2/2019 Measuring Povert in India - Www.princeton

    2/51

    1 Introduction

    The assessment of poverty in India has been based for decades on simple poverty counts (or headcount

    poverty ratios), which estimate the proportion of individuals living in households whose monthly per capita

    total expenditure is below a given threshold, known as poverty line.1 The Government of India periodically

    publishes official lines, representing the estimated per capita monthly expenditure associated, on average,

    with the consumption of a given minimum caloric intake.2 Data on expenditure are typically obtained from

    the Indian National Sample Survey (NSS hereafter), a household survey which, almost every year, collects

    detailed information on the consumption of an exhaustive list of items. Even if consumption is not the main

    object of most NSS Rounds,3 approximately every ve years a larger (quinquennial) survey is run, and the

    data thereby recorded are used to study household consumption and to estimate poverty counts in every

    Indian state.

    Poverty estimates represent an extremely relevant issue in political and economic discussions in India,

    and they became even more frequently debated after a process of economic liberalization started in the

    early nineties.4 Those opposing the process used the increasing gap between the growth in GDP showed by

    National Accounts Statistics (NAS) and the stagnant levels of per capita expenditure revealed by NSS, to

    argue that the liberalization process was not bringing benets to poor people in India, especially in rural

    areas.5 Several pro-liberalization observers also started to question the ability of the NSS surveys to pick

    up correctly consumption data. In particular, the reference (or recall) period adopted in the surveys started

    to be criticized. Historically, the respondent was asked to recall household consumption during the 30 days

    before the interview for every item listed in the questionnaire. Such choice of recall period is not typical

    in expenditure surveys, where it is more common to observe shorter recall periods for items purchased

    frequently (like food), and longer periods for durables.6 The choice of the reference period was inuenced by

    an experimental study completed in West Bengal in the fties (Mahalanobis and Sen, 1954), which showed

    that a shorter recall period led to overestimate consumption of food. Even if the subsequent literature

    has not yet reached a consensus as to which recall period gives, on average, the most correct picture of

    1 The use of headcount ratios to evaluate poverty is often criticized, since they are not sensitive to changes in the distribution

    of income below the p overty line. For a review of the literature on poverty measures and their properties see Deaton (1997, Ch.

    3), or Ravallion (1993).2 For a historical and methodological overview of the construction of the lines see Government of India, Planning Commission

    (1993). Deaton and Tarozzi (2000) investigate the appropriateness of the price indexes routinely used to inate the poverty

    lines, and propose alternatives. For another alternative set of poverty estimates see also Datt (1999).3 Examples of other issues analyzed by NSS surveys are schooling, health, employment, small enterprises, and the Indian

    Public Distribution System.4 The liberalization process started after a Balance of Payment crisis in the summer of 1991. For an overview of the economic

    reforms and its effects see Sachs, Varshney, and Bajpai (1999), and references therein.5 Sen (2000) shows that the increasing gap between NSS and NAS estimates during the nineties is actually a statistical artifact,

    due to differences in the deators used in the two different databases, and to changes introduced in the NAS methodology.6 For a recent survey of the literature on the choice of the recall period in consumption surveys, see Deaton and Grosh (2000)

    and references therein.

    2

  • 8/2/2019 Measuring Povert in India - Www.princeton

    3/51

  • 8/2/2019 Measuring Povert in India - Www.princeton

    4/51

    estimated by the Indian Planning Commission showed an impressive reduction in poverty with respect to

    the early nineties: using the data collected with a 30-day reports for food, in the rural sector the counts

    dropped from 37.2 in 1993-94 to 27.1 six years later, while in urban areas the proportion dropped from 32.6

    percent to 23.6. In both cases this amounts to a reduction of one third in poverty rates, in less than a decade.

    Such results, though, puzzled many observers, since they also showed that the change in the surveydesign reduced the discrepancy in average PCE between the two schedules to an implausible extent: the

    gap between the two estimates dropped to less than 5 percent, both in the rural and the urban sector

    (Government of India, 2001). The most plausible explanation of the reduction of the gap is that asking to

    report consumption with both recall periods prompted the respondents to reconcile the two consumption

    levels they were reporting. This interpretation is made more plausible by the fact that the two different

    reports had to be recorded on two parallel columns printed next to each other on the questionnaire. But

    then we cannot use any of the two sets of reports to build poverty counts comparable with those obtained

    from previous Rounds. In fact, consumption of food reported with the traditional 30-day recall period is

    likely to be disproportionately high (since the respondent will tend to avoid large discrepancies with the 7-day

    reports, which are typically higher), and the corresponding reports based on a 7-day recall period are likely

    to be disproportionately low (by a symmetric argument). The comparability of these latest poverty counts

    is further hindered by the fact that consumption data on durables, clothing and footwear, and educational

    and institutional medical expenditure have been collected exclusively with a 365-day recall period.11

    The main objective of this paper is to propose a method to estimate PCE-based poverty counts from the

    55th NSS Round without using data on PCE from such survey. The procedure is based on the assumption

    that the 55th Round and previous NSS Rounds characterized by the standard survey design share the same

    distribution of PCE conditional on a set of other variables for which the changes in the questionnaire didnot cause systematic changes in the reports. If this assumption holds, it will be possible to use observations

    on these other variables from the 55th Round, together with information on the structure of the conditional

    distribution from previous Rounds, to recover the marginal distribution of PCE in the 55 th Round. Formally,

    the headcount ratio H55 for the population sampled in the 55th Round can be written as

    H55 P(y < z | 55) =

    ZP(y < z | v, 55)f(v | 55) dv (1)

    where z is the poverty line, y is reported PCE when the respondent is given the standard questionnaire, v is

    a vector of variables whose reports have not been affected by the change in survey design, P(y < z | v, 55)

    denotes the conditional probability of being poor given v, and f(v | 55) is the marginal density of v in

    the 55th Round. A sample equivalent of the above expression cannot be directly computed, since y is not

    observed. But if the sampling process identies f(v | 55), and if P(y < z | v, 55) = P(y < z | v, a) for some

    other auxiliary survey a in which both y and v are reported using a standard questionnaire, the poverty

    11 For other papers dealing with the consequences of differences in survey design on poverty estimation, see Gibson (1999a,

    1999b, 2001), and Lanjouw and Lanjouw (2001).

    4

  • 8/2/2019 Measuring Povert in India - Www.princeton

    5/51

    count becomes

    H55 =

    ZP(y < z | v, a)f(v | 55) dv (2)

    where both terms inside the integral are now identied.12 More generally, we show that one can use analogous

    arguments to estimate the marginal density of y in the 55th Round without using observations on PCE

    recorded in that survey.

    A natural choice for a variable to be included in v is the set of 30-day items listed above, for several

    reasons. First, the recall period for these items stayed the same in all NSS surveys, including those with

    experimental design, so that it is reasonable to expect that questionnaire changes did not affect consumption

    reports for these items. In this respect, it is suggestive that in the smaller experimental surveys run after

    the 50th Round, while the average PCE changes considerably across different schedules, mean per capita

    expenditure of 30-day items does not show any such difference across the two schedule types (see Table

    1). Second, expenditure on 30-day items is strongly correlated with total expenditure, so that knowing the

    rst will allow one to make inferences about the second.

    13

    Third, an important component of the relationbetween y and v will depend upon households preferences, so that such relation is likely to be rather stable

    over time. Note, though, that movements in relative prices or other factors might undermine the validity

    of this assumption, whose credibility should be scrutinized as carefully as possible. We will include other

    household-specic covariates in v, but our results show that per capita expenditure on 30-day items is by

    far the most important variable one has to consider when implementing the adjustment we propose.

    It should be stressed that the scope of our procedure goes beyond the calculation of poverty counts or

    densities of PCE in India. The same tools developed here might be used in any situation where the researcher

    needs to build comparable statistics using surveys which are only partly comparable, if the conditions outlined

    above hold. It is also important to note that such conditions cannot be formally tested. In our context, for

    example, they relate to reports as they would be observed if the questionnaire design had remained the same

    as in previous Rounds, which it has not. In this sense, they are assumptions necessary for identication.

    Indirect evidence of the validity of the assumptions can be obtained from the smaller NSS Rounds run

    between 1994 and 1998, which also offer an interesting opportunity to evaluate how well the procedure

    performs, which is of course an important element to consider if one wants to implement it using data

    from the 55th NSS Round. In each of these smaller surveys a subset of households received the standard

    questionnaire (S1) based on a 30-day recall p eriod for all items, while another subset received an experimental

    questionnaire (S2) with diff

    erent recall periods for diff

    erent groups of items. Since households were assignedrandomly to the two subsamples, the sampling process for each of these Rounds identies the distribution

    12 A different approach would be to estimate a regression of y on v from an auxiliary survey, and then use the estimated

    coefficients, together with observations on v from the target survey, to extrapolate the distribution of y in the latter survey.

    Note, though, that such approach would require the choice of a functional form for the relationship between y and the chosen

    covariates, together with assumptions on the distribution of the error in the estimated regression (see Elbers, Lanjouw, and

    Lanjouw, 2001).13 In all surveys considered here, the correlation coefficient is about 0.8 in the rural sector, and around 0.85 for urban areas.

    5

  • 8/2/2019 Measuring Povert in India - Www.princeton

    6/51

    of all variables for both questionnaire types. On one hand this allows one to test formally whether the

    existing differences in the questionnaire affect the distribution of the variables that we want to include in

    v. On the other hand, for each of these surveys, one can compare the headcount ratio estimated using our

    procedure from S2 (whose data on PCE are non-comparable with those recorded in a standard survey) with

    a benchmark represented by the poverty count estimated through standard methods using PCE from S1.The latter estimate of the poverty count can be used as benchmark since it is computed using observations

    recorded using a standard questionnaire, so that the result should be comparable with the poverty counts

    estimated using the previous quinquennial surveys.

    As we mentioned before, our procedure requires the use of an auxiliary survey in which both y and v are

    recorded in a standard questionnaire. In our experiment this role is played by the 50 th NSS Round, the last

    quinquennial survey run before the 55th Round. This allows us to evaluate whether the assumptions become

    less plausible, and/or whether the performance of the estimator worsens, when the survey for which we want

    to estimate the poverty count (the target survey) becomes more distant in time from the auxiliary survey.

    Our results are encouraging. The hypothesis that the distribution of the covariates in v is the same

    across the two different schedules is almost never rejected. Also, the evidence suggests that the distribution

    of y given v is rather stable over time. While using data on PCE recorded with the experimental set of

    recall periods would produce poverty counts generally fty percent lower than the benchmark, our estimator

    produces results that differ from the latter by about 10 percent in the rural sector, and even less in the urban

    sector. Even if this does not imply that our estimator will give sensible results once applied to data from

    the 55th Round, it does suggest that it will do so.

    When we adjust poverty counts in the 55th Round using our procedure, our results show that the official

    gures overstate (as expected) the reduction in poverty in the rural sector, while the corresponding countsfor urban areas are left virtually unchanged. Even in the rural sector, though, our headcount ratios are only

    about 1.5 percentage points higher than the official ones. This suggests that the effect of the questionnaire

    change on the pattern of 30-day reports for food is not large. These should be welcomed as good news,

    since the reduction in poverty in the nineties remains impressively large even after the adjustment that we

    propose.

    The paper is organized as follows: the next section provides further information about the National

    Sample Survey, and its importance in the evaluation of poverty in India. In Section 3 we describe formally

    the theoretical problem we face, and lay out the notation we will use throughout the rest of the paper.

    Section 4 describes the estimators we use (with the asymptotic properties studied in the appendix). Section

    5 describes the empirical experiment we use to evaluate the credibility of the assumptions needed for our

    procedure to work, and to evaluate its performance. In Section 6 we apply our adjusted procedure to the

    55th NSS Round, and in Section 7 we conclude.

    6

  • 8/2/2019 Measuring Povert in India - Www.princeton

    7/51

    2 The National Sample Survey and its Use in Poverty Estimation

    Since the early seventies, the Planning Commission has regularly published official poverty estimates

    separately for rural and urban areas of every Indian State and Union Territory. These estimates are based

    on the large household consumption surveys run by the National Sample Survey Organization approximately

    every ve years. At the time of writing official poverty estimates are available for 1973-74 (28th NSS Round),

    1977-78 (32nd), 1983-84 (38th), 1987-88 (43rd), 1993-94 (50th), and 1999-2000 (55th). The NSS collects

    information on a wide spectrum of variables, but a large section of the questionnaire is devoted to record

    household consumption of an exhaustive and detailed list of items.14 All individuals living in households

    where the total consumption per person is below the poverty line are then counted as poor, and no adjustment

    is made to take into account family composition. Note that when we compute PCE we always consider a

    measure of total household expenditure that, besides purchased items, also includes commodities received

    in exchange of goods and services, home-produced items, goods received as gifts, loans or charities, and free

    collection.

    The original poverty lines have been computed estimating calorie Engel curves in the early seventies,

    using NSS data from the 28th Round. The lines represent the PCE associated, on average, with a sector-

    specic minimum calorie intake recommended by the Indian National Institute of Nutrition.15 These lines

    are price-inated using the Consumer Price Index for Agricultural Labourers (CPIAL) for rural areas, and

    the Consumer Price Index for Industrial Workers for the urban sector.

    Between quinquennial Rounds, smaller surveys are run almost every year, but their main focus is generally

    different from consumption. With a very few exceptions, NSS surveys are carried out over a one year period,

    and the interviews are conducted evenly across four quarters.16

    All NSS surveys are characterized by a complex survey design, which has not always remained unchanged

    across different Rounds, even if all surveys are stratied and clustered. The whole population is rst divided

    into separate strata, typically geographically dened, and then a number of primary stage units (villages in

    the rural sector, and blocks in urban areas) are selected from every stratum. Once a primary stage unit (PSU

    hereafter) is included in the sample, a xed number of households are selected from it. The NSS sampling

    scheme is such that every household does not have, ex-ante, the same probability of being selected, so ination

    factors (or probability weights), are necessary to recover population estimates.17 The complex survey design

    also has consequences for the standard errors of most statistics of interest: stratication, making all possible

    samples more homogeneous, tends to decrease the variability of the estimates, while clustering typically

    increases considerably the standard errors. The latter result is due to the fact that several households are

    14 The item list is very detailed. There are, for example, approximately 200 different food items.15 See GOI, Planning Commission (1993).16 The larger surveys are designed to give representative pictures of consumption patterns in every state, sector (rural or

    urban) and sub-Round (the quarter in which the interview takes place).17 If weights are not used, households with a lower probability of selection would be underrepresented. For a short textbook

    treatment of sampling weights see Deaton (1997, Ch. 1).

    7

  • 8/2/2019 Measuring Povert in India - Www.princeton

    8/51

    selected from the same PSU, and variables collected in the same PSU generally display positive correlation.18

    Besides the changes in the consumption questionnaire that we already outlined above, the smaller surveys

    we use in our empirical experiment partly differ in their sampling scheme. In particular, even if consumption

    data were collected in all smaller surveys, the 51st (July 1994 - June 1995), 53rd (January - December 1997)

    and 54th

    Round (January - June 1998) were designed to have a major focus on enterprises, so that the framefrom which PSUs have been selected is not a census of households, but rather a census of enterprises. For

    this reason, it is not obvious that these three surveys on one side, and the 50 th, 52nd (July 1995 - June 1996),

    and 55th Round on the other side are representative of the same population considered at different points

    in time. We will show, though, that there is no evidence that these differences in frame affect the stability

    of the relation between PCE and per capita expenditure on 30-day items, which is crucial for our procedure

    to work. Since the 54th Round was run over a six-months period, while all the other surveys are carried out

    over a one year time span, we do not use this Round, to avoid seasonality issues.

    3 The Theoretical Problem

    In this section, we describe the theoretical issues involved in the comparison of poverty counts across dif-

    ferently designed surveys, when the differences in design cause systematic discrepancies in reports between

    the two surveys. Then we show how, and under what conditions, such incomparability can be solved using

    other variables, whose distribution has not been affected by changes in the survey design. More generally,

    we also show how analogous procedures can be used in order to estimate comparable densities of variables

    measured differently due to changes in the survey design.Let z be the poverty line, so that all members in household j will be counted as poor if yj is below z,

    where y is the variable chosen by the researcher to discriminate between poor and non-poor households in

    a given population s. In our analysis, yj will be (log) PCE in household j. We are interested in estimating

    the proportion of individuals below the poverty line. Letting f(y | s) be the density of y in population s,

    the headcount ratio Hs can then be written as follows:

    Hs =

    Zz

    f(y | s)dy (3)

    If survey s contains observations on yj, one can simply estimate the poverty count using

    Hs =

    PjS(s) wjnj1 (yj < z )P

    jS(s) wjnj(4)

    18 Clustering reduces the monetary costs of collecting data, since more observations are sampled from the same geographic

    area, but it also typically reduces the amount of information contained in a given sample. Intuitively, and using an extreme

    case, if all households inside the same PSU were identical, having more than one household from each PSU would not increase

    the precision of the estimates. For an intuitive introduction to the effects of stratication and clustering on standard errors,

    see Deaton (1997, Ch. 1), or Howes and Lanjouw (1995). Howes and Lanjow (1998) analyze the effect that taking into account

    sampling design has on standard errors for poverty measures.

    8

  • 8/2/2019 Measuring Povert in India - Www.princeton

    9/51

    where 1 (E) is the indicator function equal to one when event E is true (and zero otherwise), wj is the

    ination factor (or weight) provided in the survey for household j,19 nj is household size, and S(s) is dened

    as the set of indexes related to households sampled from population s. In what follows, we will use the same

    index to denote both the population and the sample drawn from it in a household survey.

    Suppose that we already have estimates of H computed by (4) using a series of surveys run in differentyears, but characterized by the same standard survey design. Suppose also that we want to compute a

    comparable poverty count using data from a new target survey t, but that a change in the survey design is

    such that a different quantity y, and not y, is observed in survey t. So, for example, y is the reported PCE

    with a 30-day recall period for all items, while y is the reported PCE when different recall periods are used

    for different item categories. Since y is not observed in the target survey t, the estimator in (4) is infeasible

    for such survey. The objective is then to estimate Ht without using observations on y from survey t.

    Let j be a binary variable equal to one when household j belongs to the target population t, and

    equal to zero when the household belongs to the auxiliary population a. In what follows we will consider all

    observations (yj,vj, j) as being generated by a data generating process described by a joint distribution

    f(yj,vj, j) . To avoid excessive notation, we will use the same notation f for all marginal and conditional

    distributions derived from f (yj,vj, j) , and we will use differences in the arguments of these distributions

    to differentiate them from one another. Similarly, we use the notation P to denote all probability masses.

    So, for example, f(y | = 0) will be the marginal distribution of y in the auxiliary population a, and

    P(y < z | v, = 1) will be the probability of being poor once v is observed from a household belonging to

    the target population. The y-based headcount ratio for the target population can be rewritten as

    Ht = ZP(y < z | v, = 1)f(v | = 1) dv (5)In general, this quantity is not identied since the change in survey design is such that the target survey t

    only identies P(y < z | v, = 1) and f (v | = 1). Still, Ht can be estimated if the following assumptions

    hold:

    Assumption 1 f(v | = 1) = f (v | = 1)

    Assumption 2 P(y < z | v, = 1) = P(y < z | v, = 0)

    Assumption 1 states that the distribution of the reports for variables included in v is unaffected by the

    changes in the questionnaire. Assumption 2 requires that the probability of being poor is the same in the

    target and in the auxiliary survey once we condition on the vector of variables v. Then, using Assumptions

    1 and 2 one can rewrite equation (5) as follows:

    Ht =

    ZP(y < z | v, = 0)f(v | = 1) dv (6)

    19 The ination factor wj indicates the number of households in the population represented by the j-th household included

    in the sample.

    9

  • 8/2/2019 Measuring Povert in India - Www.princeton

    10/51

    where all terms are now identied.20

    One can think of estimating (6) by numerical integration, once the two terms inside the integral have

    been estimated nonparametrically. Even if the dimension ofv is large, so that the curse of dimensionality

    would not allow the precise estimation of the two terms, precision in the estimate of Ht would be recovered

    through the integration over v. However, one can rewrite (6) in a way which is simpler to estimate, usingsteps similar to those followed by DiNardo, Fortin, and Lemieux (1996). Let P ( = 1) be the unconditional

    probability that a household belongs to the target population, when the household is randomly selected from

    the joint population encompassing both the target and the auxiliary population. Similarly, let P ( = 1 | v)

    be the probability that a household belongs to the target population when the observed household specic

    covariates are equal to v. Then:

    Ht =

    Zf(y < z,v | = 0)

    f (v | = 0)f(v | = 1) dv (7)

    = Zf(v | y < z, = 0)P(y < z | = 0) f(v | = 1)f(v | = 0)

    dv

    = P(y < z | = 0)

    ZP ( = 1 | v) P ( = 0)

    P ( = 0 | v) P ( = 1)f (v | y < z, = 0) dv

    = HaE[R (v) | y < z, = 0] (8)

    where

    R (v) =f(v | = 1)

    f(v | = 0)=

    P ( = 1 | v) P ( = 0)

    P ( = 0 | v) P ( = 1)(9)

    R (v) is the reweighting function analogous to that in DiNardo et al. (1996). Intuitively, in our context,

    this function delivers the poverty count in the target survey reweighting every joint probability P(y < z,v |

    = 0) in (7), taking into account the fact that, by assumption, the two surveys share the same conditional

    probability of being poor, but have potentially different distributions of the conditioning variable v. Note

    that now, in (8), once a consistent estimator for the reweighting function is available, the headcount ratio

    can be simply computed with a univariate nonparametric regression. However, one still has to deal with

    a multivariate nonparametric regression ifv is multidimensional and one is not willing to make parametric

    assumptions on the functional form of the conditional probability P ( = 1 | v), which has to be estimated

    to compute the reweighting function.

    Note that the above framework would also allow the estimation of the marginal density f (y | = 1) ,

    again without the use of information on y from the target survey, when Assumptions 2 and 3 are strengthened

    using f (y | v, ) instead ofP(y < z | v, ). These stronger conditions are actually extremely desirable, since

    they would allow the researcher to estimate not only poverty counts for any poverty line, but also other

    poverty and inequality measures based on the marginal distribution of y.

    The following proposition summarizes all the population-related results identied so far:

    20 Note that its important that the variables in v be not independent on y, or else (6) would mechanically return the headcount

    ratio in population a.

    10

  • 8/2/2019 Measuring Povert in India - Www.princeton

    11/51

    Proposition 1: Suppose that Assumptions 1 and 2 hold. Then the population headcount poverty

    ratio in the target survey t is

    Ht = HaE[R (v) | y < z, = 0] . (10)

    Moreover, when Assumptions 2 is replaced by

    Assumption 2a f(y | v, = 1) = f(y | v, = 0)

    then

    f(y | = 1) = f(y | = 0) E[R (v) | y, = 0] . (11)

    In the context of this paper we will use the fact that for a set of items (fuel and light, miscellaneous goods

    and services, rents, consumer taxes and non-institutional medical expenses) the recall period is equal to the

    thirty days before the interview, in all questionnaire types used in NSS surveys. Then reports on household

    consumption of these items are likely to be left unaffected by the different reference periods introduced for

    other items, which is what we need for Assumption 1 to be satis

    ed.A necessary condition for the validity of Assumptions 2 and 2a is the stability across the auxiliary and

    the target population of the Engel curve linking consumption of 30-day items to total expenditure. Suppose

    that the Engel curve has the form m = (y) + , where (.) is monotone increasing, and is the error

    term (independent ofy) distributed according to a c.d.f. z. Then one can write the conditional distribution

    function of y as:

    P (y | m) = 1z (m ())

    Such conditional distribution will be constant over time as long as (.) and z do not change.21 Note that,

    in principle, movements in relative prices, or demand shocks, can undermine the validity of this assumption,

    which should therefore be pondered carefully. Actually, the smaller experimental surveys carried out after

    1993-94 offer the opportunity to provide indirect evidence about both the stability of the conditional distri-

    bution, and about the irrelevance of the implemented questionnaire changes on the reported consumption

    of 30-day items. In fact, in each of these surveys the sampling process identies both (y, v) (using reports

    from experimental questionnaires) and (y, v) (using reports from standard questionnaires). The issue of the

    credibility of the identifying assumptions will be analyzed in depth in Section 5 below. It must be stressed,

    though, that once one wants to implement our proposed adjustment procedure to estimate poverty counts

    (or densities) from a target survey for which only a non-standard questionnaire has been used, it will be

    no more possible to test the assumptions, which will have merely to be used as conditions necessary for

    identication.

    Since the Engel curve linking m to y might also depend upon household characteristics, we will also

    analyze the results obtained including in the vector v other household specic variables X that might help

    recovering information about the distribution of y. The variables we will use are household size and a set

    21 Note that the stability of the Engel curve does not imply that the budget share of 30-day items has to remain constant

    over time.

    11

  • 8/2/2019 Measuring Povert in India - Www.princeton

    12/51

    of dummies for education of the household head, main economic activity of the household, land holding,

    and whether the household belongs to scheduled castes or tribes. Note nally that, even if our adjustment

    procedure relies on the underlying stability of the Engel curve, it does not require its estimation, so that no

    functional form assumption about its structure will be necessary. The crucial assumption is that the relation

    between y and m is stable, but we do not need to know explicitly what is the form of such relation, as thenext section will make clear.

    4 Estimation

    In this section, we describe the estimators for the poverty count in (10) and for the marginal density in (11).

    We present the analysis of the asymptotic properties of the estimators in the appendix.

    The population poverty count in (7) can be rewritten as

    Ht =

    Zf(y < z,v | = 0) R (v) dv. (12)

    The estimator for the poverty count in the target survey is then

    Ht =X

    jS(a)

    jR (vj) 1 (yj < z ) (13)

    where

    j =wjnjP

    iS(a) wini

    So the js are the individual ination factors normalized so that they sum up to one. The estimator in (13)

    is more easily understood if one consider the simple case where v is univariate and assumes only r values.

    Then (12) becomes

    Ht =Xr

    i=1P (y < z, v = vi | = 0) R (vi)

    where the conditional probability can be estimated using the (weighted) proportion of individuals from the

    auxiliary population living in households whose PCE is below the poverty line, and whose vj is equal to each

    specic value. So:

    P (y < z,vi | = 0) = PjS(a)wjnj1(yj < z, vj = vi)

    PlS(a)

    wlnl

    Then, once an estimate for the reweighting function is available, the estimator for the poverty count becomes

    Ht =

    Pri=1

    PjS(a) wjnj1 (yj < z) 1(vj = vi)R (vi)P

    lS(a) wlnl

    which in turn, rearranging the terms, can be rewritten as (13).

    Note that in (13) the reweighting function has to be estimated in a rst step. By denition (see equation

    (9) above) R (vj) depends upon P (j = 1 | vj) and P (j = 1). This latter (unconditional) probability, which

    12

  • 8/2/2019 Measuring Povert in India - Www.princeton

    13/51

    is the same for all households, represents the proportion of individuals belonging to the target population,

    and can be estimated using the ratio between the estimated size of the target population, and the sum of

    the estimated size of both the target and the auxiliary population. So

    P( = 1) = PjS(t) wjnjPjS(t) wjnj +

    PjS(a) wjnj

    (14)

    Since j is a binary variable, P (j = 1 | vj) is the regression of j on a vector of covariates vj. Such a

    conditional expectation can be estimated using a simple logit or probit model or, if the dimensionality ofvj

    allows it, using nonparametric techniques. In the present context, if one chooses to use a parametric binary

    variable model, a logit model might be preferred to a probit model, since the former describes the probability

    with a distribution with fatter tails than the latter, and this reduces the probability of having P (j = 0 | vj)

    close to zero. In this way, since the estimated conditional probability appears in the denominator of R (vj) ,

    extreme values of such ratio become less likely.22

    The marginal density in (11) can be estimated using the following kernel estimator

    23

    f(y | = 1) =X

    jS(a)

    jR (vj) K

    yj y

    h

    (15)

    where K(.) is the kernel, and h is the bandwidth, that shrinks to zero when the number of observations

    grows to innity. An alternative estimator of the poverty count can be obtained by numerical integration,

    once the marginal density has been evaluated at a grid of points. Integrating (15) and using the change of

    variable = h1 (yj y) the estimator becomes

    Hdt = XjS(a)

    jR (vj)"Z

    yjz

    h

    K() d# (16)where the index d indicates that the estimate uses explicitly the estimated marginal density. Note that the

    lower bound of the integral approaches sign (yj z) when the number of observations goes to innity,

    since h 0. But then, since the kernel K(.) is a density

    limna

    Z

    yiz

    h

    K() d =

    Z

    sign(yjz)

    K() d = 1 (yj < z)

    where na is the number of observations from the auxiliary survey. Then (16) approaches (13) when the

    sample size grows. In our empirical application, we use both (16) and (13), obtaining estimates which di ffer

    only marginally between the two estimators. Note though that using (16) requires the choice of a bandwidth,which is not necessary if one uses (13).

    22 In our empirical application we will show, though, that the choice of the rst step estimator affects only marginally the

    estimates.23 The estimator for the density is completely analogous to eq. 5 in DiNardo et al. (1996).

    13

  • 8/2/2019 Measuring Povert in India - Www.princeton

    14/51

    5 The Estimators at Work: an Experiment

    Our adjustment procedure relies on identifying assumptions that, as such, cannot be directly tested. In our

    context, however, the experimental surveys carried out after the 50th Round can help to indirectly provide

    support for those assumptions. In these four rounds, the sample was randomly split into two subsamples.

    Households in the rst subsample were assigned the standard questionnaire, with a 30-day recall period

    for all items consumed, while the module assigned to households in the second subsample used different

    recall periods for different groups of items. Even in the latter questionnaire type, though, the 30-day recall

    period was kept for fuel and light, miscellaneous consumer goods and services, rents, consumer taxes and

    non-institutional medical expenses. So, in every smaller Round, the sampling process identies both the

    distribution of (y,v), for households in sample S1, and the distribution of (y, v), for households belonging

    to sample S2. This means that we can test Assumption 1, comparing the distribution ofv to that of v in

    each smaller survey. Also, it will be possible to examine the validity of Assumptions 2 and 2a, comparing

    the conditional distribution of y given v in the chosen auxiliary survey to the one estimated using S1 from

    each smaller survey. Finally, since we are interested in estimating p overty counts comparable to those

    one can obtain with the standard questionnaire, the problem of estimating poverty ratios from the 55 th

    Round is analogous to the problem of estimating poverty ratios using only sample S2 in any of the smaller

    surveys.24 But since for all these surveys we also have S1, which identies (y, v) , it will be possible to

    compare the results obtained through our procedure, using S2 and an auxiliary survey, to the benchmarks

    straightforwardly derived using y from S1 in the same survey.

    In what follows, we use the 50th Round as the auxiliary survey a for all experiments. In turn, we use

    the next three Rounds as target surveys t. We do not use the 54th Round since this was a six-month survey,

    while all other databases were collected during a one-year period. We deliberately keep the same auxiliary

    survey while considering target surveys separated from a by progressively longer time spans, since we are

    interested in verifying whether increasing the time span worsens the performance of our estimator and/or

    weakens the credibility of the assumptions needed for its use. One might worry, for example, that changes

    in preferences, or in the relative price of m (the logarithm of per capita expenditure on 30-day items), might

    produce changes in the conditional distribution of y given m. We do not nd evidence that this is a problem

    with the surveys we use.

    In order to express all values involved into the same monetary units, we deate values from the smaller

    Rounds using state-sector specic official price indexes. For the rural sector, we use the Consumer Price

    Index for Agricultural Laborer (CPIAL), while for urban areas we use the CPI for Industrial Workers. 25 The

    poverty lines we use when computing poverty counts are the all-India official ones for 1993-94 (205.7 Rupees

    24 Note that the problem is analogous, but not identical, since the questionnaire in the 55th Round was not identical to S2 in

    the smaller surveys.25 These indexes are available through a number of publications issued by the Government of India (for example the Statistical

    Abstract).

    14

  • 8/2/2019 Measuring Povert in India - Www.princeton

    15/51

    per head per month for the rural sector, and 283.4 Rupees for urban areas). In our analysis, we include

    only households living in states for which both the price indexes and the official poverty lines are available:

    these states, which account for about 95 percent of the total Indian population, are Andhra Pradesh, Assam,

    Bihar, Gujarat, Karnataka, Kerala, Madhya Pradesh, Maharashtra, Orissa, Punjab, Haryana (urban only),

    Rajasthan, Tamil Nadu, Uttar Pradesh, West Bengal, and urban Delhi.In what follows, we rst test the validity of Assumptions 1, 2, and 2a, and then we apply the reweighting

    procedure to the experimental problem described above.

    5.1 Testing the Assumptions

    In the present context, Assumption 1 requires that a change in the recall period for some consumption items

    does not affect the pattern of reports for the variables we include in v. The crucial variable to be included

    in v is per capita expenditure on 30-day items, which are purchased by almost all households and which

    constitute an important part of the total budget. Table 1 shows that in the smaller Rounds all households

    purchased 30-day items, which account for about 20 percent of the budget in the rural sector, and 25 percent

    in the urban sector (using S1). Table 1 also shows that the systematic difference in average total PCE across

    questionnaire types is not reected in an analogous difference once we look at items for which the recall

    period was the same in both S1 and S2.26

    In Figure 1 we draw, for every smaller survey and sector, the densities of m, m, y, and y, where it

    should be remembered that variables with tildes are related to observations obtained from the experimental

    questionnaires. All graphs represent kernel estimates obtained using the robust bandwidth proposed by

    Silverman (1986) for the estimation of approximately normal densities.27 It is apparent that in all cases

    the distribution of y (estimated using sample S2) lies to the right of the distribution of y (estimated using

    sample S1). At the same time, there is no such large and systematic gap between the distributions of m

    and m. Only in the urban sector of the 51st Round does the difference between the two curves appear not

    negligible, but even in this case the two densities closely coincide for low values of expenditure, which is the

    relevant range when one is interested in poverty counts.

    Any formal test for the equality of f(m) and f(m) should take into account the complex survey design,

    which can considerably affect the variance of the estimates. We construct a simple test by rst dividing the

    range ofm into bins of equal length, and then testing the null hypothesis that the proportion of observations

    in every bin is the same across the two schedules. The test is based on a Pearson 2 statistic modied to take

    into account the presence of strata and clusters. Under the null hypothesis, once we tabulate observations

    across bins and schedules, the proportion of observations in the cell related to the the i-th bin and the k-th

    26 Deaton (2001) studies in more detail the differences in average per capita consumption for different item categories, estimated

    using S1 and S2.27 The bandwidth is adapted for a biweight kernel, which is the one we use here.

    15

  • 8/2/2019 Measuring Povert in India - Www.princeton

    16/51

    schedule (the joint proportion), should be the same as the product of the total proportion of observations in

    the i-th bin and the total proportion of observations in the k-th schedule (the two marginal proportions).28

    The test rejects the null when a normalized sum of the differences between joints and products of marginals

    is large. Since the differences are normalized dividing them by the joint proportion, or by the product of

    the two marginals, one should be careful to choose the number of bins in such a way that the number ofobservations per cell is large enough. To avoid the presence of cells with very few observations, for every

    Round-sector we divide into bins only the range between the rst and the last percentile of the Round-sector

    distribution of the relevant variable. Table 2 shows the results, for different number of bins. We do not

    consider a large number of bins, again to avoid having only a few observations per cell. In most cases the

    null hypothesis cannot be rejected, and the p-values are above 0.2. The null is never rejected at a ve percent

    signicance level when we use 20 bins. Rejection arises, instead, in the rural sector of the 52nd Round, with

    10 or 15 bins, and in both sectors of the 51st Round with 10 bins. Note, though, that even when the null is

    rejected the p-value is always close to 0.05, so that the null is never rejected if we test using a one percent

    signicance level.

    Besides m, we also consider the inclusion in v of observed household characteristics (grouped into the

    vector X) that are likely to be correlated with y. These are household size and a series of dummies for

    land holdings, education of the household head, main economic activity of the household, and whether the

    household belongs to a scheduled caste or tribe. For all these variables, we use the same modied Pearson

    2 test described above. Again, the null hypothesis is that the distribution of every variable is the same

    for the two schedules. The p-values of each test are reported in Table 3. In most cases the data strongly

    support the null.29 The only exceptions are education of the head in the rural sector of the 53rd Round, and

    main economic activity of the household in the rural sector of the 51st

    and 53rd

    Round. However, even inthese cases the differences in the relative proportions of observations per cell between schedules is not large,

    as we show in the bottom part of Table 3. The overall picture that emerges is that in all the NSS Rounds

    we consider, the equality of the distribution ofv and v holds.

    Assumption 2a is more complicated to test, since it involves the comparison of two conditional distribu-

    tions, rather than two marginals. In particular, the assumption requires that the conditional distribution of

    y given v in the auxiliary survey (here the 50th NSS Round) is the same as the one in the target survey (any

    of the smaller surveys we use). Since y is the (log) PCE reported when a 30-day recall period is used for all

    items, in the target survey the assumption refers to the conditional distribution of the variable as recorded

    in S1.

    The relatively small sample size makes the direct estimation of the relevant conditional distributions

    troublesome, but some evidence on the validity of Assumption 2a can be obtained using the fact that

    the equality of the conditional expectations is a necessary condition for the equality of two conditional

    28 See Stata Reference manuals and references therein for details (Stata Corporation, 2001).29 This is not surprising, since the two subsamples are created randomly, and there is no obvious reason why a change in the

    recall period should affect reports ab out household characteristics.

    16

  • 8/2/2019 Measuring Povert in India - Www.princeton

    17/51

    distributions. Even the estimation of the regressions, though, is infeasible without specic assumptions as

    to their functional forms, or if the dimension of v is not small enough relative to the sample size to allow

    nonparametric estimation. But since the most important conditioning variable is m, we can compare across

    surveys the regressions ofy on m estimated nonparametrically, without imposing any functional form. Figure

    2 shows the resulting locally weighted regressions for all Rounds, for the rural and urban sector separately.30

    The curves look extremely similar across the different surveys. Since the functional form appears strikingly

    linear, we also test for equality of coefficients across different surveys in separate linear regressions. In each

    regression, we include observations from a given sector in the 50th Round, together with observations from

    the same sector in one of the smaller surveys. Let Dj be a dummy variable equal to one when the j-th

    household belongs to the smaller survey. Then each regression has the form

    yj = 0 + 1mj + 0Dj + 1Djmj + uj

    where uj is the error term. We report the results in Table 4. In all regressions, we cannot reject the

    hypothesis that 0 or 1 are equal to zero, but tests for the joint signicance reject the null in most cases.

    In all cases, though, the dummies are very small in absolute value, so that both the constant and the slope

    remain very similar across surveys. To further check the similarity of the conditional distribution of y given

    m, we estimate locally weighted regressions on m of the squared residuals computed from the nonparametric

    regressions shown in Figure 2. These regressions, which estimate the conditional variance of y given m, are

    plotted in Figure 3. Here the differences among the curves are much larger than the differences among the

    conditional expectations in Figure 2. Still, the pattern and the levels are similar in the central part of the

    range, where most observations are located.

    The results are less encouraging if other covariates are added to v. When we include in the linearregressions one or more of the variables in X, in most cases we cannot reject the hypothesis that some of the

    coefficients differ across surveys, and in all cases a joint test strongly rejects the null of equal coefficients.31

    Also, a nonparametric analysis becomes impossible, due to the curse of dimensionality, so that entering all

    variables linearly into each regression becomes more controversial. Since, as we show in the next section,

    including X in v does not change signicantly our estimates, our preferred estimator will be one that uses

    only m, for which the evidence of the validity of Assumption 2a is stronger.

    Assumption 2a is only required when one is interested in estimating the marginal density of y in the

    target survey. For the estimation of poverty counts, the weaker Assumption 2 is needed, requiring that the

    conditional probability of being poor given v is the same in the auxiliary and the target surveys. If we

    consider again a case where v includes only m, we can check the validity of this condition estimating locally30 In a locally weighted regression a local OLS regression is run at every point where the conditional expectation is evaluated

    (see Fan, 1992). The regression is local since at every point we use only observations for which the regressor is inside a

    neighborhood of the point itself, dened by a bandwidth. The observations are weighted by using a kernel, so that observations

    closer to the point have more weight in the regression itself.31 These results for different combinations of regressors are not shown here, and they are available upon request from the

    author.

    17

  • 8/2/2019 Measuring Povert in India - Www.princeton

    18/51

    weighted regressions on m of a dummy equal to one when a household is poor. The resulting curves are

    shown in Figure 4. Each graph displays a sector-specic regression for the 50th Round, and compares it with

    the analogous regression from one of the smaller surveys. In every case, the two lines are very close, even if,

    with the exception of the urban sector in the 53rd Round, the regression for the 50th Round lies everywhere,

    or almost everywhere, slightly above the corresponding line from the auxiliary survey. Note, though, thatno systematic pattern in the way the curves move over time is apparent.

    All curves suggest that they might be well described by a parametric binary variable model, so we run a

    sector-specic logit regression for each target survey, pooling together observations from the target and the

    auxiliary survey, and we test the null hypothesis that the coefficients are the same in the two populations.

    The results are reported in Table 5, and they again support the assumption that the conditional probabilities

    are the same, or that they differ only marginally, in the auxiliary and the target survey. The null hypothesis

    that the coefficients for the dummies and the interaction terms are individually equal to zero is never rejected,

    with the exception of the urban sector when the 53rd Round is the target survey (even in this case the t-

    values are approximately equal to 2, and the coefficients are not large in absolute value). When we test for

    the joint signicance of the interaction and the dummy, the null is strongly rejected only when the target

    survey is the urban sector in the 52nd Round. When the target is the rural sector of the 51st or 52nd Round,

    the null is rejected at the 5 percent signicance level, but not if we use a 1 percent level of signicance.

    The overall picture is encouraging, and our experiment suggests that the assumptions needed for the

    adjustment procedure to work are supported by the data. In the next section we proceed then with our

    experiment, in order to evaluate the performance of the proposed estimators.

    5.2 Results

    In this section, we use Schedule 2 from the 51st, 52nd, and 53rd NSS Round as target surveys, keeping the 50th

    Round as our auxiliary survey. The goal is to use our reweighting procedure to estimate poverty counts and

    the density ofy for each target population, using the data collected through the experimental questionnaires

    in the smaller surveys, and to compare these results with those one would obtain using standard estimators

    applied directly to the data recorded in the standard questionnaires from the same small NSS Rounds. If

    the reweighting procedure performs well, the two sets of results should be close to each other, since the fact

    that the two schedule types were assigned randomly should guarantee that both subsamples represent the

    same population.

    We analyze rst the whole density of y, and then we turn to the computation of poverty counts. Most of

    our analysis will be done including only m (the logarithm of per capita expenditure on 30-day items) into

    the vector of covariates v. The use of other household-specic covariates makes the validity of Assumption

    2 more dubious, and using a univariate v also allows us to compare the results we obtain using a parametric

    18

  • 8/2/2019 Measuring Povert in India - Www.princeton

    19/51

    rst step with those we obtain when the reweighting function is estimated nonparametrically. We will also

    show that including other variables in v does not change the results substantially.

    As a rst step to compute the reweighting function, we have to estimate P(j = 1), which is the weighted

    proportion of individuals from the target survey (see equation 14), and P(j = 1 | vj).32 Ifv is a vector

    of discrete variables, the latter estimate is the fraction of individuals belonging to the target survey, giventhe set of those for which the vector of covariates assumes value equal to vj. Since in our case the most

    important component of the vector vj is a continuous variable, namely mj, we can more generally interpret

    P(j = 1 | vj) as the estimated probability that a household belongs to the target survey once the researcher

    observes vj. Such probability can be estimated regressing the binary variable j on vj, using a logit or probit

    model or, if the dimension ofvj is relatively small, using a nonparametric regression. Once we have estimated

    P(j = 1) and P(j = 1 | vj), we calculate the reweighting function R (vj) using (9).

    When we estimate P(j = 1 | vj) with a parametric regression, we always enter mj (and household size

    when it is included in vj) in polynomial form. This is important, since both the logit and probit models

    imply monotonicity in the relation between the estimated probability and each regressor. Since here the

    dependent variable is a binary variable equal to one when the household belongs to the target population,

    we have no reason to assume a priori that the relation between P(j = 1 | vj) and mj is monotonic. In

    other words, we cannot rule out that both low and high values of mj are more common in one of the two

    surveys, with the values in the central range more common in the other. This would be the case, for example,

    if mj had an approximately normal distribution, both in the target and in the auxiliary survey, with the

    same mean but different variances. Using a polynomial allows us to relax this monotonicity implication.

    Note also that the same concern is not relevant when we enter household-specic variables represented by

    categorical variables, since each possible value is entered into the regression as a separate dummy variableinto the regression.

    For illustrative purposes, Figures 5 and 6 show the estimated curves when we use S2 from the 52nd

    Round as the target survey, for the rural and urban sector respectively. 33 The graphs to the left refer to

    the estimated conditional probability P(j = 1 | vj), plotted against m, while those to the right are the

    resulting reweighting functions, plotted with y on the horizontal axis. The graphs in the rst row show the

    results when we estimate P(j = 1 | vj) with a logit model, using only a polynomial in m as regressor. The

    two graphs below refer to the case where we also enter into vj a polynomial in household size, and dummy

    variables for education of the head, land holdings, scheduled caste or tribe, and main economic activity of

    the household. The nal row shows the graphs when we estimate the rst step nonparametrically, using a

    locally weighted regression ofj on mj. If the two populations analyzed are identical, knowing vj would not

    provide any information about which population the selected household comes from, so that the conditional

    probabilities shown to the left would all be approximately at at 0.5, and the reweighting functions to the

    32 Remember that j is dened as a dummy variable equal to one when household j belongs to the target population.33 The analogous graphs for the 51st and 53rd round are available upon request from the author.

    19

  • 8/2/2019 Measuring Povert in India - Www.princeton

    20/51

    right would be approximately at at 1. This is actually what happens for most values in the urban sector

    of the 52nd Round, when we use a polynomial in m, or a nonparametric regression in the rst stage. The

    curves are at, and upward sloping only for large values ofm. In the rural sector, instead, low values of m are

    generally associated with a probability of belonging to the target population above 0.5, and vice-versa. These

    results are easily explained looking at Figure 7, where we plot the cumulative distribution functions of mfrom both the target and the auxiliary survey, in the rural and urban sector separately. The top panel shows

    that in the rural sector the distribution from the 50th Round stochastically dominates the distribution from

    the 52nd Round, so that higher values of m have a higher probability in the older survey. The two curves in

    the urban sector are instead almost identical, but the distribution from the 50th Round lies generally above

    the other for high values of m, which explains why the conditional probabilities in Figure 6 are upward

    sloping at high values of m.

    Once we have estimated the reweighting functions, we can plug them into equation (15) to estimate the

    density ofy from S2 over a grid of points. Figure 8 shows the resulting densities for all target surveys. Each

    graph shows standard kernel estimates of the densities of y from S1 and y from S2 (the dashed and dotted

    line respectively), together with the density of y from sample S2 estimated using our reweighting procedure

    (the continuous line). The results are encouraging, since it is apparent that in all cases the adjusted density is

    much closer to the density estimated using sample S1 (which is our benchmark) than the one estimated using

    S2. However, even if part of the distance between the adjusted density and the benchmark might be due to

    sampling error, it can be noticed that the gap between the two is not negligible in some cases, especially in

    the rural sector, where our procedure produces densities that generally lie slightly to the left of the densities

    from S1, so that in the rural sector we can expect that our adjustment will slightly overstate poverty. This

    is reected in Figure 9, where we show differences between the estimated cumulative distribution functionsof the logarithm of PCE, computed by numerical integration of the densities shown in Figure 8. In each

    graph, the line farther away from zero represents the distance between the cumulative distribution function

    of y (from S2) and the benchmark c.d.f. of y from sample S1. The other line shows the difference between

    the adjusted c.d.f. of y from S2 and the benchmark c.d.f. from S1. We are particularly interested in the

    distances evaluated at the poverty lines, which we denoted by vertical lines in the graphs. While in the

    urban sector the distances are generally very small, this is not always the case in the rural sector, where

    the distance evaluated around the poverty line is in all cases approximately ve percent. Still, our estimator

    provides estimates that are much closer to the benchmark than those obtained from sample S2.34

    Table 6 reports all the point estimates for the poverty counts. Columns 1 and 2 contain the headcount

    poverty ratios straightforwardly calculated using (3), as proportions of individuals living in households with

    PCE below the relevant poverty line. Column 1 shows the results one gets using households belonging to

    34 All graphs represented in Figures 8 and 9 appear extremely similar if the reweighting function is estimated with a non-

    parametric rst step, if we include all household-specic covariates in the logit model, or if we use a probit instead of a logit

    model.

    20

  • 8/2/2019 Measuring Povert in India - Www.princeton

    21/51

    sample S1, and column 2 shows the much lower estimates one obtains using reports from households who have

    been given the experimental questionnaire.35 The next four columns report the poverty counts estimated

    using our reweighting procedure, when we use a logit model in the rst step. In columns 3 and 4, the poverty

    counts are computed using (13), while in columns 5 and 6 we compute them by numerical integration, once

    we have estimated the density ofy over a grid of points using (15). The estimation of the reweighting functionis done including only a polynomial in m in columns 3 and 5, and including also a polynomial in household

    size and dummies for the other household-specic covariates, in columns 4 and 6.36 Finally, the last two

    columns contain the point estimates when the reweighting function is estimated nonparametrically, using a

    locally weighted regression. Our estimates are much closer to the poverty counts reported in column 1 (our

    benchmark) than those in column 2, which are considerably lower, as a consequence of the experimental

    design of the questionnaire given to the households in sample S2. The performance of our estimator is

    particularly satisfactory in the urban sector, where the adjusted headcount ratios generally differ from the

    counts reported in column 1 by 1-2 percentage points. In the rural sector, the performance is not as good,

    and in all cases our estimator overstates poverty by 4-5 percentage points, as was also anticipated by the

    graphs in Figure 9. Note that it is apparent that the results in columns 3 to 9 are very close to each other,

    and no estimator systematically outperforms the others. Also, it is interesting to note that the performance

    of the estimators does not worsen when the distance between the target and the auxiliary survey increases.

    This provides further support for Assumption 2a, which requires the constancy of the distribution of y given

    v across the target and the auxiliary survey. If, for example, the relation between y and m changed quickly

    due to changes in relative prices or preferences, one would have expected the performance of the estimator

    to worsen once we consider target surveys less close in time to the auxiliary survey.

    6 Adjusted Poverty Estimates from the 55th Round of the Na-

    tional Sample Survey

    Data from the 50th Round of the National Sample Survey, carried out in 1993-94, showed that about a third

    of the Indian population was living with a PCE below the poverty line. In rural areas the proportion was

    37.2 percent, while 32.6 percent of the urban population was estimated to be poor. After only six years, the

    official poverty counts estimated using the 55th Round show an impressive reduction in poverty. Using the

    consumption levels reported with a 30-day recall period for food, poverty counts are now approximately 10

    35 These are the same gures reported in Table 1. The standard errors in columns 1 and 2 are computed using the survey

    commands in Stata, which allow to compute standard errors adjusted for stati cation and clustering.36 We estimated the standard errors in columns 3 and 4 using 50 bootstrap replication, taking into account the clustered survey

    design. This means that in each replication the resampling is done with respect to the clusters, and not to the households.

    Once a cluster is included in a bootstrap sample, all households in the cluster are included in the sample (see Deaton, 1997,

    Ch. 1).

    21

  • 8/2/2019 Measuring Povert in India - Www.princeton

    22/51

    percentage points lower, at 27.1 in the rural sector, and 23.6 in urban areas. Taking into account only the

    major Indian states, which are those we will consider in our analysis, we estimate that the unadjusted poverty

    counts based on a 30-day recall are respectively equal to 28.4 and 24.5. In a country where the population has

    recently surpassed one billion of individuals, these numbers would translate into approximately one hundred

    million people out of poverty in less than a decade.In the previous pages, though, we have described the reasons why we believe that these gures are likely

    to underestimate poverty: in reporting household consumption of food on a 30-day basis, the respondent will

    probably reconcile her answers with those given with a 7-day recall period, which are typically proportionally

    higher. Since food, in India, still account on average for a very large part of household budgets, we should

    expect PCE with a 30-day recall period from the 55th Round to be higher than the one we would have

    observed if the standard recall period had been used in isolation. The reweighting procedure described

    in the paper should help recovering poverty counts comparable with those computed from previous NSS

    Rounds.

    Table 7 contains the sector-specic adjusted estimates for the poverty counts, when we use only m as

    conditioning variable. As a robustness check, we use four different auxiliary surveys, all the NSS surveys

    between the 50th and the 53rd Round. As usual, all values are deated using the official sector and state

    specic price indexes. Column 1 presents the headcounts obtained when the rst step is estimated by a logit

    model, while in column 2 we use a locally weighted regression. It is apparent that the choice of the rst step

    estimator is basically immaterial. The choice of the auxiliary survey, instead, does produce different results,

    even if the discrepancies are not large, and in both sectors all estimates are no more than approximately

    2 percentage points apart. Even after the adjustment, the poverty counts remain well below their levels

    as measured in the 50th

    Round. The adjusted gures for the rural sector are, as expected, larger thanthe unadjusted ones, but the gap is very small, and equal to 1-2 percentage points, depending upon which

    auxiliary survey is used. It is interesting to note, instead, is that our adjustment does not revise upwards

    the estimates for urban areas. These results raise some questions, since in the smaller experimental surveys

    carried out between the 50th and the 55th NSS Round the discrepancies in reports due to differences in

    the questionnaire between S1 and S2 were very similar in the rural and urban sector. It is unclear, then,

    why the adjustment works differently in the two sectors. One possibility is that the questionnaire changes

    implemented in the 55th Round affected differently the reports from urban and rural dwellers. In particular,

    the latest survey collected data on consumption out of home production differently, and used a 365-day

    recall period for durables in both sets of reference periods. While home production is clearly more important

    in rural areas, consumption of durables is more relevant in the urban sector. If, and how, these issues

    might explain the observed differences in the adjustment should be the object of further study. In any case,

    though, our results strongly suggest that the change in the questionnaire affected mostly the 7-day reports,

    and not much the standard 30-day reports. In fact, while the adjusted poverty counts are very similar to

    the unadjusted ones, it is still true that the gap between the two sets of reports is much lower than the one

    22

  • 8/2/2019 Measuring Povert in India - Www.princeton

    23/51

    observed in each of the smaller experimental surveys. Actually, since our empirical experiment show that

    in the rural sector the adjustment tends to moderately overestimate poverty, it is possible that, even in the

    rural sector, the unadjusted gures are comparable with the poverty counts computed from previous surveys.

    Note, nally, that even if this were the case, the usefulness of our procedure in the present context would

    remain unaltered, since without our analysis it would have been impossible to judge the comparability ofthe estimates.

    7 Caveats and Conclusions

    In this paper, we have proposed a new method for estimating comparable poverty counts and comparable

    densities of a variable y when such variable is measured differently in different surveys. Our method only

    requires that each survey contains data on a set of variables v, correlated with y, whose reports are not

    affected by the differences in survey design across the two surveys, and that the conditional distribution of y

    given v is the same in the two surveys. Our ultimate objective is to provide tools to estimate poverty counts

    using data from the 55th Round of the Indian National Sample Survey (July 1999 - June 2000), which we

    couldnt otherwise use for this purpose, due to important changes in the survey questionnaire with respect

    to previous NSS Rounds. In the paper we describe why the evidence suggests that the unadjusted official

    poverty counts are likely to understate the extent of poverty.

    We use smaller experimental surveys carried out before the 1999-2000 survey to support the validity of

    the identifying assumptions needed for our estimators, and to evaluate their performance. In our empirical

    experiment the estimators perform well, especially in estimating poverty counts and densities for the urban

    sector.

    Even after adjusting for the changes in the survey module, we observe an impressive reduction in poverty

    in India. Headcount poverty ratios are now about a third lower than in 1993-94, when the previous large

    NSS survey had been carried out. According to our estimates, poverty counts in India is now close to 30

    percent in rural areas, and 25 percent in urban areas.

    It should be stressed that in this paper we do not suggest the superiority of one set of recall period over

    the others. Our scope is to provide a method for estimating comparable poverty counts, from data measured

    in non-comparable way. Our estimator, then, should not be interpreted as a tool to solve a measurement

    error problem, since the variable we are trying to recover, y, might be itself (and surely is) measured with

    error. In fact, in our context y is not the true PCE, but it is instead total PCE reported on a standard

    questionnaire, with a 30-day recall period for all consumption items.

    Since changes in questionnaire are implemented frequently in surveys carried out all over the world, the

    adjustment procedure proposed here can have many applications, besides its use with NSS data. Our esti-

    mators can be useful every time one is interested in comparing distributions of variables recorded differently

    23

  • 8/2/2019 Measuring Povert in India - Www.princeton

    24/51

    in different data sources, if one has reason to believe that the needed assumptions hold true in that specic

    case. Once again, we remind that such assumptions will have to be accepted as conditions necessary to

    achieve identication, which one cannot test. Then, their credibility should be carefully evaluated on a case

    by case basis.

    References

    [1] Bradburn, Norman M., Janellen Huttenlocher, and Larry Hedges, 1994, Telescoping and Temporalmemory in Norbert Schwartz and Seymour Sudman (eds), Autobiographical Memory and the Validityof Retrospective Reports, New York, Springer-Verlag.

    [2] Datt, Gaurav, 1999, Has poverty declined since economic reforms?, Economic and Political Weekly,December 11, pp 351618.

    [3] Deaton, Angus S., 1997, The analysis of household surveys: a microeconometric approach to developmentpolicy, Baltimore, MD, Johns Hopkins University Press.

    [4] Deaton, Angus S., 2001, Survey design and poverty monitoring in India, Research Program in Devel-opment Studies, Princeton University, processed.

    [5] Deaton, Angus S., and Margaret Grosh, 2000, Consumption, Chapter 5 in Margaret Grosh and PaulGlewwe, eds., Designing household survey questionnaires for developing countries: lessons from 15 yearsof the Living Standards Measurement Study, Oxford University Press for the World Bank, Vol 1., pp.91133.

    [6] Deaton, Angus S., and Alessandro Tarozzi, 2000, Prices and poverty in India, Research Program inDevelopment Studies Working Paper no. 196, Princeton University.

    [7] Di Nardo, John, Nicole M. Fortin, and Thomas Lemieux, 1996, Labor Market Institutions and theDistribution of Wages, 1973-1992: a Semiparametric Approach, Econometrica, Vol. 64, 5, 1001-1044.

    [8] Elbers, Chris, Jean O. Lanjouw, and Peter Lanjouw, 2001, Welfare in Villages and Towns: Micro-LevelEstimation of Poverty and Inequality, mimeo.

    [9] Fan, Jianquing, 1992, Design-adaptive Nonparametric Regression, Journal of the American StatisticalAssociation, 87, 998-1004.

    [10] Gibson, John, 1999a, Measuring Poverty with Noisy Estimates of Annual Expenditures: Evidence fromPapua New Guinea, Department of Economics, University of Waikato, 1999.

    [11] Gibson, John, 1999b, How Robust are Poverty Comparisons to Changes in Household Survey Methods?A Test Using Papua New Guinea Data, Department of Economics, University of Waikato.

    [12] Gibson, John, 2001, Measuring chronic poverty without a panel, Journal of Development Economics,65(2), pp. 243-266.

    [13] Government of India, 1988, National Sample Survey, 43rd Round: July 1987-June 1988: Instructions toField Staff: Volume I, Delhi, National Sample Survey Organization.

    [14] Government of India, 1993, Report of the Expert Group on the estimation of proportion and number ofpoor, Delhi, Planning Commission.

    [15] Government of India, 2001, Poverty Estimates for 1999-2000, Press Information Bureau, 22nd February.

    [16] Government of India, various years, Statistical abstract, Delhi, Central Statistical Organization, Depart-ment of Statistics, Ministry of Planning.

    24

  • 8/2/2019 Measuring Povert in India - Www.princeton

    25/51

    [17] Howes, Stephen, and Jean Olson Lanjouw, 1998, Does Sample Design Matter for Poverty Rate Com-parisons?, Review of Income and Wealth, Series 44, no. 1.

    [18] Howes, Stephen, and Jean Olson Lanjouw, 1995, Making Poverty Comparisons Taking Into AccountSurvey Design: How and Why, Policy Research Department, World Bank, Washington, D.C., and YaleUniversity, New Haven, processed.

    [19] Lanjouw, Jean Olson, and Peter Lanjouw, 2001, How to Compare Apples and Oranges: PovertyMeasurement based on Different Denition of Consumption, Review of Income and Wealth, 47 (1), pp.25-42.

    [20] Mahalanbois, P. C., and S. B. Sen, 1954, On some aspects of the Indian National Sample Survey,Bulletin of the International Statistical Institute, Vol. 34, part II.

    [21] Neter, John and Joseph Waksburg, 1964, A Study of Response Errors in Expenditure Data fromHousehold Interviews, Journal of the American Statistical Association, Vol. 59, No. 305, pp. 18-55.

    [22] Newey, Whitney, and Daniel McFadden, Large Sample Estimation and Hypothesis Testing, in Hand-book of Econometrics, Volume IV, R.F. Engle and D.L. McFadden eds., Elsevier.

    [23] Pagan, Adrian, and Aman Ullah, 1999, Nonparametric Econometrics, Cambridge, Cambridge University

    Press.

    [24] Parzen, Emanuel, 1962, On Estimation of a Probability Density and Mode, Annals of MathematicalStatistics, 33, 1065-1076.

    [25] Ravallion, Martin, 1993, Poverty Comparisons: a Guide to Concepts and Methods, LSMS WorkingPaper 88, Washington, D.C., World Bank.

    [26] Rubin, David C. and Alan D. Baddeley, 1989, Telescoping is not Time Compression: a Model of theDating of Autobiographical Events, Memory and Cognition, 17.

    [27] Sachs, Jeffrey D., Ashutosh Varshney, and Nirupam Bajpai, 1999, India in the Era of Economic Reforms,Oxford, Oxford University Press.

    [28] Sen, Abhijit, 2000, Estimates of Consumer Expenditure and Its Distribution: Statistical Prioritiesafter NSS 55th Round, Economic and Political Weekly, December 16, 2000, 4499-4518.

    [29] Silverman, Bernard W., 1986, Density Estimation for Statistics and Data Analysis, London, Chapmanand Hall.

    [30] Stata Corporation, 2001, Stata Reference Manuals, Release 7, College Station, Stata Press.

    [31] Visaria, Pravin, 2000, Poverty in India during 199498: alternative estimates, Institute for EconomicGrowth, New Delhi, processed.

    25

  • 8/2/2019 Measuring Povert in India - Www.princeton

    26/51

    log(Rup ees), Rur al

    f(m ), 51 , Sch . 1 f(y), 51 , Sch . 1f(m ), 51 , Sch . 2 f(y), 51 , Sch . 2

    1 9

    0

    1

    log(Rup ees), Urb an

    f(m ), 51 , Sch . 1 f(y), 5 1 , Sch . 1f(m ), 51 , Sch . 2 f(y), 5 1 , Sch . 2

    1 9

    0

    1

    log(Rup ees), Rur al

    f(m ), 52 , Sch . 1 f(y), 52 , Sch . 1f(m ), 52 , Sch . 2 f(y), 52 , Sch . 2

    1 9

    0

    1

    log(Rup ees), Urb an

    f(m ), 52 , Sch . 1 f(y), 5 2 , Sch . 1f(m ), 52 , Sch . 2 f(y), 5 2 , Sch . 2

    1 9

    0

    1

    log(Rup ees), Rur al

    f(m ), 53 , Sch . 1 f(y), 53 , Sch . 1f(m ), 53 , Sch . 2 f(y), 53 , Sch . 2

    1 9

    0

    1

    log(Rup ees), Urb an

    f(m ), 53 , Sch . 1 f(y), 5 3 , Sch . 1f(m ), 53 , Sch . 2 f(y), 5 3 , Sch . 2

    1 9

    0

    1

    Figure 1Source: authors computation from NSS - All Indiaf(m) is the density of log(PCE in 30-day items), while f(y) is the density of log(PCE). All values are in 1993-94 Rupees, deflatedusing CPIAL for the rural sector, and CPIIW for the urban sector. All densities are estimated using the robust bandwidthproposed by Silverman (1986) for approximately normal densities, adapted for a biweight kernel, which is the one we use here.

  • 8/2/2019 Measuring Povert in India - Www.princeton

    27/51

    y=g(m)

    LocallyWeightedReg.-Ru

    ral-Sch.

    1

    log(per ca pita exp. in 30-day ite ms )

    NSS 50, r u r a l NSS 51, ru r a l, Sch .1NSS 52, r u r a l, Sch .1 NSS 53, ru r a l, Sch .1

    2.5 5.6

    4.75

    6.75

    y=g(m)

    LocallyWeightedReg.-

    Urban-Sch.

    1

    log(per ca pita exp. in 30-day ite ms )

    NSS 50, u r ba n NSS 51, u r ba n , Sch .1NSS 52, u r ba n , Sch .1 NSS 53, u r ba n , Sch .1

    2.8 6.6

    4.9

    7.4

    Figure 2 Locally weighted regressions of log(PCE) on log(PCE in 30-day items)

    Source: Authors computations from NSS, rounds 50-53, all major Indian states. All lines are drawn only for values of theregressor in the interval between the 1st and the 99th centile of the corresponding distribution, to avoid showing the noisyestimates outside those intervals. We use a fixed bandwidth in every regression. The bandwidth is 0.2 in the rural sector of allsmaller surveys and in the urban sector of the 50th round, 0.15 in the rural sector of the 50th round, and 0.3 in the urban sectorof the smaller surveys.

  • 8/2/2019 Measuring Povert in India - Www.princeton

    28/51

    Locallyweightedregressions,

    Ba

    ndwidth

    =0.5

    Rural sectorlog(m)

    NSS 50 - 93-94 NS S 52 - 95-96NSS 51 - 94-95 NS S 53 - 97

    2.5 5.6

    .072283

    .207077

    Locallyweightedregres

    sions,

    Bandwidth

    =0.5

    Urban sectorlog(m)

    NSS 50 - 93-94 NS S 52 - 95-96NSS 51 - 94-95 NS S 53 - 97

    2.8 6.4

    .058649

    .181788

    Figure 3

    Locally weighted regression of squared residuals from first stage nonparametric regression oflog(PCE) on log(PCE in 30-day items)

    Source: Authors computations from NSS, rounds 50-53, all major Indian states. The vertical lines correspond to the 5th and95th sector specific percentile of the distribution of the regressor, computed pooling together all four surveys. The residualsare those computed from the nonparametric regressions shown in figure 2.

  • 8/2/2019 Measuring Povert in India - Www.princeton

    29/51

    %b

    elo

    wp

    overtyline-Rural

    log(pce in 3 0-da y items)

    NSS 51 NSS 50

    2 .5 6 .5

    0

    .5

    1

    %b

    elowp

    overtyline-Urban

    log(pce in 3 0-da y items)

    NSS 5 1 NSS 50

    2 .5 6 .5

    0

    .5

    1

    %b

    elowp

    overty

    line-Rural

    log(pce in 3 0-da y items)

    NSS 52 NSS 50

    2 .5 6 .5

    0

    .5

    1

    %b

    elowp

    overty

    line-Urban

    log(pce in 3 0-da y items)

    NSS 5 2 NSS 50

    2 .5 6 .5

    0

    .5

    1

    %b

    elowp

    overtyline-Rural

    log(pce in 3 0-da y items)

    NSS 53 NSS 50

    2 .5 6 .5

    0

    .5

    1

    %b

    elowp

    overtyline-Urban

    log(pce in 3 0-da y items)

    NSS 5 3 NSS 50

    2 .5 6 .5

    0

    .5

    1

    Figure 4Conditional probability of being poor given log(PCE in 30-day items)

    Source: authors computations from NSS, rounds 50-53, all major Indian States. Locally weighted regressions. The dependent variableis a dummy variable equal to one when the households per capita monthly expenditure is below the sector-specific poverty line. Thebandwidth is equal to 0.5 for all curves.

  • 8/2/2019 Measuring Povert in India - Www.princeton

    30/51

    Locally Weight ed Regres sion first s tep

    P(targetsurvey|v)

    log(PCE in 30 -da y item s)2 3 4 5 6 7

    0

    .5

    1

    Locally Weight ed Regres sion first s tep

    Reweightingfunction

    log(pce, all items)3 5 7 9

    0

    1

    2

    3

    4

    5

    First step: logit, all covariates

    P(targetsurvey|v)

    log(PCE in 30-day items)2 3 4 5 6 7

    0

    .5

    1

    First step: logit, all covariates

    Reweightingfunction

    log(pce, all items )3 5