calculating the concentration index when income is grouped

7
Journal of Health Economics 29 (2010) 151–157 Contents lists available at ScienceDirect Journal of Health Economics journal homepage: www.elsevier.com/locate/econbase Note Calculating the concentration index when income is grouped Philip Clarke a,, Tom Van Ourti b,c a School of Public Health, The University of Sydney, Edward Ford Building (A27), New South Wales 2006, Australia b Erasmus School of Economics, Erasmus University Rotterdam, PB 1738, 3000 DR Rotterdam, The Netherlands c Tinbergen Institute, Burgemeester Oudlaan 50, 3062 PA Rotterdam, The Netherlands article info Article history: Received 20 January 2009 Received in revised form 11 November 2009 Accepted 16 November 2009 Available online 20 November 2009 JEL classification: C2 D31 I19 Keywords: Concentration index Errors-in-variables Instrumental variables Categorical data First-order correction abstract The problem introduced by grouping income data when measuring socioeconomic inequalities in health (and health care) has been highlighted in a recent study in this journal. We re-examine this issue and show there is a tendency to underestimate the concentration index at an increasing rate when lowering the number of income categories. This tendency arises due to a form of measurement error and we propose two correction methods. Firstly, the use of instrumental variables (IV) can reduce the error within income categories. Secondly, through a simple formula for correction that is based only on the number of groups. We find that the simple correction formula reduces the impact of grouping and always outperforms the IV approach. Use of this correction can substantially improve comparisons of the concentration index both across countries and across time. © 2009 Elsevier B.V. All rights reserved. 1. Introduction The concentration index has become the standard measure to quantify income-related inequalities in health economics (Wagstaff and van Doorslaer, 2000). It can be estimated using grouped/aggregated data or micro-data sets that contain informa- tion on an individual’s income and his/her health (care) status. Micro-datasets are generally preferred to grouped datasets as the We are grateful for comments received from Teresa Bago d’Uva, Hans van Kip- persluis, an anonymous referee, and participants at seminars given at Australian National University, Tilburg University and Erasmus University Rotterdam. We also acknowledge funding from the NETSPAR project “Income, health and work across the life cycle” and thank EUROSTAT for access to the ECHP. Part of this research was undertaken while Tom Van Ourti was a Postdoctoral Fellow of the Netherlands Organisation for Scientific Research – Innovational Research Incentives Scheme – Veni. Philip Clarke is supported by a Sydney University Fellowship. Part of this work was undertaken during a stay at the Melbourne Institute of Applied Economic and Social Research, and Economics RSSS at the Australian National University, the hospitality of which is gratefully acknowledged. The usual caveats apply and all remaining errors are our responsibility. Corresponding author. Tel.: +61 2 9351 5424; fax: +61 2 9351 7420. E-mail addresses: [email protected] (P. Clarke), [email protected] (T. Van Ourti). former result in consistent estimation of the concentration index, since point estimates from grouped datasets ignore information on within group association between income (rank) and health (care) status (Kakwani et al., 1997). But also when income is reported in categories, estimating the concentration index using individual level data neglects this within group association. There are many such examples of inequality studies that have involved grouped data or surveys where the income variable used to rank individuals is reported in categories. These include – among others – Gerdtham et al. (1999) who use Swedish health survey data with an income measure with six categories; van Doorslaer et al. (2000) use Finnish and Danish data with categorical income data; Wagstaff (2002) and Meheus and van Doorslaer (2008) use aggre- gate data in wealth quintiles; Humphries and van Doorslaer (2000) and Wagstaff and van Doorslaer (2004) use Canadian data with income deciles; and van Doorslaer et al. (2006) use Canadian and Australian data with a limited number of income categories. While this paper focuses on the calculation of the concentration index when income is grouped, exactly the same issue arises when this type of inequality measures is calculated with any categorical indi- cator of socioeconomic status, such as education and occupation. As we show later, it follows that the health inequality measure will be influenced by the number of groups. Further it is possible to apply 0167-6296/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jhealeco.2009.11.011

Upload: philip-clarke

Post on 25-Oct-2016

219 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Calculating the concentration index when income is grouped

N

C

Pa

b

c

a

ARR1AA

JCDI

KCEICF

1

t(gtM

pNatwOVwahr

(

0d

Journal of Health Economics 29 (2010) 151–157

Contents lists available at ScienceDirect

Journal of Health Economics

journa l homepage: www.e lsev ier .com/ locate /econbase

ote

alculating the concentration index when income is grouped�

hilip Clarkea,∗, Tom Van Ourtib,c

School of Public Health, The University of Sydney, Edward Ford Building (A27), New South Wales 2006, AustraliaErasmus School of Economics, Erasmus University Rotterdam, PB 1738, 3000 DR Rotterdam, The NetherlandsTinbergen Institute, Burgemeester Oudlaan 50, 3062 PA Rotterdam, The Netherlands

r t i c l e i n f o

rticle history:eceived 20 January 2009eceived in revised form1 November 2009ccepted 16 November 2009vailable online 20 November 2009

EL classification:231

a b s t r a c t

The problem introduced by grouping income data when measuring socioeconomic inequalities in health(and health care) has been highlighted in a recent study in this journal. We re-examine this issue and showthere is a tendency to underestimate the concentration index at an increasing rate when lowering thenumber of income categories. This tendency arises due to a form of measurement error and we proposetwo correction methods. Firstly, the use of instrumental variables (IV) can reduce the error within incomecategories. Secondly, through a simple formula for correction that is based only on the number of groups.We find that the simple correction formula reduces the impact of grouping and always outperforms the IVapproach. Use of this correction can substantially improve comparisons of the concentration index bothacross countries and across time.

19

eywords:oncentration indexrrors-in-variablesnstrumental variables

© 2009 Elsevier B.V. All rights reserved.

fsws

ategorical datairst-order correction

. Introduction

The concentration index has become the standard measureo quantify income-related inequalities in health economics

Wagstaff and van Doorslaer, 2000). It can be estimated usingrouped/aggregated data or micro-data sets that contain informa-ion on an individual’s income and his/her health (care) status.

icro-datasets are generally preferred to grouped datasets as the

� We are grateful for comments received from Teresa Bago d’Uva, Hans van Kip-ersluis, an anonymous referee, and participants at seminars given at Australianational University, Tilburg University and Erasmus University Rotterdam. We alsocknowledge funding from the NETSPAR project “Income, health and work acrosshe life cycle” and thank EUROSTAT for access to the ECHP. Part of this researchas undertaken while Tom Van Ourti was a Postdoctoral Fellow of the Netherlandsrganisation for Scientific Research – Innovational Research Incentives Scheme –eni. Philip Clarke is supported by a Sydney University Fellowship. Part of thisork was undertaken during a stay at the Melbourne Institute of Applied Economic

nd Social Research, and Economics RSSS at the Australian National University, theospitality of which is gratefully acknowledged. The usual caveats apply and allemaining errors are our responsibility.∗ Corresponding author. Tel.: +61 2 9351 5424; fax: +61 2 9351 7420.

E-mail addresses: [email protected] (P. Clarke), [email protected]. Van Ourti).

il

itow(WgaiAtwtcwi

167-6296/$ – see front matter © 2009 Elsevier B.V. All rights reserved.oi:10.1016/j.jhealeco.2009.11.011

ormer result in consistent estimation of the concentration index,ince point estimates from grouped datasets ignore information onithin group association between income (rank) and health (care)

tatus (Kakwani et al., 1997). But also when income is reportedn categories, estimating the concentration index using individualevel data neglects this within group association.

There are many such examples of inequality studies that havenvolved grouped data or surveys where the income variable usedo rank individuals is reported in categories. These include – amongthers – Gerdtham et al. (1999) who use Swedish health survey dataith an income measure with six categories; van Doorslaer et al.

2000) use Finnish and Danish data with categorical income data;agstaff (2002) and Meheus and van Doorslaer (2008) use aggre-

ate data in wealth quintiles; Humphries and van Doorslaer (2000)nd Wagstaff and van Doorslaer (2004) use Canadian data withncome deciles; and van Doorslaer et al. (2006) use Canadian andustralian data with a limited number of income categories. While

his paper focuses on the calculation of the concentration index

hen income is grouped, exactly the same issue arises when this

ype of inequality measures is calculated with any categorical indi-ator of socioeconomic status, such as education and occupation. Ase show later, it follows that the health inequality measure will be

nfluenced by the number of groups. Further it is possible to apply

Page 2: Calculating the concentration index when income is grouped

1 Healt

oss

cjiisqci

sfvgbsYlccauwweCPicor

2

2

tsptc

aace

2

w

na

vtmei

epoi

R

wte

locai

R

w

mC

2

w

j

t

iFl(tpicidifHuilt(iWhile the concentration curve illustrated in panel b may be anunusual one it illustrates that there is no a priori guidance on the

52 P. Clarke, T. Van Ourti / Journal of

ur proposed correction methods when categories can be furtherubdivided into additional groups based on levels of socioeconomictatus.

The issue of dealing with income grouping when measuring theoncentration index has been highlighted in a recent study in thisournal by Chen and Roy (2009). However, the focus of this studys confined to calculating potential bounds on the concentrationndex and the implications of existing estimators for efficiency oftatistical inference. To date the broader question of the conse-uences (and solutions) for grouping of the income variable inategories (or using grouped data) for the estimated health inequal-ty measures have not been addressed.

The remainder of the paper is organised as follows. In the nextection, we show that categorising or grouping the data creates aorm of the classical errors-in-variables problem in which an indi-idual’s ranking is measured with error within, but not betweenroups. While the impact of grouping on the Gini coefficient haseen extensively explored in the context of income inequality mea-urement (e.g. see Gastwirth, 1972; Rasche et al., 1980; Lerman anditzhaki, 1989), findings from this literature can only be extrapo-

ated to the concentration index if concentration curves are globallyonvex or concave, which is unlikely to hold. We propose two pro-edures: the first we term the IV approach which involves findingn instrumental variable to reduce the error in ordering individ-als within each of the income categories, and the second, whiche refer to as the overall-correction-approach was recently put for-ard by Van Ourti and Clarke (2008). The third section presents an

mpirical examination of this issue using data from the Europeanommunity Household Panel (ECHP) and the Medical Expenditureanel Survey (MEPS). Using these datasets, we then illustrate thempact of income grouping upon the point estimate of the con-entration index and explore approaches to reducing the influencef grouping. The final section concludes and discusses the widerelevance and applicability of our correction methods.

. Analytical framework

.1. Background

The concentration index is defined as twice the area betweenhe concentration curve and the diagonal. The bounds of this mea-ure are −1 and +1 with a negative (positive) value representingro-poor (pro-rich) inequality.1 Kakwani et al. (1997) have shownhat the concentration index can be calculated using a so-calledonvenient OLS-regression.

Let us start with the situation where we have micro-data at handnd every individual i = 1, . . ., n reports a variable of interest mi suchs health (care) and his/her actual income level yi with yc ≤ yd for< d. It is easy to show that the concentration index of mi, i.e. C(mi),quals ˆ̨ 1 in the underneath ‘convenient’ OLS-regression:

2mi y

�R m̄= ˛0 + ˛1Ri + εi (1)

here m̄ is the average of mi, Ryi

is the fractional rank of yi, �2R =

−1∑n

i=1

(Ryi

− 0.5)2 = (12n2)

−1(n2 − 1) is the variance of Ry

i, εi is

n error term with mean zero, and ˛0, ˛1 are parameters to be

1 Erreygers (2009) has shown that for any variable of interest with a finite upperalue, the bounds of the concentration index need not be −1 and +1. All the results inhis paper also apply to his corrected concentration index, to Wagstaff’s (2005) nor-

alized concentration index, and to the generalized concentration index (Wagstafft al., 1991) since mean health and its lower and upper bound are not affected byncome grouping.

st

t(

tvv

st

h Economics 29 (2010) 151–157

stimated.2 It is common practice to use the fractional rank as pro-osed by Lerman and Yitzhaki (1989), i.e. Ry

i= n−1(i− 0.5), but in

rder to allow for individuals having the same actual income level,t is better to use:

yi

= p(yi) + 0.5[q(yi) − p(yi)]n

(2)

here q(yi) =∑n

k=11(yk ≤ yi) and p(yi) =∑n

k=11(yk < yi) equalhe number of individuals having at least income yi, including andxcluding yi, respectively (Van Ourti, 2004; Chen and Roy, 2009).

When micro-data is available but income is only recorded in aimited number of categories, Eq. (1) still applies, but the calculationf the fractional income rank differs: there are K different incomeategories ri such that ri = j if j−1 < yi ≤ j with j = 1, . . ., K and j−1nd j are the bounds of each income category j. The fractionalncome rank then becomes:

yi

= Rrj = q( j−1) + 0.5[q( j) − q( j−1)]n

=∑j−1

k=1nk + 0.5njn

(3)

here nj is the number of individuals in income category j.Finally, in case of grouped data, we can still apply Eq. (3), but

ust replace Eq. (1) by Eq. (4). The grouped data estimator for(mj;K) now equals ˆ̌ 1:

�2RK

mjm̄

√nj = ˇ0

√nj + ˇ1R

rj

√nj + �j (4)

here mj is the average variable of interest within income category

, �2RK

= n−1∑K

j=1nj

(Rrj− 0.5

)2is the variance of Rr

j, �j is an error

erm with mean zero, and ˇ0, ˇ1 are parameters to be estimated.3

The first purpose of this paper is to show how income group-ng could impact on the point estimate of the concentration index.ig. 1 provides some intuition on the effect of grouping. The solidines show a hypothetical Lorenz (panel a) and concentration curvepanel b). Consider the effect of grouping the population intoertiles4: both the Lorenz and concentration curve are now com-osed of straight dotted lines as the within group variation in

ncome or health (care) no longer contributes to the respectiveurve (Lambert, 2001). In case of the Lorenz curve, grouping of thencome variable always leads to an underestimation as the straightotted lines that approximate the Lorenz curve are bound to lie

nside the original curve (see panel a in Fig. 1 in which the dif-erence between the grouped and original curves is shaded grey).owever, with the concentration curve this need not result in annderestimation, see for example panel b in Fig. 1 in which the

nflection in the concentration curve means that the straight lineies both outside and inside the original concentration curve and sohe concentration index based on income grouping will be greateror lesser) depending on the degree to which it compresses inequal-ties above or below the dotted line (error also indicated in grey).

ign or magnitude of income grouping upon the point estimate ofhe concentration index.

2 We note that one should use the population formula of the variance of the frac-ional rank (and not a small-sample adjustment) since OLS is used as an arithmeticand not as a statistical) device.

3 We note that the point estimate of the grouped data estimator equals that ofhe individual level estimator with a limited number of income categories, but theariances will differ since the grouped data estimator neglects the variability of theariable of interest within the income categories.4 For simplicity, we assume that everyone within the first and third tertile has the

ame value for the variable of interest, i.e. income for the Gini and health (care) forhe concentration index. There is only variation within the second tertile.

Page 3: Calculating the concentration index when income is grouped

P. Clarke, T. Van Ourti / Journal of Health Economics 29 (2010) 151–157 153

orical

2

gttahae

pWantgvwadidar

2

iiaasliobp

i

adf

tt(mpgrvsaivciti

2

iwto‘iCfuiusgtcte

Fig. 1. Hypothetical example of the impact of categ

.2. Correcting the impact of income grouping

Chen and Roy (2009) have recently examined the impact ofrouping on the concentration index. Following Gastwirth (1972),hey suggested methods for estimating non-parametric bounds forhis measure. However, this approach which is based on the lowestnd highest potential within-income-group-correlations betweenealth (care) and income mean that the bounds can be very widend so it is difficult to know how they will be employed to informmpirical research.

Here we take a different approach in that we re-interpret theroblem as a classical errors-in-variables problem (for exampleooldridge, 2002, Section 4.4.2) since income grouping is equiv-

lent to error in the ranking variable (Ryi

in Eq. (1)) within (butot between) groups. We explore two approaches to deal withhis issue. Firstly, an IV-approach to remove the impact of incomerouping upon estimates of the concentration index when indi-idual level data are available on potential instruments. Secondlye apply what we term an overall-correction-approach followingprocedure of Van Ourti and Clarke (2008) which was originallyerived to deal with the impact of income grouping upon the Gini

ndex. Contrary to the IV approach it can be applied to both groupedata and micro-data with a limited number of income categoriesnd requires no additional information except on the number (andelative size) of income groups.

.2.1. IV approachIn regard to the IV approach the normal conditions for a good

nstrument must apply: (i) sufficient correlation between thenstrument and Ry

i, and (ii) no correlation between the instrument

nd εi in Eq. (1). When employing the standard IV approach toddress errors-in-variables the first condition can be tested, but theecond must be maintained. However, despite the technical simi-arity, our approach differs from the standard IV approach as theres no measurement error at the level of the income categories, butnly within the categories.5 It follows that neither condition can

e observed within income groups, so we suggest the followingrocedure for deciding on a suitable instrument.

First, the instrument(s) should be defined in such a way thatf we were to use individual level income as an instrument (i.e.

5 In addition, the measurement error is not correlated with the rank of the vari-ble measured with error, which holds since the measurement error is uniformlyistributed and has zero mean within each income category. See also Eq. (5) andootnote 8.

g

ii

ew

income data on the Gini and concentration index.

he unobserved actual income value), our IV estimator would givehe same point estimate as the individual level estimator in Eqs.1) and (2). This can be achieved by making sure that the instru-

ent(s) preserves the ranking across income categories (i.e. alleople in a higher income category continue to be assigned areater rank than individuals in any lower income category), and byanking the individuals within the income categories by ‘anotherariable’ that is correlated with the income rank.6 Second, theign of the correlation between the rank of this ‘other variable’nd the fractional income rank based on the income categoriess used to decide whether we rank the individuals by this ‘otherariable’ in an increasing or decreasing manner within the incomeategories. So for example if the fractional income rank based onncome categories is positively correlated with years of educationhe individuals within each category will be re-ranked in order ofncreasing years of education.

.2.2. Overall-correction-approachSimilarly to the IV approach, the overall-correction-approach

s derived from studying the estimator of the concentration indexithin a measurement error framework. The main difference with

he IV approach is that it is a first-order-correction approach basedn the number (and relative size) of income groups only (hence

first-order’). While this increases the likelihood of some remain-ng second-order bias, it is simple to implement and Van Ourti andlarke (2009) show that it has good Monte Carlo and empirical per-

ormance when applied to the Gini index. It is precisely this reliancepon the number (and relative size) of income groups that makes

t a potential candidate for reducing the impact of income groupingpon the concentration index. The intuition of their approach con-ists of comparing the estimator in Eq. (1) with the one based on arouping of the data in Eq. (4), and by next exploiting the proper-ies of the fractional rank and those of OLS as an arithmetic tool. Forlarity, we first present the procedure – applied to the concentra-ion index – of Van Ourti and Clarke (2008) for income groupings ofqual size, i.e. n = n = . . .= n =n/K, and next generalize for income

1 2 kroups of unequal size.

Let us start from the observation that C(mj;K) differs from C(mi)f the fractional income rank Ry

iis associated with mi within the

ncome groups (see also Fig. 1b). The same insight emerges from

6 Note that if this variable only takes a limited number of values (e.g. years ofducation), one should apply Eq. (3) to this ‘other variable’ to calculate the rankingithin the income categories.

Page 4: Calculating the concentration index when income is grouped

1 Healt

tRt

R

w

tmwt

fttt

C

o

(

fi

r

o“TbsoeutSiTdvImof

rgo

r

i

i

c

l

g∑

C

t

iNau

3

3

e1PSmiO

i(iFPSiEtcR

r

(rta

b

54 P. Clarke, T. Van Ourti / Journal of

he difference between the LHS and RHS of Eqs. (1) and (4).7 TheHS difference is addressed by defining an equation that describeshe measurement error

ji= Ry

i+ ıj

i(5)

here Rjiis the fractional rank of group j, i.e. K−1 (j − 0.5), assigned

o each individual i and ıji

is the measurement error with zeroean.8 The LHS difference is addressed by multiplying throughith the ratio of the variances of the fractional income ranks at

he individual and grouped data level, i.e.

�2R

�2RK

= K2(n2 − 1)n2(K2 − 1)

(6)

After some algebra (consult Van Ourti and Clarke (2008, 2009)or more details), one gets an equation that expresses the concen-ration index estimated from individual level data as a function ofhe concentration index estimated from equally-sized groupings ofhese data, i.e.

(mi) = K2

K2 − 1

[n2 − 1n2

C(mj;K

)− 12n

n∑i=1

ıjiεi

](7)

Assuming n → +∞;K < +∞ (i.e. the number and relative sizef groups is fixed in the population), Eq. (7) reduces to C(mi) =K2 − 1)

−1K2

[C

(mj;K

)− 12cov

(ıji, εi

)], and thus provides a

rst-order-correction term (K2 − 1)−1K2 and an expression for the

emaining second-order bias −12cov(ıji, εi

)K2(K2 − 1)

−1.

Eq. (6) shows that the first-order-correction does only dependn the number of income groups and can be interpreted as agrouped data” adjustment of the variance of the fractional rank.he remaining second-order bias is a function of the covarianceetween the measurement error and the error from Eq. (1); and themaller its value, the better the overall-correction-approach/first-rder-correction performs in empirical applications. Although thexact value and sign of this covariance cannot be known as εi isnobservable, we can make some approximate statements. First,his covariance will be smaller the higher the number of groups K.econd, it will be zero if mi is uniformly distributed over Ry

iwithin

ncome categories since the variance of εi equals zero in this case.hird, Van Ourti and Clarke (2009) also present Monte Carlo evi-ence that suggests that the covariance will be a function of theariance (but not the skewness) of the distribution of mi over Ry

i.

n other words, the covariance is likely to be negative for an asym-

etric unimodal distribution. For example, an extreme long right

r left tail is likely to result in a negative covariance as can be seenrom Eqs. (1) and (5).

7 Strictly speaking, one cannot compare Eqs. (1) and (4) as these are based onespectively n and K observations. Nevertheless it is easy to derive an equation thatives the same OLS point estimate for ˇ1 as the one in Eq. (4), but that is definedn n observations. With the assumption of income groups of equal size, Eq. (4)

educes to [K−2(K2 − 1)][

(6m̄)−1mj]

= ˇ0 + ˇ1

[K−1(j − 0.5)

]+ �j . If we next use

ndividual level data and assign the fractional rank of group j to each individual,

.e. Rji= K−1(j − 0.5), one gets [K−2(K2 − 1)]

[(6m̄)−1mi

]= ˇ0 + ˇ1R

ji+ i , which is

omparable to Eq. (1).8 As explained before in footnote 5, the measurement error ıj

iis not corre-

ated with Rji. This can also be seen from noting that Rj

iequals the average Ry

iof

roup j. In other words,∑

i∈ jıjiRji=

∑i∈ j

(Rji− Ry

i

)Rji= 0, and hence

∑n

i=1ıjiRji=

K

j=1

(∑i∈ jı

jiRji

)= 0.

bs1(

iont(

ta

mt

h Economics 29 (2010) 151–157

The generalisation to groups of unequal size is straightforward:

(mi) = 1

�2RK

[n2 − 112n2

C(mj;K

)− 1n

n∑i=1

ıjiεi

](8)

Eqs. (8) and (7) are similar (for n → +∞;K < +∞ Eq. (8) reduces

o C(mi) =(

12�2RK

)−1[C

(mj;K

)− 12cov

(ıji, εi

)]), except that it

s impossible in Eq. (8) to come up with an exact expression for �2RK

.evertheless, the first-order-correction remains easy to calculate,nd the interpretation of the first- and second-order terms remainsnchanged.

. Empirical application

.1. Overview of data

To explore how categorical income data impacts on the pointstimate of the concentration index, we use one wave of data from5 countries participating in the European Community Householdanel (ECHP) and the 2000 wave of the Medical Expenditure Panelurvey (MEPS) from the United States. Here we only provide a sum-ary of the data as we have reported this in much greater detail

ncluding a full list of summary statistics elsewhere (Clarke and Vanurti, 2009).

The ECHP consists of a representative panel of non-nstitutionalised households in each country. We use the second1995) wave (which was the first to include all health-relatednformation) for 13 EU member states: Austria, Belgium, Denmark,rance, Germany, Greece, Ireland, Italy, Luxembourg, Netherlands,ortugal, Spain and the United Kingdom. In the case of Finland andweden, we use later waves since these countries joined the ECHPn wave 3 (1996) and wave 4 (1997) respectively. The Medicalxpenditure Panel Survey (MEPS) provides nationally represen-ative estimates of health status and health care use for the U.S.ivilian non-institutionalized population (Agency for Healthcareesearch and Quality, 2008).

The key variables for this study are household income, self-eported health and health care use which are defined as follows:

Income: The ECHP and MEPS collect information on disposablei.e. after-tax) household income, which is all net monetary incomeeceived by the household members. The income variable was fur-her divided by the OECD modified equivalence scale in order toccount for household size and composition.9

Self-reported health (SRH): We follow the methodology adoptedy many studies in the income-related health inequality literaturey transforming the ordinal SRH responses onto the cardinal HUIcale (Feeny et al., 1995) which takes values between 0 (death) and(perfect health) by using the approach of Jones and van Doorslaer

2003).10

Health care use variables: Both the ECHP and the MEPS containnformation on health care utilization including: (i) the number

f nights in hospital during the last 12 months (NIGHTS), (ii) theumber of dental visits during the last 12 months (DENT) and (iii)he number of visits to a GP or specialist during the last 12 monthsPHYS).

9 The OECD modified equivalence scale gives a weight of 1.0 to the first adult, 0.5o the second and each subsequent person aged 15 and over, and 0.3 to each childged under 15.10 It should also be noted that self-reported health is a categorical variable whichay impact on the measurement of inequality, but this issue is beyond the scope of

his study. See also footnote 1.

Page 5: Calculating the concentration index when income is grouped

P. Clarke, T. Van Ourti / Journal of Health Economics 29 (2010) 151–157 155

F P andc 10 peG exclu

3g

enoi‘t(

ocaot(laTgti

ot

srt

acftwN

3o

ttivf(wwc

ig. 2. The impact of income grouping on the concentration index based on ECHoncentration indices based on individual micro-data that are insignificant at theermany and Sweden/US for Physician visits. (ii) The following countries have been

.2. Empirical difference between the individual level androuped data estimators of the concentration index

Fig. 2 gives an overview of the impact of income grouping uponstimates of concentration indices of self-reported health, hospitalights, dental visits, and physician visits. We summarize the impactf grouping individual level micro-data into equally sized groups,.e. ‘income-tiles’, by calculating 100{[C(mj,K)/C(mi)] − 1} for 50–2income-tiles’, K = 50,49, . . ., 3,2.11 We do this separately for each ofhe 15 European ECHP countries and the 2000 wave of the MEPSUnited States), and arrive at three main findings.

First, we observe both downward (<0) and upward (>0) impactsf income grouping indicating that the underlying concentrationurves are neither globally convex nor globally concave. Secondnd unsurprisingly, we find a much larger and ‘random’ impactf income grouping across K in those cases where the concentra-ion index based on individual level micro-data was insignificanti.e. p > 0.1). This is the case for PHYS in the MEPS 2000 data (high-ighted in red) and Sweden (highlighted in green) as well as NIGHTSnd SRH in France (highlighted in blue) and DENT for Germany.12

his follows from our ‘relative’ indicator of the impact of incomerouping that gets inflated if the denominator is very small. Third,he pattern of the impact of income grouping across the number ofncome categories K shows a similar shape for different variables

11 The impact of income grouping will in general be similar for income groupingsf unequal size, but one cannot preclude that it deviates for very peculiar shapes ofhe distribution of miover Ry

i.

12 We observe the significance level of the individual level concentration indicesince we create the problem of income grouping. This is not possible for the appliedesearcher using grouped data since she/he only observes the significance level ofhe concentration indices resulting from grouped data.

v

aisfcmi

MEPS data. Note: (i) Highlighted dots refer to the impact of income grouping ofrcent level: these are France for self-reported health and nights, dental visits forded due to lack of data: NIGHTS (Germany); DENT (France); PHYS (France).

nd countries. The impact of income grouping seems only empiri-ally relevant – i.e. dominating the randomness across countries –or 10 or less income groups, and markedly so for 5 or less groups. Inhe extreme case of 2 income groups, the median degree of down-ard underestimation across all countries is 26% for SRH, 30% forIGHTS, 27% for DENT and 20% for PHYS.

.3. Comparative performance of the IV andverall-correction-approach

While the IV approach has the potential to completely removehe impact of grouping of the income variable, in practice there areypically a limited number of potential candidates for constructingnstruments from variables routinely collected in health (care) sur-eys. In the MEPS and ECHP data sets we identified three variablesor constructing instruments: (i) modified OECD equivalence scaleeqscale); (ii) the highest educational degree obtained (educ). Third,e have the age of each individual (age) which should correlateith the fractional income rank given the evidence on the life-

ycle behaviour of incomes. We defined instruments from theseariables.

Fig. 3 shows the degree to which each of the three instrumentsnd the overall correction is able to reduce the impact of group-ng, defined as Bj = 100{[C(mj,K)/CI(mi)] − 1}. Fig. 3a–c reports thesetatistics for the United States using the MEPS from the year 2000

or SRH, NIGHTS and DENT13 and Fig. 3d–g provides a graphi-al summary of the performance across ECHP countries using theedian degree of error across countries. The figures show that the

mpact of the IV approach varies considerably by both instrument

13 Results for PHYS are not shown as C was not significant.

Page 6: Calculating the concentration index when income is grouped

156 P. Clarke, T. Van Ourti / Journal of Health Economics 29 (2010) 151–157

F ctionN lowingn T (Fr

awiteIgaa

tembiceoe

WsrMathe poorer performance of the IV approach reveals that the impactof grouping can only be reduced through the judicious use of instru-ments and that using poor instruments can lead to an increaserather a reduction of the impact of grouping.14 Based on these find-

14 While using several instruments jointly is an advantage of the IV approach, it isnot feasible here due to too high collinearity between these instruments. Alterna-tively, one can replace the single instrument by K variables; each variable equalling

ig. 3. The impact of income grouping on the concentration index and various correotes: (i) Median degree of error reported for ECHP countries in d–g. (ii) The folon-significant concentration index: SRH (France); NIGHTS (France, Germany); DEN

nd the variable of interest. For example, IV(age) results in a down-ard error when applied to NIGHTS in MEPS, but an upward error

n the majority of ECHP countries. While in some cases (i.e. Fig. 3c)he IV approach appears to remove a considerable proportion of therror, the benefits of the IV correction are not universal – in Fig. 3d,V(eqscale) and IV(age) increase the error relatively the originalrouped C. In contrast the overall-correction-approach appears tolways reduce the error relative to the original index when therere low numbers of income groups.

Three conclusions emerge. First, for five or less income groups,he overall correction considerably improves upon the originalrror and removes a major share of the impact of grouping. In theajority of countries, it also improves matters for higher num-

ers of income groups. For example, Fig. 3 shows that it always

mproves upon the original error in the MEPS. Second, the overall-orrection-approach always outperforms the IV approach in thismpirical illustration. Finally, the IV approach improves upon theriginal error in a few instances, but overall it does not remove therror and often even worsens matters compared to the “original C”.

tgoaic

methods using data from the United States (using MEPS 2000) and ECHP countries.countries have been excluded from comparison due to either lack of data or a

ance, Germany); PHYS (France, Sweden, United States).

hile it should be acknowledged that only three variables to con-truct instruments have been tested, most (health) surveys containelatively few variables that could be used with the IV approach.oreover, instrument validity across countries is a related concern

nd is not confirmed in our empirical illustration. More generally,

he instrument for one income category and taking zero for all other income cate-ories. The intuitive reasoning behind this set of instruments is that it allows usingveridentifying restrictions tests, such as the J-statistic of Hansen (1982). Sensitivitynalyses for the MEPS show that this approach works in all cases where the valid-ty of the instruments is confirmed by the J-statistic, but that the added flexibilityomes at the cost of a smaller reduction of the impact of income grouping.

Page 7: Calculating the concentration index when income is grouped

Healt

if

oicgnFolo

4

iitbWitbag1

iwvtrhoiemcmaig

has(paverb

R

A

B

C

C

E

F

G

G

H

H

J

K

L

L

M

R

v

v

V

V

V

W

W

W

W

P. Clarke, T. Van Ourti / Journal of

ngs we cannot recommend the IV approach as a practical solutionor correcting grouped concentration indexes.

Clarke and Van Ourti (2009) report the performance of theverall-correction-approach in correcting re-ranking due to group-ng of income categories. For comparisons across all the ECHPountries for SRH, NIGHTS, DENT and PHYS between 2 and 15 cate-ories the correction improved the ranking 27 occasions, lead too change on 17 occasions and made matters worse 13 times.or the same comparisons across the 10 years of MEPS data theverall-correction-approach improved ranking on 35 occasions,ead to no change on six occasions and made matters worse onlynce.

. Concluding remarks

This paper discusses and illustrates how categorical income datampacts on the point estimate of the concentration index. This issues conceptually different from the impact of income grouping onhe Gini index since the underlying concentration curves need note globally convex/concave, and thus the bias can also be upward.e exploit individual level data on health (care) indicators and

ncome to illustrate the impact of grouping by constructing hypo-hetical income groups. We find an upward impact in some cases,ut an overall tendency to underestimate the concentration indext an increasing rate when lowering the number of income cate-ories, which appear empirically relevant when there are less than0 income groups.

We have proposed two approaches to reduce the impact fromncome grouping that are derived from a measurement error frame-

ork. Firstly an IV approach which involves finding an instrumentalariable to reduce the error in ordering individuals within each ofhe income categories. Secondly we have also put forward the cor-ection factor derived by Van Ourti and Clarke (2008) and termedere the ‘overall-correction-approach’ as it only uses informationn the number (and relative size) of income groups. In addition,t can be applied to both grouped data and micro-data with cat-gorical incomes, while the IV approach can only be used foricro-data with income recorded in categories. Our results indi-

ate that the overall-correction-approach is likely to be the superiorethod for reducing the impact of grouping in most practical

pplications where income is recorded in groups and generallymproves matters particularly when there are 5 or less incomeroups.

Although this paper deals with a specific issue in the field ofealth inequality measurement, we believe the approach has widerpplicability. First, concentration indices have a long history out-ide health economics for analysing distributional issues in taxationsee for example, Lambert, 2001). Second, there are several exam-les in the health economics literature where concentration indices

re calculated with categorical non-income data as the rankingariable such as educational degree or occupational class (see forxample, Burström et al., 2005). Our correction methods may helpeduce the dependence upon the number of categories/groups inoth cases.

W

W

h Economics 29 (2010) 151–157 157

eferences

gency for Healthcare Research and Quality, 2008. Medical ExpenditurePanel Survey. Agency for Healthcare Research and Quality, Rockville, MD,http://www.ahrq.gov/info/customer.htm.

urström, K., Johannesson, M., Diderichsen, F., 2005. Increasing socio-economicinequalities in life expectancy and QALYs in Sweden 1980–1997. Health Eco-nomics 14, 831–850.

hen, Z., Roy, K., 2009. Calculating the concentration index with repetitive values ofindicators of economic welfare. Journal of Health Economics 28 (1), 169–175.

larke, P., Van Ourti, T., 2009. Correcting the bias in the concentration index whenincome is grouped. CEPR Discussion paper 2009-599. Centre for Economic PolicyResearch, Research School of Social Sciences, Australian National University.

rreygers, G., 2009. Correcting the Concentration Index. Journal of Health Economics28 (2), 504–515.

eeny, D., Furlong, W., Boyle, M., Torrance, G., 1995. Multi-attribute health statusclassification systems: Health Utilities Index. Pharmacoeconomics 7, 490–502.

astwirth, J., 1972. Robust estimation of the Lorenz Curve and Gini index. Reviewof Economics and Statistics 54, 306–316.

erdtham, U.-G., Johannesson, M., Lundberg, L., Isacson, D., 1999. A note on validat-ing Wagstaff and van Doorslaer’s health measure in the analysis of inequalitiesin health. Journal of Health Economics 18, 117–124.

ansen, L., 1982. Large sample properties of generalized method of moments esti-mators. Econometrica 50, 1029–1054.

umphries, K.H., van Doorslaer, E., 2000. Income-related health inequality inCanada. Social Science and Medicine 50, 663–671.

ones, A.M., van Doorslaer, E., 2003. Inequalities in self-reported health: validationof a new approach to measurement. Journal of Health Economics 22, 61–87.

akwani, N., Wagstaff, A., van Doorslaer, E., 1997. Socioeconomic inequalities inhealth: measurement, computation and statistical inference. Journal of Econo-metrics 77, 87–103.

ambert, P.J., 2001. The Distribution and Redistribution of Income, third ed. Manch-ester University Press, Manchester.

erman, R.I., Yitzhaki, S., 1989. Improving the accuracy of estimates of Gini coeffi-cients. Journal of Econometrics 42, 43–47.

eheus, F., van Doorslaer, E., 2008. Achieving better measles immunization in devel-oping countries: does higher coverage imply lower inequality? Social Scienceand Medicine 66 (8), 1709–1718.

asche, R.H., Gaffney, J., Koo, A.Y.C., Obst, N., 1980. Functional forms for estimatingthe Lorenz curve. Econometrica 48, 1061–1062.

an Doorslaer, E., Wagstaff, A., van der Burg, H., Christiansen, T., De Graeve, D., Duch-esne, I., Gerdtham, U.-G., Gerfin, M., Geurts, J., Gross, L., Häkkinen, U., John, J.,Klavus, J., Leu, R.E., Nolan, B., O’Donnell, O., Propper, C., Puffer, F., Schellhorn,M., Sundberg, G., Winkelhake, O., 2000. Equity in the delivery of health care inEurope and the US. Journal of Health Economics 19, 553–583.

an Doorslaer, E., Masseria, C., Koolman, X., 2006. Inequalities in access to medicalcare by income in developed countries. Canadian Medical Association Journal174, 177–183.

an Ourti, T., 2004. Measuring horizontal inequity in Belgian health care usinga Gaussian random effects two part count data model. Health Economics 13,705–724.

an Ourti, T. & Clarke, P. 2008. The Bias of the Gini Coefficient due to Grouping,Tinbergen Institute Discussion Papers 08-095/3, Tinbergen Institute.

an Ourti, T., Clarke, P., 2009. A simple correction to remove the bias of the Ginicoefficient due to income grouping. Mimeo.

agstaff, A, 2002. Inequality aversion, health inequalities and health achievement.Journal of Health Economics 21, 627–641.

agstaff, A., 2005. The bounds of the Concentration Index when the variable of inter-est is binary, with an application to immunization inequality. Health Economics14 (4), 429–432.

agstaff, A., van Doorslaer, E., 2000. Equity in health care finance and delivery.In: Culyer, A., Newhouse, J. (Eds.), Handbook of Health Economics. Amsterdam,North Holland, pp. 1804–1862.

agstaff, A., van Doorslaer, E., 2004. Overall versus socioeconomic health

inequality: a measurement framework and two empirical illustrations. HealthEconomics 13, 297–301.

agstaff, A., Paci, P., van Doorslaer, E., 1991. On the measurement of inequalities inhealth. Social Science and Medicine 33 (5), 545–557.

ooldridge, J.M., 2002. Econometric Analysis of Cross section and Panel Data. TheMIT Press, London.