An Item Response Theory Analysis of a Measure of Complicated Grief

This article was downloaded by: [University of Connecticut]
On: 10 October 2014, At: 00:51
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Death Studies
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/udst20

An Item Response Theory Analysis of a Measure of Complicated Grief
Paul A. Boelen ᵃ & Herbert Hoijtink ᵇ
ᵃ Department of Clinical and Health Psychology, Utrecht University, Utrecht, The Netherlands
ᵇ Department of Methodology and Statistics, Utrecht University, Utrecht, The Netherlands
Published online: 15 Jan 2009.

To cite this article: Paul A. Boelen & Herbert Hoijtink (2009) An Item Response Theory Analysis of a Measure of Complicated Grief, Death Studies, 33:2, 101-129, DOI: 10.1080/07481180802602758

To link to this article: http://dx.doi.org/10.1080/07481180802602758

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.



This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


AN ITEM RESPONSE THEORY ANALYSIS OF A MEASURE OF COMPLICATED GRIEF

PAUL A. BOELEN

Department of Clinical and Health Psychology, Utrecht University, Utrecht, The Netherlands

HERBERT HOIJTINK

Department of Methodology and Statistics, Utrecht University, Utrecht, The Netherlands

Item response theory modeling was applied to the data of 1,321 bereaved individuals who completed the Dutch version of the Inventory of Complicated Grief–Revised (ICG–R), a 29-item self-report measure of complicated grief (CG). The authors aimed to examine the information that each of the ICG–R items contributes to the measurement of overall CG severity and to detect possible differences between subgroups of mourners in the way CG items are endorsed. Findings showed that items differed in their ability to distinguish individual differences in CG severity. Items that discriminated best between low and high CG were “feeling numb,” “feeling that the future holds no meaning or purpose,” and “unable imagining life being fulfilling without the lost person.” Items were reasonably well dispersed across the range of CG severity. There was little evidence of differential item functioning between gender groups and victims of violent vs. non-violent loss.

Received 14 December 2007; accepted 2 July 2008.

Address correspondence to Paul A. Boelen, Department of Clinical and Health Psychology, Utrecht University, PO Box 80140, 3508 TC Utrecht, The Netherlands. E-mail: [email protected]

Death Studies, 33: 101–129, 2009. Copyright © Taylor & Francis Group, LLC. ISSN: 0748-1187 print/1091-7683 online. DOI: 10.1080/07481180802602758

Some people confronted with the loss of a close relative develop emotional problems. Frequently observed syndromes include major depression, post-traumatic stress disorder, and other anxiety disorders (e.g., Bonanno & Kaltman, 2001). Bereaved individuals can also develop symptoms of complicated grief (CG) that are distinct from established mood and anxiety disorders and that, independent thereof, predict persisting health impairments (Boelen & Prigerson, 2007; Chen et al., 1999; Prigerson et al., 1997; Prigerson et al., 1996; Silverman et al., 2000). Efforts are being made toward including CG as a new disorder in the fifth edition of the DSM (Lichtenthal, Cruess, & Prigerson, 2004). As currently defined, CG is a disorder that encompasses grief-specific symptoms including yearning, disbelief regarding the death, preoccupation, and recurrent images of the lost person that occur for at least 6 months, to the point of functional impairment (Prigerson, Vanderwerker, & Maciejewski, 2008).¹

Several studies have generated evidence in favor of the reliability and validity of the proposed CG criteria (Lichtenthal et al., 2004). Although these studies have clarified what constitutes CG, there are some limitations to studies done so far. First, there is uncertainty about the generalizability of some study findings. For instance, in the first DSM field trial the ability of individual CG symptoms to correctly identify cases and non-cases of CG was examined with receiver operating characteristic (ROC) analyses (Prigerson et al., 1999), with “cases” of CG being defined as those scoring in the upper 20% of the distribution of the summed symptom scores. Outcomes were likely influenced by the composition of the sample, consisting of 306 elderly widowed spouses. For example, the item “difficulty imagining a fulfilling life without the deceased” had very low sensitivity and specificity. Although this suggests that this is a weak indicator of CG, it is possible that this item would be strongly linked with CG severity in other groups of mourners such as younger bereaved parents. In a similar way, the composition of the sample likely influenced outcomes of studies in which CG symptoms were correlated with indices of health and quality of life impairments to examine the degree of “maladaptiveness” of these symptoms (e.g., Boelen, van den Bout, de Keijser, & Hoijtink, 2003; Prigerson et al., 1995, 1997; Silverman et al., 2000). Thus, the finding of a correlation between CG symptoms and a global index of CG severity (as in the 1999 field trial) or other indices of malfunctioning (as in the other studies) informs us about the problematic nature of these symptoms. However, what we learn is limited by the degree to which these findings generalize across different subgroups of mourners divided by variables such as gender and cause of death. For instance, it might well be that CG symptoms are more strongly associated with physical morbidity in older than in younger mourners or that the clinical correlates of CG differ across relevant subgroups of bereaved individuals.

¹ In recent writings, complicated grief is termed prolonged grief disorder. As the former term is better known, this term is used in the present article.

A second limitation to studies on CG done so far is that the overall severity of CG is typically estimated by summing the scores of individual items of a CG scale (e.g., Boelen et al., 2003; Prigerson et al., 1995, 1997, 1999; Silverman et al., 2000). Potentially problematic about such an approach is that it implies that items with varying content perform equally effectively as indicators of overall CG severity, that CG items always tell us something about an individual’s level of CG independent of what this level actually is (i.e., that items are uniformly informative at all levels of CG severity), and that the manner in which CG is expressed is equal across subgroups of mourners. It is unlikely that these assumptions are all true. Rather, it is conceivable that the strength of individual CG symptoms as indicators of overall CG severity varies among items, varies across the entire dimension of CG severity, and varies across different subgroups of mourners.

Altogether, it is important to enhance knowledge on how exactly individual CG symptoms perform as indicators of the underlying dimension of overall CG severity. In the current study, performed in the Netherlands, we studied this issue by applying methods from Item Response Theory (IRT) to the responses of a large group of mourners on the Inventory of Complicated Grief–Revised (ICG–R).² IRT offers statistical models to describe the empirical properties of the way in which items that are assumed to make up a dimension are related to that dimension. It provides unique information about a measure’s performance that cannot be easily obtained through traditional methods based in classical test theory. For instance, traditional psychometric approaches offer group-level estimates that do not provide information about how responses to individual ICG–R items change across different levels of CG, the effectiveness of each item in distinguishing between different levels of CG, and the degree to which items provide overlapping or unique information about levels of overall CG severity. The reason for using the ICG–R and not one of the other measures designed to assess grief reactions was that the ICG–R is the single self-report measure that was designed to examine symptoms of CG and other potentially problematic grief reactions (as opposed to normal grief) and is one of the most well-validated measures of grief reactions (Neimeyer, Hogan, & Laurie, 2008). We will briefly describe some basic features of IRT, focusing on IRT models for dichotomous items that measure a single dimension. Polytomous IRT models will be described below.

² The ICG–R has also been called Inventory of Traumatic Grief (ITG; Boelen, van den Bout, de Keijser, & Hoijtink, 2003; Prigerson & Jacobs, 2001).

In IRT modeling, statistical models are used to describe the relationship between an underlying latent trait or dimension of psychopathology (symbolized as $\theta$) and the probability that an item that is designed to assess the dimension is endorsed in the keyed direction (i.e., the direction that reflects higher levels of the underlying dimension of interest; Embretson & Reise, 2000). The relationship between the person’s position on $\theta$ and the probability of item endorsement can be plotted in item characteristic curves (ICCs). Examples of ICCs for three dichotomous items are shown in Figure 1. The probability of item endorsement is determined by the so-called “person parameter” $\theta$ but is also a function of one or more “item parameters.” In frequently used two-parameter logistic models, item endorsement is assumed to be determined by the difficulty or (as it is often called when applied to psychopathology items) severity of the item (symbolized as $b$) and the discriminative ability (symbolized as $a$) of the item. The severity is on the same scale as $\theta$ (in this study: the dimension of overall CG severity) and is a quantification of the point on $\theta$ where people have a 50% chance to endorse the item in the keyed direction. As such, it reflects the item’s location on $\theta$. When comparing items A and B in Figure 1, it can be seen that item B is a more difficult or more severe item than is item A. The item’s discrimination ($a$) determines the slope of the ICC and represents the ability of the item to distinguish between people at contiguous levels of $\theta$. As such, it quantifies an item’s strength as an indicator of the dimension that it is supposed to measure. In Figure 1, item C discriminates less well between people high and low on $\theta$ than items A and B do.

FIGURE 1 Examples of three item characteristic curves.
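The two-parameter logistic curve described above can be sketched in a few lines of Python. This is only an illustration: the function name `icc_2pl` and the parameter values for items A, B, and C are hypothetical, chosen to mimic the pattern described for Figure 1 (B more severe than A, C less discriminating than both), and are not estimates from the study.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item in the keyed direction under a 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical (discrimination a, severity b) pairs, for illustration only.
ITEMS = {"A": (1.5, -0.5), "B": (1.5, 1.0), "C": (0.5, 0.0)}

for name, (a, b) in ITEMS.items():
    # Evaluate each curve at a low, middle, and high trait level.
    curve = [round(icc_2pl(theta, a, b), 2) for theta in (-2.0, 0.0, 2.0)]
    print(name, curve)
```

At a trait level equal to an item's severity, the function returns exactly 0.5, which is the 50% point used in the text to define the item's location.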

When (apart from $a$ and $b$) the shape of the ICC is also determined by a characteristic of the person other than $\theta$ (e.g., gender), the item has a form of bias known as differential item functioning (DIF). DIF has been defined as occurring when two people with the same $\theta$ levels have different probabilities of endorsing an item because of their difference in group membership (Hambleton, Swaminathan, & Rogers, 1991). If curves A and B in Figure 1 applied to the scores of women and men, respectively, on the same CG item, this would indicate that, with similar levels of CG, women would have a greater probability of endorsing the item in the keyed direction than men.

In the current study, IRT was used to investigate how individual CG items tapped by the ICG–R perform as indicators of how strongly a person suffers CG, to study the degree of information the items provide about CG severity across the entire dimension of CG severity, and to examine possible differences in the performance of items across subgroups of mourners. Investigating these topics is important for several reasons. First, identifying symptoms that are informative about a person’s overall CG level is important for the development of measurement instruments for CG and for the choice of symptoms that should be included in diagnostic criteria for the disorder. That is, symptoms that are not sensitive to differences in overall CG severity and, as such, provide little information about a person’s CG level should perhaps not be included in such instruments and criteria. Second, examining which grief reactions are already endorsed at low levels of CG severity and which symptoms only occur at very high levels of CG provides objective information about more or less severe manifestations of the CG dimension. Third, examining whether or not ICG–R items function equally across subgroups of mourners sheds light on the question of whether these groups can be meaningfully compared on the ICG–R. It could be possible, for example, that the ICG–R contains many items that are easier to endorse for women than for men. If so, it would be uncertain to what extent gender differences found in a particular sample would be attributable to valid differences at the level of the underlying dimension of CG severity or to content features of the items. In the present study, we tested for DIF between men and women. In addition, we examined DIF between mourners confronted with violent deaths (i.e., due to accident, suicide, homicide) versus those confronted with non-violent deaths. Several authors have claimed that grief reactions differ after violent or non-violent deaths (Stroebe, Schut, & Finkenhauer, 2001). To be able to reliably study these differences using the ICG–R, it is relevant to rule out that victims of violent versus non-violent death systematically endorse its items differently.

Originally, items of the Dutch ICG–R are scored on 5-point scales with the categories “never,” “seldom,” “rarely,” “frequently,” and “always” (Boelen et al., 2003). In the current study, the five categories were collapsed into 3-point scales. Categories “seldom” and “rarely” and categories “frequently” and “always” were each collapsed into one category, resulting in three categories with labels 0 (never), 1 (seldom/rarely), and 2 (frequently/always). We did so because (a) differences between the categories “seldom” and “rarely” and between the categories “frequently” and “always” were deemed ambiguous; (b) combining categories was considered to enhance statistical power of our DIF analyses (as explained below, when using 5-point scales, we would have had to conduct 116 $\chi^2$ tests within each series of DIF analyses; i.e., four $\chi^2$ tests for each of the 29 items); and (c) collapsing categories enhanced the interpretability of outcomes.
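The recoding and the test count mentioned under (b) can be made concrete with a small sketch. The names `COLLAPSE`, `collapse`, and `n_dif_tests` are hypothetical helpers introduced here for illustration. The count rule assumes one test per response-option dichotomy per item, which reproduces the 116 tests the text gives for 5-point scales; the figure for 3-point scales is derived from the same rule and is not stated in the text.

```python
# Map the five original response labels onto the three collapsed categories.
COLLAPSE = {"never": 0, "seldom": 1, "rarely": 1, "frequently": 2, "always": 2}


def collapse(responses):
    """Recode a list of 5-point response labels into 0/1/2 scores."""
    return [COLLAPSE[label] for label in responses]


def n_dif_tests(n_items: int, n_categories: int) -> int:
    """Number of tests per DIF series: one per dichotomy (categories - 1) per item."""
    return n_items * (n_categories - 1)


print(collapse(["never", "seldom", "rarely", "frequently", "always"]))
print(n_dif_tests(29, 5))  # 5-point scales: four dichotomies per item
print(n_dif_tests(29, 3))  # 3-point scales: two dichotomies per item
```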

We used the One Parameter Logistic Model (OPLM), developed by Verhelst and Glas (1995), to model ICG–R items. In the next section, we will give a description of the model (see also Hoijtink & Vollema, 2003).


OPLM

If the response of person $i$ to item $j$ is denoted by $x_{ij}$, where $i = 1, \ldots, N$ and $j = 1, \ldots, K$, then:

$$p(x_{ij} = m \mid \theta_i) = \frac{\exp\left(a_j\left(m\theta_i - \sum_{g=1}^{m} b_{jg}\right)\right)}{1 + \sum_{h=1}^{m} \exp\left(a_j\left(h\theta_i - \sum_{g=1}^{h} b_{jg}\right)\right)}, \qquad (1)$$

where $m = 0, 1, 2$ denotes the item responses that are possible, $a_j$ denotes the discrimination parameter of item $j$, $\theta_i$ denotes the location of person $i$ on the latent trait, $b_{jg}$ denotes the $g$-th severity parameter ($g = 1, 2$) of item $j$, and $p(x_{ij} = m \mid \theta_i)$ denotes the probability of responding $m$ to item $j$ if a person has position $\theta_i$ on the latent trait dimension.³
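A minimal sketch of how Equation 1 can be evaluated for one item is given below. It rests on one reading of the notation that should be treated as an assumption: the sum in the denominator is taken over both nonzero response categories (1 and 2) regardless of the response being evaluated, so that the three category probabilities share a denominator and add up to one. The function name and the parameter values are hypothetical.

```python
import math


def oplm_category_probs(theta, a, thresholds):
    """Return [P(x=0), P(x=1), ..., P(x=M)] for one item.

    theta      : person location on the latent trait
    a          : discrimination index of the item
    thresholds : severity parameters b_1, ..., b_M of the item
    """
    # Category 0 contributes exp(0) = 1, the leading "1" in the denominator.
    numerators = [1.0]
    cumulative = 0.0
    for m, b in enumerate(thresholds, start=1):
        cumulative += b  # running sum of the first m severity parameters
        numerators.append(math.exp(a * (m * theta - cumulative)))
    total = sum(numerators)  # assumed common denominator for all categories
    return [value / total for value in numerators]


# Hypothetical trichotomous item: a = 2, b_1 = -0.5, b_2 = 1.0.
for theta in (-2.0, 0.0, 2.0):
    print(theta, [round(p, 3) for p in oplm_category_probs(theta, 2, [-0.5, 1.0])])
```

Under this reading, the probabilities of responses 0 and 1 are equal at the first severity parameter, and those of responses 1 and 2 are equal at the second.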

The OPLM can be visually depicted in the form of Category Response Curves (CRCs). Examples are shown in Figure 2. The curves shown are those for ICG–R items 5 and 13, the psychometric properties of which are described in more detail below. For trichotomous items there are three CRCs. Figure 2 shows that, as one moves up the CG dimension, the probability of endorsing the category “never” decreases whereas the probability of a “frequently/always” response increases. The CRC of the “seldom/rarely” response is single peaked. This reflects that the probability of choosing the option “seldom/rarely” increases as one moves from the low range to the mid-range of the CG dimension, after which it starts to decrease again and a “frequently/always” response becomes more likely.

³ We use $a$ and $b$ to refer to the discrimination index and threshold parameter, whereas in some other writings about the OPLM (e.g., Glas & Verhelst, 1995) $b$ and $d$ are used to refer to these indices. Yet, because $a$ and $b$ are more commonly used, we chose these symbols.

Trichotomous items have two severity parameters, $b_{j1}$ and $b_{j2}$. These can be found at the points where the curves for responses 0 and 1 and for responses 1 and 2 cross, respectively. Below $b_{j1}$ people tend to answer with 0. Between $b_{j1}$ and $b_{j2}$ the option 1 is most probable. Beyond $b_{j2}$ the option 2 is most probable. The $b$s can thus also be regarded as between-option threshold parameters. (We will use the terms severity parameter and threshold parameter interchangeably.) Figure 2 shows that the thresholds of item 5 are smaller than those of item 13. This implies that item 5 is indicative of lower levels of CG than is item 13. The discrimination parameter $a$ influences the slopes of the CRCs. The interpretation of $a$ is similar to that of factor loadings in factor analysis: the greater its value, the greater the importance of the item for defining $\theta$. Figure 2 shows that item 5 has steeper slopes than item 13, indicating that item 5 performs better as an indicator of overall CG severity.
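Why the severity parameters sit at the crossing points of adjacent curves can be seen from a short derivation. It assumes that the denominator of Equation 1 is common to all three response categories, so that it cancels in the ratio of two adjacent category probabilities:

```latex
\frac{p(x_{ij} = m \mid \theta_i)}{p(x_{ij} = m-1 \mid \theta_i)}
  = \frac{\exp\left(a_j\left(m\theta_i - \sum_{g=1}^{m} b_{jg}\right)\right)}
         {\exp\left(a_j\left((m-1)\theta_i - \sum_{g=1}^{m-1} b_{jg}\right)\right)}
  = \exp\left(a_j\left(\theta_i - b_{jm}\right)\right), \qquad m = 1, 2.
```

For a positive discrimination index the ratio equals 1 exactly when the person location equals the $m$-th severity parameter, which is where the two curves cross. A larger discrimination index makes the ratio change faster around that point, which corresponds to the steeper slopes described for item 5.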

Parameter estimation in the OPLM is an iterative procedure in two steps.

The first step is the process of “discrete optimization” in which the discrimination indices (the $a$s) are determined. In the OPLM, values of the $a$s are imputed. Then, M2 tests are conducted to identify misspecified values or, stated otherwise, to test the fit of the $a$s (Verhelst, Glas, & Verstralen, 1995). The M2 test (developed by Verhelst & Glas, 1995) gives an indication of the degree to which the ICC that is reconstructed with the weighted raw scores (i.e., the “observed” ICC) fits the ICC that is put forward by the model (i.e., the “predicted” ICC based upon the item’s severity and discrimination provided by the model). The normalized version of the M2 test has a standard normal distribution under the null hypothesis that the discrimination parameter of item $j$ is correctly specified (Verhelst & Glas, 1995). In the OPLM, conditional maximum likelihood (CML) estimation of the $a$s and $b$s for each item is repeated until all M2 values are reasonable, that is, within the interval −3 to +3 (Hoijtink & Vollema, 2003).

FIGURE 2 Category Response Curves of items 5 (“yearning”) and 13 (“avoidance”). The solid curves represent item 5 and the dashed curves represent item 13.

The item parameters provided by the OPLM can be used to construct the test information curve (TIC) for the whole set of items. The TIC represents the relative precision of the item set across the entire $\theta$ dimension. Stated otherwise, it gives information about the precision with which a person’s position on the $\theta$ dimension can be estimated with the item set, at different points of the dimension. The peak of the TIC is the point where the item set provides the most precision in estimating $\theta$. The height of the TIC is related to the standard error of measurement (SEM) of the estimated $\theta$s. This SEM can be used to compute a 95% confidence interval (CI) for the estimated $\theta$s by adding and subtracting 1.96 times the SEM.
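The link between test information, SEM, and the confidence interval can be sketched as follows. The text states only that the height of the TIC is related to the SEM; the sketch assumes the usual IRT relation in which the SEM is the reciprocal of the square root of the test information, and it assumes that item information equals the squared discrimination times the variance of the item score, as in partial-credit-type models. All function names and item parameters are hypothetical.

```python
import math


def category_probs(theta, a, thresholds):
    """Category probabilities for one polytomous item (same form as Equation 1)."""
    numerators = [1.0]  # category 0
    cumulative = 0.0
    for m, b in enumerate(thresholds, start=1):
        cumulative += b
        numerators.append(math.exp(a * (m * theta - cumulative)))
    total = sum(numerators)
    return [value / total for value in numerators]


def item_information(theta, a, thresholds):
    """Assumed item information: a^2 times the variance of the item score at theta."""
    probs = category_probs(theta, a, thresholds)
    mean = sum(m * p for m, p in enumerate(probs))
    variance = sum((m - mean) ** 2 * p for m, p in enumerate(probs))
    return a ** 2 * variance


def total_information(theta, items):
    """Height of the test information curve at theta: sum of item informations."""
    return sum(item_information(theta, a, bs) for a, bs in items)


def theta_interval(theta_hat, items, z=1.96):
    """95% interval: estimate plus and minus 1.96 times the assumed SEM."""
    sem = 1.0 / math.sqrt(total_information(theta_hat, items))
    return theta_hat - z * sem, theta_hat + z * sem


# Hypothetical items: (discrimination index, [threshold 1, threshold 2]).
ITEMS = [(2, [-1.0, 0.5]), (1, [0.0, 1.5]), (3, [-0.5, 1.0])]

low, high = theta_interval(0.0, ITEMS)
print(f"95% CI around an estimate of 0: [{low:.2f}, {high:.2f}]")
```

Where the curve is high the interval is narrow, and where it is low the interval widens, which is the sense in which the TIC describes precision across the dimension.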

To investigate DIF with the OPLM, the response categories are dichotomized in $m - 1$ ways and ICCs⁴ of the dichotomized response options are calculated separately for both groups included in the analysis (Glas & Verhelst, 1995; Holland & Wainer, 1993; Verhelst et al., 1995). For example, when examining gender DIF for a trichotomous item, the ICCs representing the probability of responding 1 or 2 (and not 0) are calculated separately for men and women, and the ICCs representing the probability of responding 2 (and not 0 or 1) are also calculated for both groups. Then, the ICCs of the two groups representing both response option dichotomies are compared with a $\chi^2$ statistic that is part of the OPLM standard output (Glas & Verhelst, 1995).⁵ When the $\chi^2$ statistic is significant, this points to the presence of DIF. The direction and theoretical relevance of this statistically significant DIF can subsequently be judged on the basis of plots like the ones displayed in Figure 3 (discussed later on). These are part of the OPLM standard output.
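The data preparation behind this comparison can be illustrated with a short sketch. It covers only the dichotomization and the per-group observed proportions within score groups, which are the raw material for the “observed” curves; it does not implement the test statistic reported by the OPLM software. The function names, the score-group labels, and the toy records are all hypothetical.

```python
from collections import defaultdict


def dichotomize(response: int, cut: int) -> int:
    """cut=1 contrasts responses {1, 2} with 0; cut=2 contrasts 2 with {0, 1}."""
    return 1 if response >= cut else 0


def observed_curves(records, cut):
    """Observed endorsement proportion per group and score group.

    records: iterable of (group, score_group, response) tuples.
    Returns {group: {score_group: proportion endorsing the dichotomy}}.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [endorsed, total]
    for group, score_group, response in records:
        cell = counts[group][score_group]
        cell[0] += dichotomize(response, cut)
        cell[1] += 1
    return {
        group: {score: hits / n for score, (hits, n) in by_score.items()}
        for group, by_score in counts.items()
    }


# Toy data, for illustration only: (group, score group, trichotomous response).
RECORDS = [
    ("men", "low", 0), ("men", "low", 1), ("men", "high", 2),
    ("women", "low", 1), ("women", "low", 2), ("women", "high", 2),
]

for cut in (1, 2):
    print(cut, observed_curves(RECORDS, cut))
```

Comparing the two groups' proportions at the same score group, separately for each cut, mirrors the two comparisons made for a trichotomous item.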

⁴ The acronym ICC will be used throughout this article to remind the reader that these probability functions are calculated from the OPLM. Yet, strictly speaking, these curves are not item characteristic curves per se because, with polytomous items, several such curves exist, one for each of the item’s response option dichotomies.

⁵ For each group, this test computes a normalized distance between the observed ICC of that group and the expected ICC. The larger the sum of the normalized distances of both groups, the more the item is biased. This test has a $\chi^2$ distribution, with degrees of freedom being equal to the number of weighted score groups (for each group z) with a size of at least 30, minus one (see also Glas & Verhelst, 1995; Verhelst et al., 1995).


The interpretation of DIF with trichotomous items is more complex than with dichotomous items. With dichotomous items, the item’s single ICC is compared between groups, whereas with trichotomous items two ICCs (one for each response option dichotomy) are compared. If both ICCs are found to differ, this can easily be interpreted as indicating that the whole item is biased. Yet, if only one of the two ICCs is biased, this means that groups differ in their response to one of the dichotomized response options but not the other, the interpretation of which is less straightforward.⁶

FIGURE 3 “ICCs” of men and women for the two response option dichotomies of item 10 (“hard to trust people”). The upper panel shows the “ICCs” for the first dichotomy (options 1 and 2 vs. 0). The lower panel shows the “ICCs” for the second dichotomy (option 2 vs. 0 and 1). Solid lines represent the “ICCs” for the whole sample.

Method

Sample

Data were available from 1,321 bereaved individuals. They were originally recruited for a research program on cognitive variables in CG (see Boelen & Lensvelt-Mulders, 2005). Participants were recruited through different pathways. A first group was recruited through grief counselors, therapists, clergy, and other people who met bereaved individuals through their work-related or voluntary activities. They handed out 1,128 questionnaire packets to mourners, 492 (43.6%) of which were returned. The other participants were recruited through an advertisement on an Internet site with information about grief. Visitors of the Internet site who had lost a loved one were invited to participate by completing questionnaires. People could choose to fill in questionnaires through the Internet or could send an e-mail with the request to have these sent to their homes. Six hundred mourners reliably filled in questionnaires online and 490 mourners had questionnaires sent to their homes, of which 260 (52%) were returned. Questionnaires differed slightly across the three groups but all included the ICG–R.

⁶ The OPLM presented in (1) is comparable to the Generalized Partial Credit Model (GPCM; Muraki, 1992). The main difference is that in the OPLM the $a$s are discrete rather than continuous parameters. That is, they can only attain the values 0, 1, 2, . . . . Verhelst and Glas (1995) show that this does not affect the ability of the OPLM to handle variation in item discriminations. An advantage of this feature of the OPLM is that the distribution of the person parameters does not have to be specified in the OPLM because the item severities can be estimated by CML. In standard applications of the GPCM this distribution is assumed to be standard normal. However, as noted above, CML is only one part of an iterative procedure in which the other part consists of updating the discrimination parameters. Furthermore, as shown by Glas and Verhelst (1995) and described above, a general test for DIF with known distribution can be obtained. Note that DIF can also be tested based on the GPCM (Du Toit, 2003; Thissen, Steinberg, & Wainer, 1993). In the GPCM prior information about the discrimination parameters and thresholds is needed in order to be able to obtain estimates (Du Toit, 2003).


All mourners who participated in the research program were included in the present study except those younger than 18 years of age (n = 31).

We examined whether participants recruited from caretakers and the two groups enrolled via the Internet differed on the variables assessed. Groups varied in kinship, with participants recruited from caretakers including more spousally bereaved individuals and both groups recruited via the Internet including more adult bereaved children, $\chi^2$(6, N = 1321) = 221.95, p < .001. In addition, compared with both groups enrolled via the Internet, those recruited from caretakers were older, F(2, 1320) = 201.39, and had fewer years of education, F(2, 1316) = 18.35, ps < 0.001. There were no differences among the three groups in gender, cause of loss, time from loss, and ICG–R scores. We also compared the two groups enrolled via the Internet (i.e., those who completed questionnaires online and those who completed the mailed version) but found no differences on any of the background variables or ICG–R score (all ps > 0.16). Altogether it was deemed acceptable to combine the three groups.

As is motivated below, analyses were first conducted with a randomly selected two-thirds of the sample (“exploratory sample,” n = 880) and cross-validated using the remaining participants (“cross-validation sample,” n = 441). Background and loss-related characteristics of both samples are shown in Table 1. Participants in both samples did not differ from each other on background variables and ICG–R scores.
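The sample sizes reported in this section can be tallied, and a two-thirds split of the kind described can be sketched, as follows. The counts are taken from the text; the shuffling procedure, the seed, and the names `split_sample`, `exploratory`, and `validation` are assumptions made for illustration, since the text does not say how the random selection was carried out.

```python
import random

# Counts reported in the Sample section.
returned_via_caretakers = 492
completed_online = 600
returned_by_mail = 260
excluded_under_18 = 31

total = returned_via_caretakers + completed_online + returned_by_mail - excluded_under_18


def split_sample(ids, n_exploratory, seed=0):
    """Shuffle participant ids and cut them into two non-overlapping samples."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_exploratory], shuffled[n_exploratory:]


exploratory, validation = split_sample(range(total), n_exploratory=880)
print(total, len(exploratory), len(validation))
```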

Measures

The ICG–R is a 30-item self-report questionnaire developed by Prigerson and Jacobs (2001). It is an extended version of the Inventory of Complicated Grief (ICG) developed by Prigerson et al. (1995). It was designed to assess proposed criteria for CG (Prigerson et al., 1999, 2008) as well as other potentially problematic grief reactions. The Dutch version used in the present study differs slightly from the original version, in that two items of the original version (items 26, representing feelings of unsafety, and 27, representing lessened sense of control) were combined into one item, leaving 29 items. Boelen et al. (2003) examined the psychometric properties of the Dutch ICG–R. Among other things, they found the ICG–R to have adequate internal consistency and test-retest reliability. Moreover, in support of its construct validity, they found the measure to distinguish between mourners who did and did not meet criteria for CG defined according to earlier criteria (Prigerson et al., 1999) and to be more strongly related to mental health impairments than the Texas Revised Inventory of Grief (TRIG; Faschingbauer, Zisook, & DeVaul, 1987).

Respondents are asked to rate the degree to which items were present in the past month on 5-point scales (ranging from “never” to “always”). Although all participants completed the ICG–R in this original 5-point scale format, for reasons set out earlier, we collapsed response categories into a 3-point scale format in the main analyses of this study, with 0 (never), 1 (seldom/rarely), and 2 (frequently/always).

Data Analysis

As OPLM assumes that the item responses are explained by a person’s location on a single latent dimension, we first examined the dimensionality of the ICG–R items. We used the exploratory approach implemented in Mplus (Muthén & Muthén, 1998) to do so. In the second step of our analyses, the polytomous OPLM was used to analyze the trichotomously scaled responses of the exploratory sample of 880 mourners. Discriminations and threshold parameters were examined to evaluate how individual CG items are related to the dimension of overall CG severity. Third, DIF analyses were conducted. We consecutively tested for DIF between men (n = 160) and women (n = 720) and victims of violent loss (n = 163) and non-violent loss (n = 717).

TABLE 1 Background and Loss Characteristics of the Exploratory Sample and the Cross-Validation Sample

| Characteristic | Exploratory sample (n = 880) | Cross-validation sample (n = 441) |
|---|---|---|
| Background characteristics | | |
| Gender (n (%)): Men | 160 (18.2) | 77 (17.5) |
| Gender (n (%)): Women | 720 (81.8) | 364 (82.5) |
| Age (years) (M (SD)) | 43.08 (14.34) | 43.32 (14.18) |
| Education (years) (M (SD)) | 15.25 (3.20) | 15.00 (3.16) |
| Loss characteristics | | |
| Deceased is (n (%)): Partner | 383 (43.5) | 190 (43.1) |
| Deceased is (n (%)): Child | 118 (13.4) | 69 (15.6) |
| Deceased is (n (%)): Parent | 248 (28.2) | 123 (27.9) |
| Deceased is (n (%)): Other | 128 (14.5) | 59 (13.4) |
| Cause of death is (n (%)): Non-violent (e.g., illness) | 717 (81.5) | 358 (81.2) |
| Cause of death is (n (%)): Violent (accident, suicide, homicide) | 163 (18.5) | 83 (18.8) |
| Time from loss in months (M (SD)) | 33.02 (42.25) | 31.34 (41.51) |

A nice feature of IRT is that, because person parameters are on the same scale as the item severities, it is possible to describe symptomatic characteristics of relevant θ levels, by looking at items that have corresponding βs (Hoijtink & Vollema, 2003). As a fourth step in our analyses, we estimated the θ level that corresponds to a relatively frequently used diagnostic cut-off point and sought to describe the symptomatic characteristics of this point, by linking it with item severities. In addition, the TIC for all items included in the ICG–R was constructed to examine the test information provided by the ICG–R.

The analyses executed in steps 1, 2, and 3 are exploratory. A disadvantage of exploratory analyses is that the outcomes represent properties of the dataset at hand and are not necessarily applicable to the population from which the data are sampled. We used two measures to reduce the risk of drawing false conclusions. First, in our analyses of DIF (in which 58 χ² tests pertaining to the contrasts of both response option dichotomies of each of the 29 ICG–R items were conducted) we controlled for alpha inflation using the false discovery rate procedure developed by Benjamini and Hochberg (1995). Second, we conducted cross-validation analyses. We randomly split the sample of 1,321 mourners into an “exploratory sample” of 880 persons and a “cross-validation sample” of 441 persons. In step 5 of our analyses, results obtained with the former sample were evaluated using the latter sample. More specifically, we re-examined the dimensionality of the ICG–R, used M2 tests to examine how well the αs found in the exploratory sample performed in the cross-validation sample, and investigated whether DIF emerged when item parameter estimates obtained from the exploratory sample were applied to the cross-validation sample.
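For readers unfamiliar with the false discovery rate procedure, the step-up rule of Benjamini and Hochberg (1995) can be sketched as follows. The function is a generic implementation, not the software used in the study, and the p-values in the example are hypothetical numbers chosen only to mimic a situation with 58 tests of which 11 fall below .05.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values: 11 of 58 are below .05, one of them very small.
p_values = [0.00005, 0.010, 0.012, 0.015, 0.020, 0.022,
            0.030, 0.035, 0.040, 0.045, 0.049] + [0.5] * 47
rejected = benjamini_hochberg(p_values)
print(sum(p < 0.05 for p in p_values), sum(rejected))  # 11 1
```

With these illustrative values, eleven tests are nominally significant but only one survives the correction, because the smallest p-value is compared with .05/58 while the larger ones fail their progressively looser thresholds.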

As a final step in our analyses, we examined whether or not outcomes of our main analyses were influenced by collapsing the 5-point scale into a 3-point scale format. To this end, the OPLM was applied to the data of the exploratory sample using their original responses scored on 5-point scales.

Results

Dimensionality of the ICG–R

The factor analysis resulted in the emergence of 5 factors with eigenvalues greater than 1.00 (i.e., 13.10, 1.95, 1.60, 1.31, and 1.12). Several outcomes pointed to the presence of one dominant factor. First, the first factor explained 45% of the variance and the second through fifth factors added very little explained variance. Second, the scree plot showed a break after the steep slope of the first factor and a gradual trailing off of the remaining factors. Third, as shown in Table 2, all 29 items had factor loadings of >0.30. Altogether, it was considered reasonable to conclude that the ICG–R was undergirded by one common factor.
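The reported proportion of explained variance can be checked from the eigenvalues alone. The sketch assumes that the eigenvalues come from a 29 × 29 correlation matrix, so that the total variance equals the number of items; the text does not state this explicitly.

```python
eigenvalues = [13.10, 1.95, 1.60, 1.31, 1.12]  # values reported in the text
n_items = 29  # total variance of a correlation matrix equals the number of items

# Share of total variance attributable to each of the five factors.
proportions = [ev / n_items for ev in eigenvalues]
# Ratio of the first to the second eigenvalue, a rough index of dominance.
first_to_second = eigenvalues[0] / eigenvalues[1]

print(round(proportions[0], 2))   # 0.45
print(round(first_to_second, 1))  # 6.7
```

Under this assumption the first factor accounts for 13.10 / 29, or about 45%, of the variance, and each later factor for less than 7%.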

Examination of the Discrimination Indices and Threshold Parameters

After a few iterations of CML estimations and adjustments of αs using the M2 test, all αs had M2 values in the interval –3 to +3, with the exception of the α of item 20 (M2 = –3.93). Increasing the value of this α did not improve the M2 value, so the value 1 was retained (see footnote 7). Discriminations and threshold parameters of the ICG–R items are shown in Table 2. As can be seen, items 17 (“numbness”), 21 (“purposelessness”), and 23 (“unable to imagine life is fulfilling”) had the highest αs (α = 6). These items thus discriminate best between people with different positions on the dimension of overall CG severity. Items that were found to be least discriminating (α = 1) are items 12 (“identification”), 13 (“avoidance”), 15 (“hearing the deceased”), 16 (“seeing the deceased”), and 20 (“feeling envious”). Apparently, these symptoms are weaker indicators of overall CG severity.

Footnote 7: We also evaluated the statistical justification for estimating the more restrictive polytomous Rasch model with α = 1 for all items. However, M2 tests revealed that this model did not fit the data. This indicates that the 29 items of the ICG–R do not perform equally well as indicators of CG severity.

TABLE 2 Item Factor Loadings and Parameter Estimates of the Items of the Inventory of Complicated Grief–Revised (ICG–R) Obtained from the One Parameter Logistic Model (OPLM)

| Item | Factor loading | α | β1 | β2 |
|---|---|---|---|---|
| 1. Death feels overwhelming or devastating | 0.77 | 5 | –0.74 | 0.01 |
| 2. Preoccupation to the point of distraction | 0.79 | 5 | –0.71 | 0.33 |
| 3. Upsetting memories of the deceased | 0.59 | 3 | –0.99 | 0.58 |
| 4. Trouble accepting the death | 0.62 | 3 | –0.62 | 0.22 |
| 5. Longing and yearning for the deceased | 0.64 | 3 | –1.55 | –0.31 |
| 6. Drawn to things associated with deceased | 0.46 | 2 | –1.14 | 0.28 |
| 7. Anger about the death | 0.57 | 2 | –0.56 | 0.40 |
| 8. Disbelief over the death | 0.68 | 3 | –0.50 | 0.16 |
| 9. Stunned, dazed or shocked | 0.83 | 5 | –0.52 | 0.16 |
| 10. Hard to trust people | 0.64 | 3 | –0.19 | 0.58 |
| 11. Lost the ability to care about, or feel distant from people | 0.64 | 3 | –0.21 | 0.54 |
| 12. Having pain and symptoms as the deceased | 0.36 | 1 | 1.02 | 1.65 |
| 13. Avoiding reminders of the deceased | 0.35 | 1 | 0.68 | 2.14 |
| 14. Life is empty or meaningless | 0.81 | 5 | –0.62 | 0.19 |
| 15. Hearing the voice of the deceased | 0.36 | 1 | –0.16 | 1.34 |
| 16. Seeing the deceased | 0.32 | 1 | –0.43 | 1.11 |
| 17. Feeling numb | 0.83 | 6 | –0.42 | 0.36 |
| 18. Feeling it is unfair living while loved one is dead | 0.64 | 3 | 0.04 | 0.59 |
| 19. Bitterness | 0.69 | 3 | –0.49 | 0.22 |
| 20. Feeling envious of non-bereaved others | 0.44 | 1 | –0.33 | 0.70 |
| 21. Future holds no meaning or purpose | 0.88 | 6 | –0.30 | 0.37 |
| 22. Feeling lonely | 0.79 | 5 | –0.72 | 0.01 |
| 23. Unable to imagine life being fulfilling | 0.88 | 6 | –0.31 | 0.35 |
| 24. Part of oneself died with the deceased | 0.73 | 4 | –0.70 | 0.04 |
| 25. Shattered world view | 0.81 | 5 | –0.33 | 0.26 |
| 26. Lost sense of security, safety, or control | 0.76 | 4 | –0.51 | 0.19 |
| 27. On edge, jumpy, or easily startled | 0.74 | 4 | –0.63 | 0.18 |
| 28. Experiencing impairments in functioning | 0.73 | 4 | –0.53 | 0.33 |
| 29. Difficulties with sleeping | 0.52 | 2 | –0.77 | 0.27 |

As noted earlier, in the OPLM, there is no single indicator of the item’s severity. Rather, to compare among the severities of items, one should separately look at β1 (the point where the CRCs for response options 0 and 1 cross) and β2 (the point where the CRCs for response options 1 and 2 cross). When comparing among the thresholds of individual ICG–R items (Table 2), we can see that there are several items for which β1 and β2 have a similar position on the dimension of CG severity in comparison with the β1s and β2s of the other items. These items can be compared fairly straightforwardly with respect to their overall severity. For example, item 5 (“yearning”) is the least severe of all items, because the first and second threshold parameters of this item are the lowest of all the items. Items 1, 22, and 24 are relatively easy as well. Stated otherwise, with low levels of CG, people may already have these symptoms. Items 10, 11, 12, 13, 15, and 18 are among the most severe items when looking at their βs. Relatively high levels of CG severity are required for these items to be answered with 1 or 2. As we referred to earlier, the CRCs for the least severe item (item 5, “yearning”) and one of the most severe items (item 13, “avoidance”) are depicted in Figure 2. Item 13 occurs at higher levels of CG and is therefore a more severe item than item 5.

Table 2 shows that there are several items that, compared with the other items, have a relatively low β1 and a relatively high β2. Compare, for instance, the thresholds of item 1 (“death is overwhelming”) with those of item 2 (“preoccupation”). The first thresholds of both items are near each other, indicating that equal levels of CG severity coincide with both items being endorsed with a 1-response. Yet, the second thresholds are further removed from each other. For item 2, a larger increase in CG severity is required before mourners give a 2-response instead of a 1-response than for item 1. Stated otherwise, for item 2 “never” and “seldom/rarely” responses occur at relatively low levels of CG whereas a “frequently/always” response occurs at relatively high levels of CG. The same holds for items 3, 6, and 29. There are still other items that have a relatively high β1 and a relatively low β2. Compare, for instance, the thresholds of item 8 (“disbelief”) with those of item 27 (“on edge”). The second thresholds are near each other, indicating that equal levels of CG are required to answer both items with a 2-response. Yet, item 8 has a higher first threshold. This indicates that item 8 is relatively difficult to endorse with a low response but relatively easy to endorse with a high response. The same is true for items 9, 19, 25, and 26. We could also say that these items are endorsed with a 1-response at relatively high levels of CG, whereas only a slight increase in CG severity coincides with people choosing a 2-response instead of a 1-response.

Examination of DIF

First, we examined DIF between men and women. Examination of the 58 χ² tests (pertaining to the contrasts of both response option dichotomies of each item) provided no clear evidence of DIF. Eleven χ² tests were significant at p < .05. One of these remained significant when controlling the false discovery rate (Benjamini & Hochberg, 1995). This χ² test pertained to the contrast of the second response option dichotomy of item 10 (“hard to trust people”), χ² = 30.81, df = 7, p < .0001. For illustrative purposes, Figure 3 shows the observed ICCs of men and women for the first (upper panel) and second (lower panel) response option dichotomy of this item. The upper panel shows that, for any level of CG severity, men and women have equal probabilities of picking response options 1 or 2 (and not 0); women seem to systematically score a bit higher than men but this difference is not significant. The lower panel shows that women have a slightly higher chance of picking a 2-response (and not 0 or 1). Yet, although this difference is statistically significant, the difference between the curves is too small to have practical relevance. The lower panel shows that the difference in probability of endorsement is no more than approximately 15% at the point where their “ICCs” differ most. More importantly, notable differences only occur at a small part of the CG severity dimension (see footnote 8).

Footnote 8: There is no particular “rule of thumb” for judging the practical relevance of statistically significant DIF within the OPLM. Nevertheless, there are several arguments for saying that this statistically significant gender DIF has little relevance. A first argument is that, because it is only in a very small range of the θ dimension that women have a higher chance than men to answer this item with a 2-response, it likely only affects the responses of relatively few mourners (i.e., those scoring within that θ range). A second argument is that, given that the difference in probability of endorsement is only about 15%, most women will endorse this item similarly to men, even in this small range of the θ dimension. A third argument is that only one of the two response option dichotomies contains DIF such that, across the entire θ range, men and women have equal probabilities of picking the lower responses to this item. Notice also that this DIF likely has little to no influence on the (weighted and unweighted) total scores of the ICG–R.

In the second analysis, we examined DIF between victims of violent vs. non-violent loss. Eleven of all 58 χ² tests were significant at p < .05. Again, only one of these remained significant after controlling the false discovery rate. This χ² test pertained to the contrast of the second response option dichotomy of item 20 (“feeling envious”), χ² = 36.40, df = 12, p < .0001. Yet, similar to the gender DIF for item 10, this DIF did not seem theoretically relevant: Again, differences in probability of endorsement were no larger than 15% and present at small parts of the CG dimension.

Summary of Analyses

Figure 4 summarizes results of the present OPLM analyses. The dimension of overall CG severity (θ) is indicated by the horizontal axis and ranges from –1.7 to 2.3. The figure helps us to compare between the thresholds of items and to characterize person parameters corresponding to important diagnostic cut-off points. Moreover, it gives insight into the amount of test information provided by the ICG–R as a whole.

FIGURE 4 Distribution of estimated CG severity scores in the exploratory sample (n = 880), test information provided by the Inventory of Complicated Grief–Revised (ICG–R), and examples of items ordered along the CG severity dimension according to their first and second threshold parameters.

To visualize the relative severity of items, several items have been ordered on the θ dimension on the basis of their two threshold parameters. (Recall that β and θ are on the same scale.) The numbers 1 and 2 added to the items refer to β1 and β2. Again consider the least severe item (item 5, “yearning”) and one of the most severe items (item 13, “avoidance”). The βs of item 5 have values of –1.55 and –0.31. This indicates that mourners with θ levels below –1.55 tend to answer 0 to this item, whereas mourners with θ levels between –1.55 and –0.31 tend to answer 1, and those with θ levels over –0.31 tend to answer 2. That both βs are down the left side of the CG severity dimension indicates that relatively low CG levels already coincide with feelings of longing and yearning being present frequently or always. The opposite is true for item 13 with βs of 0.68 and 2.14. This item is endorsed with a 1-response or 2-response at relatively high levels of CG severity. Among the items that are more in the mid-range of the severity dimension are items 4, 14, and 28.
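The reading of the thresholds for items 5 and 13 can be illustrated numerically. The sketch below assumes a partial-credit-type parameterization in which the probability of response k is proportional to exp(Σ α(θ − βj)), summed over the first k thresholds, so that adjacent category response curves cross exactly at the βs. This form is an assumption made for illustration; the exact parameterization used by the OPLM software may differ in detail. Item parameters are taken from Table 2.

```python
import math


def category_probabilities(theta, alpha, betas):
    """Category probabilities for one item under a partial-credit-type model."""
    logits = [0.0]  # cumulative sums of alpha * (theta - beta_j)
    for beta in betas:
        logits.append(logits[-1] + alpha * (theta - beta))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def most_probable_response(theta, alpha, betas):
    probs = category_probabilities(theta, alpha, betas)
    return probs.index(max(probs))


ITEM_5 = (3, [-1.55, -0.31])   # "yearning": alpha, (beta1, beta2) from Table 2
ITEM_13 = (1, [0.68, 2.14])    # "avoidance"

for theta in (-2.0, -1.0, 0.0, 1.0, 2.5):
    print(theta,
          most_probable_response(theta, *ITEM_5),
          most_probable_response(theta, *ITEM_13))
```

Under this assumed form, a mourner at θ = 0 would most probably answer 2 to item 5 but still 0 to item 13, which matches the ordering of the two items along the severity dimension.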

Figure 4 graphically portrays a point made earlier, that it is difficult to compare the overall severity of items. Instead, one should compare their two βs to get an indication of their relative severity. Consider, for example, items 3 (“upsetting memories”) and 22 (“feeling lonely”). A relatively low level of CG severity is required for item 3 to be endorsed with a 1-response and a slightly higher severity level is necessary to give a similar response to item 22. Notice that different increases in CG severity are required before people pick a 2-response instead of a 1-response for each item. That is, whereas only a slight increase in CG severity coincides with an increased probability to answer 2 instead of 0 or 1 to item 22, a much larger increase in CG severity is required before mourners give a 2-response to item 3.

The content of items ordered along the dimension of CG severity can be used to characterize important diagnostic cut-off points. In the absence of an external diagnostic gold standard to identify true cases of CG, a score in the upper quintile of the distribution of possible CG levels has often been used to distinguish mourners at risk for significant health impairments (Prigerson et al., 1999). In the current sample, the cut-off for a score in the top 20% of the distribution of θ corresponded to a θ level of 0.36. When comparing this cut-off with the βs reported in Table 2, it can be seen that mourners scoring above this cut-off are likely to give a 2-response (“frequently/always”) to most of the items that are part of the current criteria for CG (Prigerson et al., 2008), including items 4 (“trouble accepting”), 5 (“yearning”), 9 (“stunned”), 14 (“life feels empty”), 17 (“feeling numb”), 19 (“bitterness”), 24 (“part of self died”), and 28 (“symptoms cause impairments in functioning”). Stated otherwise, a θ score in the range of CG caseness (θ > 0.36) coincides with high scores on individual CG items that reflect proposed criteria for CG (Prigerson et al., 2008). This attests to the validity of these individual items as denoting CG.
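The comparison between the cut-off and the second thresholds can be written out directly. The sketch uses the β2 values from Table 2 for the eight criterion items named above; the variable names are illustrative only.

```python
CUTOFF_THETA = 0.36  # theta level marking the top 20% in the exploratory sample

# Second thresholds (beta2) from Table 2 for the criterion items named in the text.
SECOND_THRESHOLDS = {
    4: 0.22,    # trouble accepting
    5: -0.31,   # yearning
    9: 0.16,    # stunned
    14: 0.19,   # life feels empty
    17: 0.36,   # feeling numb
    19: 0.22,   # bitterness
    24: 0.04,   # part of self died
    28: 0.33,   # impairments in functioning
}

# Items whose second threshold lies at or below the cut-off: anyone scoring
# above the cut-off has passed the point where a 2-response overtakes a 1-response.
reached = sorted(item for item, beta2 in SECOND_THRESHOLDS.items()
                 if beta2 <= CUTOFF_THETA)
print(reached)  # [4, 5, 9, 14, 17, 19, 24, 28]
```

All eight second thresholds lie at or below 0.36, with item 17 sitting exactly at the cut-off.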

Figure 4 also provides a means to examine the test information provided by all ICG–R items together. Above the horizontal axis, several examples of weighted raw ICG–R total scores (the numbers added to the horizontal lines; calculated as the summation of the item scores multiplied by the α of each item), the corresponding estimated θ scores (the short vertical lines), and the 95% CIs of these scores are displayed. It can be seen that CIs that are central to the θ dimension are relatively small compared with CIs located towards the left and right sides of the dimension. This indicates that all 29 items together provide more reliable information about the θ levels of mourners that are located in the middle of the θ dimension, than about the θ levels of people located towards the dimension’s extremes. The TIC of the ICG–R is depicted by the solid line above the histogram in Figure 4. The TIC portrays the precision with which the ICG–R is able to estimate θ, throughout the entire dimension of CG severity. Notice that the TIC has its peak where the 95% CIs around the θ estimates are smallest.
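The weighted total score and the link between test information and confidence interval width can be sketched as follows. Several assumptions are made here and are not taken from the article: the same partial-credit-type model form as an approximation to the OPLM, item information computed as α² times the conditional variance of the item score, and a 95% interval of the form θ ± 1.96 / √I(θ). The article's own intervals may have been derived differently. Item parameters are copied from Table 2.

```python
import math

# (alpha, beta1, beta2) for items 1-29, copied from Table 2.
ITEMS = [
    (5, -0.74, 0.01), (5, -0.71, 0.33), (3, -0.99, 0.58), (3, -0.62, 0.22),
    (3, -1.55, -0.31), (2, -1.14, 0.28), (2, -0.56, 0.40), (3, -0.50, 0.16),
    (5, -0.52, 0.16), (3, -0.19, 0.58), (3, -0.21, 0.54), (1, 1.02, 1.65),
    (1, 0.68, 2.14), (5, -0.62, 0.19), (1, -0.16, 1.34), (1, -0.43, 1.11),
    (6, -0.42, 0.36), (3, 0.04, 0.59), (3, -0.49, 0.22), (1, -0.33, 0.70),
    (6, -0.30, 0.37), (5, -0.72, 0.01), (6, -0.31, 0.35), (4, -0.70, 0.04),
    (5, -0.33, 0.26), (4, -0.51, 0.19), (4, -0.63, 0.18), (4, -0.53, 0.33),
    (2, -0.77, 0.27),
]


def category_probabilities(theta, alpha, betas):
    """Category probabilities under the assumed partial-credit-type form."""
    logits = [0.0]
    for beta in betas:
        logits.append(logits[-1] + alpha * (theta - beta))
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def weighted_score(responses):
    """Sum of item scores (0, 1, 2), each multiplied by the item's alpha."""
    return sum(alpha * x for (alpha, _, _), x in zip(ITEMS, responses))


def total_information(theta):
    """Sum over items of alpha^2 times the variance of the item score at theta."""
    info = 0.0
    for alpha, beta1, beta2 in ITEMS:
        probs = category_probabilities(theta, alpha, (beta1, beta2))
        mean = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
        info += alpha ** 2 * variance
    return info


def approximate_interval(theta_hat, z=1.96):
    """Interval theta_hat +/- z / sqrt(information), an assumed approximation."""
    se = 1.0 / math.sqrt(total_information(theta_hat))
    return theta_hat - z * se, theta_hat + z * se


for theta in (-1.7, 0.0, 2.3):
    low, high = approximate_interval(theta)
    print(theta, round(total_information(theta), 2), round(high - low, 2))
```

In this sketch the information is far larger near θ = 0 than at –1.7 or 2.3, so the interval is narrowest in the middle of the dimension, in line with the pattern described for Figure 4.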

The histogram above the higher θ line in Figure 4 represents the distribution of estimated θ scores of the 880 mourners in the current sample. As can be seen, estimated θ scores in this sample roughly lie between –1.50 and 1.70. Most mourners are positioned somewhere in the mid-range of the dimension. That the ICG–R as a whole provides much information about this part of the dimension (i.e., the TIC has its peak here) indicates that the measure is sensitive to discriminating between contiguous levels of CG severity, in the population of mourners that the current sample represents.

Cross-Validation Analyses

The cross-validation sample (n = 441) was used to re-examine the dimensionality, item parameters, and outcomes of DIF as found in the exploratory sample. No notable differences were found. First, exploratory factor analysis revealed that one dominant factor was present. Second, M2 tests showed that the αs that were initially found (Table 2) fit the data of the cross-validation sample. Third, outcomes of DIF were highly comparable to those obtained with the main sample; none of the χ² tests pertaining to differences between groups in “ICCs” of the response option dichotomies were significant.

Additional Analyses with the 5-point Response Format

Outcomes of the OPLM analyses using the responses of the exploratory sample (n = 880) rated on 5-point scales hardly differed from those obtained using the trichotomously scaled responses. Again, examination of the discrimination parameters showed that items 17, 21, and 23 were the best and items 12, 13, 15, 16, and 20 were the weakest indicators of overall CG severity. All items now had four βs. For 18 items βs were ordered from smallest to largest. Thresholds of the remaining 11 items, however, were not nicely ordered. Inspection of the CRCs of these items indicated that, for none of the levels of θ, response option 1 (“seldom”) was the most probable response. Notice that thresholds of these items were in fact ordered when responses were collapsed into a trichotomous format. This suggests that the trichotomous scale is more appropriate for these items. Apart from the ordering of thresholds that differed when using the original 5-point responses, findings concerning the overall severity of items using the 5-point scales did not clearly differ from those obtained using the trichotomous scores. Again, items 5, 1, and 22 were among the least severe and items 12, 13, and 15 among the most severe of all items.
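Why disordered thresholds imply that a response option is never the most probable one can be shown with a small numerical sketch. Both sets of thresholds and the α value below are hypothetical (the 5-point estimates are not reported in the text), and the partial-credit-type model form is again an assumption.

```python
import math


def category_probabilities(theta, alpha, betas):
    """Category probabilities under the assumed partial-credit-type form."""
    logits = [0.0]
    for beta in betas:
        logits.append(logits[-1] + alpha * (theta - beta))
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def modal_categories(alpha, betas, lower=-3.0, upper=3.0, step=0.01):
    """Set of response options that are most probable somewhere on a theta grid."""
    modal = set()
    n_steps = int(round((upper - lower) / step))
    for i in range(n_steps + 1):
        theta = lower + i * step
        probs = category_probabilities(theta, alpha, betas)
        modal.add(probs.index(max(probs)))
    return modal


ORDERED = [-1.0, -0.3, 0.4, 1.1]      # hypothetical, increasing thresholds
DISORDERED = [0.2, -0.4, 0.5, 1.2]    # hypothetical, beta1 larger than beta2

print(sorted(modal_categories(2, ORDERED)))     # [0, 1, 2, 3, 4]
print(sorted(modal_categories(2, DISORDERED)))  # [0, 2, 3, 4]
```

Option 1 is more probable than option 0 only above β1 and more probable than option 2 only below β2, so when β1 exceeds β2 no θ level satisfies both conditions and option 1 is never modal.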

DIF analyses were more complex using the 5-point response format, as we now had to contrast the ICCs of the groups for each of the four response option dichotomies of the ICG–R items. Thus, in total, 116 (4 × 29) contrasts were examined in each of the two sets of DIF analyses. With respect to DIF between men and women, there were seven contrasts that were significant as evidenced by significant χ² tests (controlling for familywise error). Yet, inspection of plots showed no signs of relevant DIF. With respect to DIF between victims of violent versus non-violent loss, there were three significant χ² tests (controlling for familywise error) but plots of these contrasts did not point to relevant DIF.

Discussion

Methods based in IRT modeling were applied to the responses of 1,321 mourners to the ICG–R to examine the information that each of the ICG–R items contributes to the measurement of overall CG severity and to detect possible differences between subgroups of mourners in the way CG symptoms are endorsed. By doing so, we sought to enhance knowledge on the psychometric properties of the ICG–R and the validity of the proposed criteria for CG that are tapped by the ICG–R (Prigerson et al., 1999, 2008).

As a first step in our analyses, we examined the dimensionality of the ICG–R using exploratory factor analysis. Outcomes showed that the ICG–R can be conceptualized as representing one dimension of CG symptomatology. Findings were replicated in the cross-validation sample and are in agreement with earlier studies showing that CG symptoms form a unidimensional construct (Prigerson & Jacobs, 2001).

As a next step in our analyses, discrimination indices and threshold/severity parameters were estimated using the OPLM. Items were found to differ in their ability to distinguish between contiguous levels of CG. The best performing items (with α values of 5 and 6) were “death is overwhelming,” “preoccupation,” “stunned,” “life is empty,” “feeling numb,” “future holds no meaning,” “feeling lonely,” “unable to imagine life is fulfilling,” and “shattered worldview.” Among the items with low discrimination indices were “feeling envious,” “having pain and symptoms as the deceased,” and the “hearing the deceased” and “seeing the deceased” symptoms.

A number of things can be learned from these findings. First, the findings tell us something about the validity of the symptoms included in the most recent criteria of CG (Prigerson et al., 2008). That is, the discrimination parameters of the symptoms included in these criteria (Prigerson et al., 2008), as far as included in the present analyses, all had values of 3 or higher. This supports the validity of symptoms chosen to denote CG. However, there was one exception. That is, our findings do not seem to support the inclusion of “avoidance” as a symptom of CG (Prigerson et al., 2008) because it had a weak discriminative ability (α = 1). Importantly though, we assessed avoidance of reminders of the deceased in general, rather than avoidance of reminders of the reality of the loss, as it is defined in the recent criteria of CG. The latter “reality-focused” avoidance may well be indicative of CG, but this warrants further examination.

Second, the findings shed light on the validity of theoretical conceptualizations of CG. For example, it is noteworthy that items referring to disruption of basic assumptions about life, the future, and the world (items 14, 21, 23, 25, and 26), which partially overlapped with items referring to meaninglessness (items 14 and 21), discriminated between high and low CG better than did items representing separation distress (items 5, 6, and 22). These findings can be interpreted as indicating that CG is better conceptualized as a disorder of disrupted cognitions (Boelen, van den Hout, & van den Bout, 2006) or meaning-making processes (Neimeyer, 2006) than as a disorder of disrupted attachment (Prigerson & Jacobs, 2001). However, in this respect it is difficult to interpret the finding that “yearning” was the least severe manifestation of CG. On the one hand, one could say that this indicates that “yearning” is perhaps not a good marker of CG. At the same time, one could argue that the fact that “yearning” is already endorsed by mourners with very low levels of CG is a good reason to include it as a mandatory symptom of CG. At the very least, our findings suggest that it is relevant for future studies to further examine the performance of separation distress symptoms as putative markers of CG.

A third noteworthy point is that the relatively weak performance of the items “having pain and symptoms as the deceased,” “hearing the deceased,” and “seeing the deceased” lends little support to the presumably pathological nature of “hallucinations” of the deceased and “identification” with his/her symptoms, as put forth in old (Lindemann, 1944) as well as more recent literature (Burnett, Middleton, Raphael, & Matinek, 1997).

Threshold parameters showed that items differed strongly in their position along the dimension of overall CG severity. Compared with other items, the items “death is overwhelming,” “yearning,” “feeling that part of oneself has died,” and “feeling lonely” were found to discriminate at lower parts of the severity dimension. These items thus represented less severe manifestations of the CG dimension. Among the items that were found to operate at the CG dimension’s high end (thus representing more severe manifestations of CG) were “hard to trust people” and “lost the ability to care about others,” as well as the “avoidance,” “having pain and symptoms as the deceased” and “hearing the deceased” items. Notice that these last three items thus not only had low discriminative abilities but were also among the most severe items. We determined a cut-off for a score in the top 20% of the distribution of all θ scores (cf. Prigerson et al., 1995, 1999) and found that a score in the range of CG caseness coincided with high scores on most of the items selected in the recent criteria for CG (Prigerson et al., 2008). This lends further support to the validity of these criteria.

There were some items that were easy to answer with 1 and difficult to answer with 2 (e.g., “preoccupation,” “upsetting memories”) and other items that were relatively difficult to answer with 1 and easy to answer with 2 (e.g., “disbelief,” “stunned”). That item severities varied in their position along the CG dimension indicates that the meaning of item responses differs between items. For instance, a “seldom/rarely” response to different items is not necessarily indicative of equal levels of CG. Such a response to item 5 (“yearning”) is indicative of lower CG levels than a similar response to item 10 (“hard to trust people”) (Table 2). Importantly, we were able to replicate findings concerning the discriminative ability and severity of ICG–R items found with the exploratory sample (n = 880), with our cross-validation sample (n = 441). This supports the generalizability of our findings.

With respect to the psychometric properties of the ICG–R, the present findings indicate that most of its items are informative in distinguishing individual differences in CG severity and that they are reasonably dispersed across the severity dimension. These results support the construct validity of the ICG–R. The TIC indicated that the maximum precision of the ICG–R in the estimation of CG severity is in the centre of the severity dimension. Importantly, most mourners in the current sample had CG severity levels in this region. This indicates that the ICG–R is sensitive in discriminating among contiguous levels of CG in a rather heterogeneous population of mourners that the current sample represents. Noteworthy also is that the fall-off in information was found to lie beyond the diagnostic cut-off score for caseness of CG (i.e., θ = 0.36). This suggests that the loss of precision in estimating CG severity levels down the high side of the severity dimension is not very important, as it occurs when the diagnostic decision has been made. Nonetheless, in the developments towards refinement of a CG measure, authors could aim to construct items with very low severities and items with high severities to obtain more precision in measurement across the entire range of the CG severity dimension. Pertinent to further refinement of the ICG–R too is that, for 11 items, the trichotomous response format seemed more appropriate than the original 5-point scale response format. For these items (scored on 5-point scales), increases in overall CG severity did not nicely coincide with people picking a higher response option and the response option “seldom” was never the most probable option. Future studies should investigate further if a trichotomous response format is perhaps more appropriate when measuring CG symptoms.

An important aim of the current study was to examine if there are systematic differences between subgroups of mourners in the way ICG–R items are endorsed. Results revealed that there was no meaningful DIF for gender nor for victims of violent versus non-violent loss. These findings indicate that the ICG–R is a valid tool to study differences in CG symptoms across groups divided by gender and cause of loss. Moreover, the absence of DIF supports the construct validity of the proposed criteria for CG (Prigerson et al., 2008) tapped by the ICG–R, taking into account that valid diagnostic criteria would not be expected to perform differently in these groups.

There are several limitations that should be taken into account in the interpretation of our findings. First, as this study was done with the Dutch ICG–R, strictly speaking it is uncertain to what extent current findings apply to the English version of the ICG–R. Second, we did not assess the symptoms “pangs of grief” and “difficulties moving on with life” that are among the criteria of the most recent description of CG (Prigerson et al., 1999) because these are not included in the Dutch ICG–R. Thus, future studies are needed to further examine the psychometric properties of these particular symptoms. Third, respondents enrolled via Internet were self-selected. Thus, generalization to non-assessed groups should be made cautiously. Fourth, in characterizing a diagnostic cut-off for caseness of CG, caseness was defined in the absence of an external diagnostic ‘gold standard.’ Research that includes such a standard is needed.

Notwithstanding these considerations, the present findings add to our knowledge about the performance of individual CG symptoms as indicators of the underlying dimension of overall CG severity and about which particular symptoms represent more and less severe manifestations of this dimension. In addition, the findings enhance knowledge about the psychometric properties of the ICG–R and the validity of the proposed criteria for CG (Prigerson et al., 2008). The findings may have clinical implications. As noted, we found that negative cognitions about life and the future and difficulties with finding meaning in the loss strongly distinguish mourners with high levels of CG from those with lower levels of CG. Thus, targeting these processes seems important in the treatment of CG. Moreover, examining these and other strongly discriminating items (e.g., preoccupation with the deceased, numbness) seems important when screening mourners who are potentially at risk for CG. Because of the theoretical and applied relevance of IRT modeling, it would be useful for future studies to continue to apply this approach to research on the measurement characteristics of CG symptoms.

References

Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300.

Boelen, P. A., van den Bout, J., de Keijser, J., & Hoijtink, H. (2003). Reliability and validity of the Dutch version of the Inventory of Traumatic Grief. Death Studies, 27, 227–247.

Boelen, P. A. & Lensvelt-Mulders, G. J. L. M. (2005). Psychometric properties of the Grief Cognitions Questionnaire (GCQ). Journal of Psychopathology and Behavioral Assessment, 27, 291–303.

Boelen, P. A., van den Hout, M. A., & van den Bout, J. (2006). A cognitive-behavioral conceptualization of complicated grief. Clinical Psychology: Science and Practice, 13, 109–128.

Boelen, P. A. & Prigerson, H. G. (2007). The influence of symptoms of prolonged grief disorder, depression, and anxiety on quality of life among bereaved adults. A prospective study. European Archives of Psychiatry and Clinical Neuroscience, 257, 444–452.

Bonanno, G. A. & Kaltman, S. (2001). The varieties of grief experience. Clinical Psychology Review, 21, 705–734.

Burnett, P., Middleton, W., Raphael, B., & Martinek, N. (1997). Measuring core bereavement phenomena. Psychological Medicine, 27, 49–57.

Chen, J. H., Bierhals, A. J., Prigerson, H. G., Kasl, S. V., Mazure, C. M., & Jacobs, S. (1999). Gender differences in the effects of bereavement-related psychological distress in health outcomes. Psychological Medicine, 29, 367–380.

Du Toit, M. (Ed.) (2003). IRT from SSI: Bilog-MG, Multilog, Parscale, Testfact. Lincolnwood, IL: Scientific Software International.

Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. New York: Lawrence Erlbaum Associates.

Faschingbauer, T. R., Zisook, S., & DeVaul, R. (1987). The Texas Revised Inventory of Grief. In S. Zisook (Ed.), Biopsychosocial aspects of bereavement (pp. 127–138). Washington, DC: American Psychiatric Press.

Glas, C. A. W. & Verhelst, N. D. (1995). Testing the Rasch model. In G. H. Fisher & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 69–96). New York: Springer-Verlag.

Hambleton, R., Swaminathan, H., & Rogers, H. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

Hoijtink, H. & Vollema, M. (2003). Contemporary extensions of the Rasch model. Quality & Quantity, 37, 263–276.

Holland, P. W. & Wainer, H. (Eds.) (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates.

Lichtenthal, W. G., Cruess, D. G., & Prigerson, H. G. (2004). A case for establishing complicated grief as a distinct mental disorder in DSM-V. Clinical Psychology Review, 24, 637–662.

Lindemann, E. (1944). Symptomatology and management of acute grief. American Journal of Psychiatry, 101, 141–148.

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176.

Muthen, L. K. & Muthen, B. O. (1998). Mplus. The comprehensive modeling program for applied researchers: User’s guide. Los Angeles: Authors.

Neimeyer, R. A. (2006). Complicated grief and the reconstruction of meaning: Conceptual and empirical contributions to a cognitive-constructivist model. Clinical Psychology: Science and Practice, 13, 141–145.

Neimeyer, R. A., Hogan, N., & Laurie, A. (2008). The measurement of grief: Psychometric considerations in the assessment of reactions to bereavement. In M. Stroebe, R. O. Hansson, H. Schut, & W. Stroebe (Eds.), Handbook of bereavement research: 21st century perspectives (pp. 133–162). Washington, DC: American Psychological Association.

Prigerson, H. G., Bierhals, A. J., Kasl, S. V., Reynolds, C. F., Shear, M. K., Day, N., Beery, L. C., Newsom, J. T., & Jacobs, S. C. (1997). Traumatic grief as a risk factor for mental and physical morbidity. American Journal of Psychiatry, 154, 616–623.

Prigerson, H. G., Bierhals, A. J., Kasl, S. V., Reynolds, C. F., Shear, M. K., Newsom, J. T., & Jacobs, S. C. (1996). Complicated grief as a disorder distinct from bereavement-related depression and anxiety: A replication study. American Journal of Psychiatry, 153, 1484–1486.

Prigerson, H. G. & Jacobs, S. C. (2001). Traumatic grief as a distinct disorder: A rationale, consensus criteria, and a preliminary empirical test. In M. S. Stroebe, R. O. Hansson, W. Stroebe, & H. A. W. Schut (Eds.), Handbook of bereavement research. Consequences, coping, and care (pp. 613–647). Washington, DC: American Psychological Association Press.

Prigerson, H. G., Maciejewski, P. K., Reynolds, C. F., Bierhals, A. J., Newsom, J. T., Fasiczka, A., Frank, E., Doman, J., & Miller, M. (1995). Inventory of Complicated Grief: A scale to measure maladaptive symptoms of loss. Psychiatry Research, 59, 65–79.

Prigerson, H. G., Shear, M. K., Jacobs, S. C., Reynolds, C. F., Maciejewski, P. K., Davidson, J., Rosenheck, R., Pilkonis, P. A., Wortman, C. B., Williams, J. W. B., Widiger, T. A., Frank, E., Kupfer, D. J., & Zisook, S. (1999). Consensus criteria for traumatic grief. British Journal of Psychiatry, 174, 67–73.

Prigerson, H. G., Vanderwerker, L. C., & Maciejewski, P. K. (2008). Prolonged grief disorder as a mental disorder: Inclusion in DSM. In M. Stroebe, R. Hansson, W. Stroebe, & H. Schut (Eds.), Handbook of bereavement research and practice: 21st century perspectives (pp. 165–186). Washington, DC: American Psychological Association Press.

Silverman, G. K., Jacobs, S. C., Kasl, S. V., Shear, M. K., Maciejewski, P. K., Noaghiul, F. S., & Prigerson, H. G. (2000). Quality of life impairments associated with diagnostic criteria for traumatic grief. Psychological Medicine, 30, 857–862.

Stroebe, M., Schut, H., & Finkenhauer, C. (2001). The traumatization of grief? A conceptual framework for understanding the trauma-bereavement interface. Israel Journal of Psychiatry and Related Sciences, 38, 185–201.

Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum.

Verhelst, N. D. & Glas, C. A. W. (1995). The One Parameter Logistic Model. In G. H. Fisher & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 215–238). New York: Springer-Verlag.

Verhelst, N. D., Glas, C. A. W., & Verstralen, H. H. F. M. (1995). OPLM: One Parameter Logistic Model. Computer program and manual. Arnhem, The Netherlands: Cito.
