
Available online at www.sciencedirect.com

Comprehensive Psychiatry 54 (2013) 406–413, www.elsevier.com/locate/comppsych

What is the General Health Questionnaire-12 assessing?

Dimensionality and psychometric properties of the General Health Questionnaire-12 in a large-scale German population sample

Matthias Romppel a,b, Elmar Braehler a, Marcus Roth c, Heide Glaesmer a,⁎

a Department of Medical Psychology and Medical Sociology, University of Leipzig, 04103 Leipzig, Germany
b Department of Applied Social Sciences, Leipzig University of Applied Sciences, 04251 Leipzig, Germany
c Department of Psychology, University of Duisburg-Essen, 45127 Essen, Germany

Abstract

Since the dimensionality and the related psychometric properties of the 12-item General Health Questionnaire (GHQ-12) are still under debate, the present study compares different factor solutions from the literature to determine which shows the best fit, and investigates reliability and construct validity. The analyses are based on a representative German population sample (N = 2,041) assessed in face-to-face interviews. Confirmatory factor analysis indicates the best fit for the one-factor model including response bias on the negatively worded items, as proposed by Hankins. This underlines the importance of methodological aspects for the dimensionality. Moreover, the correlations of the different subscales of the two- and three-factor models with several external criteria (BDI, PHQ-2, SF-36, PHQ-Anxiety, SPIN) do not differ substantially. The preferred unidimensional model shows good psychometric properties. Given its associations with the external criteria under study, the GHQ-12 as a unidimensional measure seems to be a useful screening tool for the assessment of mental distress or minor psychiatric morbidity, with a main focus on depressive symptomatology.
© 2013 Elsevier Inc. All rights reserved.

1. Background

Since its development in 1972, the General Health Questionnaire (GHQ) has been widely used as a screening instrument for minor psychiatric morbidity [1]. It has been translated into many languages and extensively validated in different populations [2]. Shorter versions derived from the original 60-item GHQ, such as the 12-item version (GHQ-12), have been developed and validated. The GHQ-12 has recently become the most popular form of the scale because of its relatively good psychometric properties and its brevity. Although the GHQ-12 was designed as a unidimensional scale, several studies have revealed two- or three-factor solutions.

⁎ Corresponding author. Department of Medical Psychology and Medical Sociology, University of Leipzig, Philipp-Rosenthal-Str. 55, 04103 Leipzig, Germany. Tel.: +49 341 9718811; fax: +49 341 9718809.

E-mail address: [email protected] (H. Glaesmer).

0010-440X/$ – see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.comppsych.2012.10.010

Most of these studies interpreted a factor of “anxiety/depression” and a factor of “social dysfunction” [2-9]. Beyond these two factors, some studies revealed a third factor expressing “loss of confidence” [10-14]. Over time, various names have been given to these factors [15]. Martin (1999) developed a correlated three-factor model in which the factors were labelled “cope”, “stress”, and “depression”; unfortunately, this model has not been replicated in other studies so far [11,12,15]. It has repeatedly been stated that the GHQ-12 measures psychiatric morbidity in more than one domain, and a three-factor solution is usually considered to be superior [14].

Several studies have discussed the impact of methodological aspects on the factor solutions found in the different studies. Because different scoring methods are available for the GHQ-12, a study by Campbell has shown that the scoring method substantially affects model estimation [16]. In particular, the role of positively and negatively worded items is under debate [17,18]. This so-called “wording effect” means that the positive or negative formulation of items influences how participants interpret and respond to them, and consequently affects the factorial structure. This effect of wording on factorial structure has been confirmed in studies on the dimensionality of other psychometric instruments, such as the Rosenberg Self-Esteem Scale and the Life Orientation Test [19-21]. For the Rosenberg Self-Esteem Scale, Marsh [19] preferred a unidimensional solution with response bias on the negatively phrased items. Until then, studies had assumed that the GHQ-12 is free of response bias. Following the finding of Marsh [19], Hankins compared a unidimensional model including response bias on the negatively phrased items of the GHQ-12 with a simple unidimensional model and a three-dimensional model [17]. The unidimensional model including response bias showed a better fit than the two other models under study. Hankins suggests that this finding indicates that previous findings on the factorial structure of the GHQ-12 were based on methodological artefacts [22]. A study by Ye [18] replicated the results of Hankins [17,22] in a Chinese sample, and the study of Wang and Lin [23] showed that the GHQ-12 has a unidimensional structure after controlling for wording effects.

In summary, the factorial structure of the GHQ-12 remains under debate [15,18,22], and thus a solid statement about the reliability of the GHQ-12 is still lacking, because reliability depends on the factorial structure. Nevertheless, Hankins [17] reported the impact of dimensionality and response bias on reliability: since the calculation of reliability by Cronbach's alpha assumes the absence of response bias, whereas the best-fitting model indicates response bias, alpha may be overestimated [17]. Finally, a comprehensive psychometric evaluation requires statements about the validity of the instrument. A useful procedure is the inclusion of external criteria to clarify what the GHQ-12 assesses. Most psychometric studies lack this final step, and the labels of the different dimensions are based on a review of the related items.
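To illustrate why alpha may be overestimated when the error terms of some items covary, the following minimal sketch builds the model-implied covariance matrix of a single-factor model and compares Cronbach's alpha with coefficient omega (here taken as the squared sum of loadings divided by the variance of the sum score, so that error covariances count as error). The loadings and error covariances are hypothetical values chosen for illustration, not estimates from this study, and the function name is our own.

```python
def alpha_and_omega(loadings, error_cov):
    """Cronbach's alpha and coefficient omega for a one-factor model.

    loadings  -- standardized loadings (items are assumed to have unit variance)
    error_cov -- dict mapping (i, j) item-index pairs to residual covariances
    """
    k = len(loadings)
    # Model-implied covariance: loading_i * loading_j off the diagonal, 1 on it
    cov = [[loadings[i] * loadings[j] for j in range(k)] for i in range(k)]
    for i in range(k):
        cov[i][i] = 1.0
    # Correlated errors (e.g. a shared wording effect) add to off-diagonal cells
    for (i, j), c in error_cov.items():
        cov[i][j] += c
        cov[j][i] += c
    total_var = sum(sum(row) for row in cov)       # variance of the sum score
    item_var = sum(cov[i][i] for i in range(k))
    alpha = k / (k - 1) * (1 - item_var / total_var)
    omega = sum(loadings) ** 2 / total_var        # only factor variance counts as true
    return alpha, omega


# Hypothetical example: six items with equal loadings of .6
loadings = [0.6] * 6
print(alpha_and_omega(loadings, {}))
# Residual covariances of .1 among the last three items (a "wording" method effect)
print(alpha_and_omega(loadings, {(3, 4): 0.1, (3, 5): 0.1, (4, 5): 0.1}))
```

With equal loadings and no error covariances the two coefficients coincide (about .77); adding the positive error covariances raises alpha to about .79 while omega falls to about .74, because alpha treats the shared error variance as true-score variance.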

The two- and three-factor models distinguish different domains, such as “depression” and “loss of confidence”. A unidimensional solution refers to a more holistic concept, such as “mental distress” or “minor psychiatric morbidity”, that the GHQ-12 might measure. A reliable statement about what the GHQ-12 assesses would be very useful to guide the application of the instrument in research and practice.

Addressing the controversial debate about the dimensionality, reliability and construct validity of the GHQ-12, we (1) test and compare different dimensional models discussed in the previous literature, (2) quantify psychometric properties, and (3) study associations of the GHQ-12 with depression, anxiety, social phobia, and self-perceived health status to determine what the GHQ-12 assesses in a large-scale representative German population sample. The primary aim of our study is to test the psychometric properties of the GHQ-12 in the general population and to support or discourage the future application of the instrument in population-based studies.

2. Methods

2.1. Subjects

A representative sample of the German general population was selected with the assistance of a demographic consulting company. Germany was divided into 201 sample areas representing the different regions of the country. Households in each area, and household members fulfilling the inclusion criteria (age 14 or older, able to read and understand German), were selected randomly using the Kish selection grid technique. This technique selects the target individual among the household residents on the doorstep and is designed so that all individuals in a household have an equal chance of selection. The sample is representative in terms of age, gender, and education. A first attempt was made for 3,194 addresses, of which 3,108 were valid. If the selected person was not at home, a maximum of four contact attempts were made. 872 subjects (28.1%) refused participation, 137 subjects (4.4%) were not reached after four attempts, and 10 subjects (0.3%) refused participation because of severe health problems. All subjects were visited by a study assistant, informed about the investigation, and presented with self-rating questionnaires. The assistant waited until participants had answered all questionnaires and offered help if the meaning of a question was unclear. A total of 2,066 people aged 14 to 93 years agreed to participate and completed the self-rating questionnaires in November and December 2002 (participation rate: 66.5%). 25 subjects were excluded from the following analyses because of incomplete data, so a dataset of 2,041 people is included in this study. Table 1 gives an overview of the demographic characteristics of the sample.
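The response figures above can be re-derived as a quick consistency check. This sketch assumes, as the reported percentages imply, that every rate uses the 3,108 valid addresses as denominator; the variable and function names are illustrative.

```python
# Fieldwork counts reported in Section 2.1; denominator = number of valid addresses
VALID_ADDRESSES = 3108
OUTCOMES = {
    "refused participation": 872,
    "not reached after four attempts": 137,
    "refused because of severe health problems": 10,
    "participated": 2066,
}
EXCLUDED_INCOMPLETE = 25


def rate(count, denominator=VALID_ADDRESSES):
    """Share of valid addresses in percent, rounded to one decimal as in the text."""
    return round(100 * count / denominator, 1)


for label, n in OUTCOMES.items():
    print(f"{label}: {n} ({rate(n)}%)")

analysed = OUTCOMES["participated"] - EXCLUDED_INCOMPLETE
print(f"analysed sample: N = {analysed}")
```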

2.2. Instruments

In our study, the German version of the 12-item General Health Questionnaire (GHQ-12) [24] was used with a four-point Likert scale (0-1-2-3). Thus, the total score ranges from 0 to 36, with higher scores representing higher levels of mental distress. In addition to the GHQ-12, several other instruments assessing mental distress and self-perceived general health status were used in the study.
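A minimal scoring sketch, assuming each response is already coded 0 to 3 with higher codes indicating more distress (the usual ordering of the GHQ-12 response options) and that no item is missing. For contrast with the alternative scoring methods mentioned in the Background, it also includes the binary "GHQ" method (0-0-1-1); the study itself used Likert scoring, and the function names are hypothetical.

```python
from typing import Sequence

N_ITEMS = 12


def _validate(responses: Sequence[int]) -> None:
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("responses must be coded 0, 1, 2 or 3")


def ghq12_likert(responses: Sequence[int]) -> int:
    """Likert scoring (0-1-2-3), as used in this study: range 0-36."""
    _validate(responses)
    return sum(responses)


def ghq12_binary(responses: Sequence[int]) -> int:
    """Binary 'GHQ' scoring (0-0-1-1), shown only for contrast: range 0-12."""
    _validate(responses)
    return sum(1 for r in responses if r >= 2)


answers = [1, 2, 0, 1, 3, 1, 1, 0, 2, 1, 1, 0]  # hypothetical respondent
print(ghq12_likert(answers), ghq12_binary(answers))
```

For this hypothetical respondent the Likert total is 13, while binary scoring counts only the three answers in the two upper categories.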

The Beck Depression Inventory (BDI) [25,26] is one of the most commonly used measures of the severity of depressive symptomatology. It is well established in research and clinical practice, and its reliability and validity have been demonstrated in numerous studies [27].

Table 1
Demographic characteristics of the representative population sample.

Characteristic | Total (N = 2,041) | Male (N = 959) | Female (N = 1,082)
Age, M (SD) | 48.8 (18.1) | 47.1 (17.5) | 50.2 (18.5)
Age groups
14–24 years | 11.9% (242) | 14.0% (134) | 10.0% (108)
25–34 years | 13.2% (269) | 13.4% (128) | 13.0% (141)
35–44 years | 17.6% (360) | 17.3% (166) | 17.9% (194)
45–54 years | 16.4% (334) | 16.9% (162) | 15.9% (172)
55–64 years | 18.0% (368) | 19.0% (182) | 17.2% (186)
65–74 years | 14.8% (302) | 15.8% (151) | 14.0% (151)
≥75 years | 8.1% (166) | 3.7% (36) | 12.0% (130)
Urbanity
Rural area | 28.9% (590) | 29.1% (279) | 28.7% (311)
Urban area | 71.1% (1,451) | 70.9% (680) | 71.3% (771)
Education
No qualifications | 2.0% (41) | 1.2% (11) | 2.8% (30)
Less than 10 years | 46.3% (944) | 45.0% (431) | 47.4% (513)
10 years of education | 35.3% (718) | 36.5% (350) | 34.0% (368)
More than 10 years | 16.3% (338) | 17.4% (167) | 15.8% (171)
Net household income
< 750 €/month | 10.8% (211) | 9.4% (87) | 12.0% (124)
750 to 1249 €/month | 28.4% (554) | 25.2% (232) | 31.2% (322)
1250 to 1999 €/month | 36.2% (708) | 38.6% (352) | 34.2% (352)
≥2000 €/month | 24.6% (480) | 26.8% (247) | 22.6% (233)

To assess anxiety and depression, the short form of the Patient Health Questionnaire (PHQ) [28,29] was used. This instrument for psychiatric case definition in primary care has demonstrated good validity and reliability; compared with structured clinical interviews, it showed 98% sensitivity and 80% specificity [30]. Depressive symptoms were screened with the PHQ-2 [31,32], a short form of the PHQ depression subscale. This short depression screener contains two items assessing anhedonia and depressed mood over the past two weeks, each scored from 0 (“not at all”) to 3 (“nearly every day”), so the PHQ-2 total score ranges from 0 to 6. Compared with structured clinical interviews, it showed 83% sensitivity and 93% specificity [31]. The five items of the PHQ anxiety module assess panic attacks. The first question asks whether there was a panic attack within the last four weeks. If there was at least one attack, four further items assess whether there were previous attacks, whether the attacks were unexpected and impairing, and whether typical physical symptoms such as tachycardia or dizziness occurred. Response categories for these items are yes vs. no, and a total anxiety score ranging from 0 to 5 was calculated.
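The two PHQ scores described above can be sketched as follows. The handling of the panic module is an assumption: if the screening question is answered "no", the four follow-up items are treated as skipped and scored 0. Function names are hypothetical, not part of any official PHQ scoring software.

```python
from typing import Sequence


def phq2_score(anhedonia: int, depressed_mood: int) -> int:
    """PHQ-2 total (0-6): sum of two items, each scored 0-3."""
    for item in (anhedonia, depressed_mood):
        if item not in (0, 1, 2, 3):
            raise ValueError("PHQ-2 items are scored 0-3")
    return anhedonia + depressed_mood


def phq_panic_score(answers: Sequence[bool]) -> int:
    """Panic module total (0-5): number of 'yes' answers.

    Assumption: answers[0] is the screening question (panic attack in the last
    four weeks); if it is 'no', the four follow-up items are skipped and count as 0.
    """
    if len(answers) != 5:
        raise ValueError("expected five yes/no answers")
    if not answers[0]:
        return 0
    return sum(1 for a in answers if a)


print(phq2_score(2, 1))                                   # hypothetical respondent
print(phq_panic_score([True, False, True, True, False]))  # hypothetical respondent
```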

The Social Phobia Inventory (SPIN) [33,34] contains 17 items referring to fear, avoidance and physiological symptoms of social phobia in the previous seven days. Response categories for these items range from 0 (“not at all”) to 4 (“extremely”), so the total score ranges from 0 to 68. Good psychometric properties have been established [33,34].

A popular measure of self-perceived general health status is the Short Form 36 (SF-36) [35]. It consists of 36 items covering eight domains: Physical Functioning (10 items), Role Limitations – Physical (four items), Bodily Pain (two items), Social Functioning (two items), General Health (five items), Role Limitations – Emotional (three items), Vitality (four items), and Mental Health (five items). After transformation, each subscale ranges from 0 to 100, with higher scores indicating better health status. Additionally, the SF-36 contains a single question that assesses change in health compared with one year ago.
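The 0 to 100 transformation mentioned above is, in the standard SF-36 scoring rules as we understand them, a linear rescaling of the raw subscale sum: (raw score minus lowest possible score) divided by the possible range, times 100. This sketch assumes that item recoding and missing-data handling have already been done; the Physical Functioning coding (10 items, each coded 1 to 3) serves only as an example.

```python
def sf36_transform(raw: float, lowest: float, highest: float) -> float:
    """Map a raw subscale sum onto 0-100.

    lowest/highest are the lowest and highest possible raw sums of the subscale,
    after any item recoding required by the scoring manual has been applied.
    """
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return 100 * (raw - lowest) / (highest - lowest)


# Example: Physical Functioning, 10 items each coded 1-3, so raw sums range 10-30
print(sf36_transform(25, lowest=10, highest=30))
```

A raw Physical Functioning sum of 25 maps to 75.0 on the 0 to 100 scale.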

2.3. Statistical analyses

The models described in the Background section were tested via confirmatory factor analyses using Mplus 6.1 [36]. In a confirmatory factor analysis, a theoretically derived factor structure is defined, and the degree to which the covariance structure implied by the model fits the empirically observed covariance structure is assessed [37]. Model 1 represents the three-dimensional conception of the GHQ-12, with three latent variables (social dysfunction, anxiety/depression, and loss of confidence) on which six, four and two measured variables load, respectively. Since a latent variable represented by only two indicators is locally under-identified, an equality constraint can be placed on its two loadings, following the recommendation of Little and colleagues [38]. Model 2 depicts the two-factor model, with two latent variables (social dysfunction and anxiety/depression) and six measured variables loading on each. Model 3 also represents a three-dimensional conception, but with “cope”, “stress”, and “depression” as latent variables and four, three and five measured variables, respectively. Model 4 represents the one-dimensional conception of the GHQ, with all 12 items defined as indicators of a single factor. Finally, we tested the unidimensional model described by Hankins [22] as Model 5. As described above, this model treats the GHQ-12 as a measure of one construct but with correlated error terms on the negatively worded items, modelling response bias. Model 5 is therefore identical to Model 4, except that it contains correlations between the error terms of the negatively worded items (items 4, 5, 6, 7, 8, 9).
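The five models differ only in the number of factors, the number of items per factor, and the added error covariances, so their degrees of freedom can be counted directly. This minimal sketch assumes simple structure (each item loads on one factor), factor variances fixed to 1, freely correlated factors, and no mean structure; `cfa_df` is a hypothetical helper. With these defaults the counts agree with the df values listed in Table 2, and the `equality_constraints` argument shows how a constraint such as the one described for Model 1 would add one degree of freedom.

```python
from math import comb

N_ITEMS = 12
N_MOMENTS = N_ITEMS * (N_ITEMS + 1) // 2  # 78 unique variances and covariances


def cfa_df(factor_sizes, error_covariances=0, equality_constraints=0):
    """Degrees of freedom of a simple-structure CFA fitted to a covariance matrix.

    Assumptions: every item loads on exactly one factor, factor variances are
    fixed to 1, factors correlate freely, and no mean structure is modelled.
    """
    if sum(factor_sizes) != N_ITEMS:
        raise ValueError("factor sizes must add up to 12 items")
    free = (
        N_ITEMS - equality_constraints      # loadings (each equality constraint removes one)
        + N_ITEMS                           # residual variances
        + comb(len(factor_sizes), 2)        # factor correlations (0 for one factor)
        + error_covariances                 # correlated residuals
    )
    return N_MOMENTS - free


MODELS = {
    "Model 1": {"factor_sizes": (6, 4, 2)},
    "Model 2": {"factor_sizes": (6, 6)},
    "Model 3": {"factor_sizes": (4, 3, 5)},
    "Model 4": {"factor_sizes": (12,)},
    # Model 5: Model 4 plus all pairwise residual covariances among the six negative items
    "Model 5": {"factor_sizes": (12,), "error_covariances": comb(6, 2)},
}

for name, spec in MODELS.items():
    print(name, cfa_df(**spec))
```

The 15 extra parameters of Model 5 are exactly the df of the Model 4 versus Model 5 comparison reported in the Results.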


In our analyses, the maximum likelihood (ML) method of estimation was used; error covariances were constrained to zero in all models except Model 5 in order to avoid overfitting and capitalizing on chance associations in the data [39]. In addition to the standardized factor loadings, the construct reliability (the extent to which the indicators of a construct share common variance) and the average variance extracted (the extent to which the variance in the indicators is accounted for by the latent construct) were calculated. Values greater than .7 (construct reliability) and .5 (average variance extracted) indicate good reliability and convergent validity. Discriminant validity (the extent to which a construct in the model is distinct from another construct) was tested using the Fornell-Larcker ratio [40]; a Fornell-Larcker ratio smaller than 1 indicates good discriminant validity. The χ² value, the Bayesian Information Criterion (BIC), the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR) are reported as fit indices. When comparing models, the model with the lowest BIC value is usually preferred. Although, strictly speaking, a comparison of BIC values provides only a ranking, a difference of 10 points or more has been suggested as a nearly certain indicator of a difference in fit between models [41]. Values larger than 0.95 for TLI and CFI, values smaller than 0.06 for RMSEA, and values smaller than 0.08 for SRMR are considered indicators of a good fit [42].

Table 2
Standardized factor loadings and goodness-of-fit statistics for five alternative confirmatory factor analysis models of the General Health Questionnaire (GHQ).

Model 1: Social Dysfunction, Anxiety/Depression, Loss of Confidence. Model 2: Social Dysfunction, Anxiety/Depression. Model 3: Cope, Stress, Depression. Model 4: Global. Model 5: Global.

Standardized factor loadings
Item | Model 1 | Model 2 | Model 3 | Model 4 | Model 5
1. Able to concentrate | .66 | .65 | .65 | .70 | .66
2. Capable of making decisions | .58 | .58 | .63 | .70 | .58
3. Face up to problems | .61 | .61 | .60 | .64 | .61
4. Lost sleep over worry | .66 | .64 | .73 | .73 | .60ᵃ
5. Constantly under strain | .59 | .56 | .66 | .66 | .55ᵃ
6. Could not overcome difficulties | .75 | .74 | .74 | .81 | .66ᵃ
7. Unhappy and depressed | .80 | .81 | .81 | .85 | .65ᵃ
8. Loss of self-confidence | .88 | .79 | .79 | .87 | .56ᵃ
9. Thinking of self as worthless | .82 | .75 | .76 | .84 | .54ᵃ
10. Play useful part in things | .52 | .54 | .52 | .57 | .52
11. Enjoy day-to-day activities | .64 | .63 | .59 | .69 | .63
12. Reasonably happy | .75 | .74 | .62 | .78 | .75

In Models 1 and 2, items 1-3 and 10-12 load on Social Dysfunction. In Model 1, items 4-7 load on Anxiety/Depression and items 8-9 on Loss of Confidence; in Model 2, items 4-9 load on Anxiety/Depression.

Factor statistics
Model: factor | Construct reliability | Average variance extracted | Fornell-Larcker ratio
Model 1: Social Dysfunction | .79 | .40 | 1.9
Model 1: Anxiety/Depression | .80 | .50 | 1.5
Model 1: Loss of Confidence | .84 | .72 | 1.0
Model 2: Social Dysfunction | .80 | .40 | 1.6
Model 2: Anxiety/Depression | .86 | .52 | 1.2
Model 3: Cope | .69 | .36 | 2.0
Model 3: Stress | .70 | .44 | 1.7
Model 3: Depression | .86 | .56 | 1.2
Model 4: Global | .89 | .41 | -
Model 5: Global | .88 | .37 | -

Correlations between factors: Model 1: .86, .65, .85; Model 2: .80; Model 3: .86, .82, .83.

Fit statistics
Statistic | Model 1 | Model 2 | Model 3 | Model 4 | Model 5
χ²⁎ | 1146.9 | 1507.7 | 1658.0 | 1942.5 | 752.0
df | 51 | 53 | 51 | 54 | 39
BIC | 34963.4 | 35308.8 | 35474.5 | 35736.1 | 34659.8
CFI | 0.90 | 0.86 | 0.85 | 0.82 | 0.93
TLI | 0.87 | 0.83 | 0.81 | 0.78 | 0.89
RMSEA (90% CI) | 0.10 (0.10-0.11) | 0.12 (0.11-0.12) | 0.12 (0.11-0.12) | 0.13 (0.13-0.14) | 0.10 (0.09-0.10)
SRMR | 0.06 | 0.06 | 0.07 | 0.07 | 0.05

BIC = Bayesian Information Criterion; CFI = comparative fit index; TLI = Tucker-Lewis index (non-normed fit index); RMSEA = root mean square error of approximation; 90% CI = limits of the 90% confidence interval for RMSEA; SRMR = standardized root mean square residual.
ᵃ Error terms are allowed to covary.
⁎ All p's < .001.
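The BIC ranking and the RMSEA cut-off described in this section can be applied mechanically to the values reported in Table 2. This sketch recomputes RMSEA from χ², df and N using one common convention, sqrt(max(χ² − df, 0) / (df · (N − 1))); software packages differ slightly (for example N versus N − 1), and the printed inputs are rounded, so recomputed values may deviate from Table 2 in the second decimal.

```python
import math

N = 2041  # analysed sample size
# (chi-square, df, BIC) as reported in Table 2
FIT = {
    "Model 1": (1146.9, 51, 34963.4),
    "Model 2": (1507.7, 53, 35308.8),
    "Model 3": (1658.0, 51, 35474.5),
    "Model 4": (1942.5, 54, 35736.1),
    "Model 5": (752.0, 39, 34659.8),
}


def rmsea(chi2, df, n):
    """Point estimate of RMSEA using the (n - 1) convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


ranking = sorted(FIT, key=lambda name: FIT[name][2])  # lowest BIC first
best_bic = FIT[ranking[0]][2]
for name in ranking:
    chi2, df, bic = FIT[name]
    value = rmsea(chi2, df, N)
    verdict = "good" if value < 0.06 else "not good"
    print(f"{name}: dBIC = {bic - best_bic:7.1f}, RMSEA = {value:.3f} ({verdict} by the < .06 rule)")
```

The ranking reproduces the order stated in the Results (Model 5, then Model 1), every BIC gap exceeds the 10-point threshold, and no model meets the RMSEA criterion for good fit.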

To analyse the convergent and discriminant validity of the GHQ-12, Pearson's correlations were computed between the scales of the GHQ-12 according to the best model identified and the BDI, the PHQ, the SF-36, and the SPIN, which measure mental distress and self-perceived health status.

Means, standard deviations, corrected item-total correlations, and response probabilities were calculated as psychometric properties of the items, and Cronbach's α was calculated as a measure of internal consistency.
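Two of these item statistics can be sketched in pure Python: Cronbach's alpha, k/(k − 1) · (1 − Σ item variances / variance of the total score), and the part-whole corrected item-total correlation (each item correlated with the sum of the remaining items). Population variances are used throughout; the choice of n or n − 1 cancels in the ratio as long as it is applied consistently. The data are hypothetical, and response probabilities are omitted because their computation is not specified here.

```python
import math
from statistics import mean, pvariance


def cronbach_alpha(items):
    """items: one list of scores per item, all for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(items):
    """Correlation of each item with the sum of the *other* items (part-whole corrected)."""
    totals = [sum(scores) for scores in zip(*items)]
    return [pearson(col, [t - v for t, v in zip(totals, col)]) for col in items]


# Hypothetical 0-3 scores: 3 items x 6 respondents
data = [
    [0, 1, 1, 2, 3, 0],
    [0, 1, 2, 2, 3, 1],
    [1, 0, 1, 2, 2, 0],
]
print(round(cronbach_alpha(data), 3))
print([round(r, 3) for r in corrected_item_total(data)])
```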

To predict the GHQ-12 total score, a total of 12 measures (BDI, PHQ-2, PHQ-Anxiety, SPIN, and the eight subscales of the SF-36) were entered as predictors in a stepwise linear regression analysis.

Statistical analyses were conducted with the SPSS 15.0 statistical package and Mplus 6.1 [36].

3. Results

3.1. Confirmatory Factor Analysis (CFA)

As shown in Table 2, all factor loadings are greater than .50, with indicator reliabilities (squared factor loadings) ranging between .27 and .77. In all models, item 10 (“Play useful part in things”) has the smallest factor loading. In all models, the latent variables show good construct reliabilities (range .69 to .89). When comparing the BIC values of the models, the best model is Model 5, followed by Model 1, although all models show rather unsatisfactory fit indices (Tucker-Lewis index < .90 and root mean square error of approximation > .08). The unidimensional model incorporating response bias (Model 5) reaches the best overall fit for the data and a significantly better fit than Model 4, the unidimensional model with error covariances constrained to zero (Δχ² = 1190.5, df = 15, p < .001).
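The reported difference test can be checked with the standard library alone. This sketch assumes that the plain ML χ² values are differenced (no scaling correction) and uses the closed-form chi-square tail probabilities that exist for integer df; `chi2_sf` is our own helper, and where SciPy is available `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = total = math.exp(-x / 2)
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    # Odd df: 2 * (1 - Phi(sqrt x)) + 2 * phi(sqrt x) * sum of x^(k-1/2) / (1*3*...*(2k-1))
    z = math.sqrt(x)
    tail = math.erfc(z / math.sqrt(2))          # equals 2 * (1 - Phi(z))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    series, term = 0.0, z                        # first term: x^(1/2) / 1
    for k in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * k + 1)                  # next term: x^(k+1/2) / (1*3*...*(2k+1))
    return tail + 2 * density * series


# Model 4 is nested in Model 5 (Model 5 frees 15 residual covariances)
delta_chi2 = 1942.5 - 752.0
delta_df = 54 - 39
print(delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df) < 0.001)
```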

Table 3
Pearson correlations between the General Health Questionnaire (GHQ) and other scales.

Scale | Cronbach's alpha | Correlation with GHQ-12 global scale (Models 4 and 5)
BDI | .93 | .63
PHQ-Anxiety | .65 | .33
PHQ-2 (Depression) | .78 | .55
SF-36: Physical Functioning | .93 | -.26
SF-36: Role Limitations (physical) | .92 | -.33
SF-36: Bodily Pain | .89 | -.38
SF-36: General Health | .79 | -.43
SF-36: Vitality | .81 | -.52
SF-36: Social Functioning | .79 | -.54
SF-36: Role Limitations (emotional) | .92 | -.43
SF-36: Mental Health | .82 | -.62
SPIN | .91 | .28

All p's < .001.

3.2. Convergent and discriminant validity

The average variance extracted is satisfactory only for some of the factors. Because of the high intercorrelations between the factors, all factors but one (“Loss of Confidence” in Model 1) miss the criterion for good discriminant validity, namely a Fornell-Larcker ratio smaller than 1 (Table 2).
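For readers who want to retrace the factor statistics, the following sketch applies the formulas commonly attributed to Fornell and Larcker to the Model 1 loadings in Table 2: construct reliability = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), average variance extracted = mean λ², and the ratio = largest squared correlation with another factor divided by the AVE. Two assumptions are labelled in the code: the pairing of the three factor correlations, and standardized error variances 1 − λ². Because the printed loadings are rounded, recomputed values can differ slightly from Table 2 (for example in the second decimal of a construct reliability).

```python
def construct_reliability(loadings):
    """(sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]."""
    s = sum(loadings)
    error = sum(1 - lam ** 2 for lam in loadings)  # assumes standardized residual variances
    return s ** 2 / (s ** 2 + error)


def average_variance_extracted(loadings):
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def fornell_larcker_ratio(loadings, correlations):
    """Largest squared correlation with another factor divided by the AVE (< 1 is good)."""
    return max(r ** 2 for r in correlations) / average_variance_extracted(loadings)


# Model 1 loadings from Table 2; assumed pairing of the factor correlations:
# Social Dysfunction-Anxiety/Depression .86, Social Dysfunction-Loss of Confidence .65,
# Anxiety/Depression-Loss of Confidence .85
FACTORS = {
    "Social Dysfunction": ([.66, .58, .61, .52, .64, .75], [.86, .65]),
    "Anxiety/Depression": ([.66, .59, .75, .80], [.86, .85]),
    "Loss of Confidence": ([.88, .82], [.65, .85]),
}
for name, (lams, rs) in FACTORS.items():
    print(f"{name}: CR={construct_reliability(lams):.2f}, "
          f"AVE={average_variance_extracted(lams):.2f}, "
          f"FL={fornell_larcker_ratio(lams, rs):.1f}")
```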

Pearson's correlations between the scales of the GHQ and the scales of several other instruments measuring mental distress and self-perceived health status are presented in Table 3. Because of the large sample size, all coefficients are significant at p < .001.

As presented in Table 3, the highest correlations of the GHQ are found with the BDI and the SF-36 subscale “Mental Health”, followed by the PHQ-2 and the SF-36 subscales “Social Functioning” and “Vitality”. The other correlations are lower. From the perspective of convergent and discriminant validity, the assumption that a single global factor describes the GHQ-12 seems to be justified.

3.3. Psychometric properties of the GHQ

Item and scale characteristics of the unidimensional GHQ-12 conception were evaluated on the basis of the total sample (N = 2,041). As shown in Table 4, corrected item-total correlations are in the upper range, and response probabilities are in the medium range, between p = 0.37 and 0.52. Internal consistency as a measure of the reliability of the scale can be considered very good (Cronbach's α = .89, .79, and .86 for the full scale, the positively worded items, and the negatively worded items, respectively).

Table 4
Psychometric properties of the German adaptation of the General Health Questionnaire (GHQ).

Item | M | SD | rit | p
1. Lost sleep over worry | 0.95 | 0.50 | 0.44 | 0.49
2. Constantly under strain | 1.02 | 0.48 | 0.51 | 0.51
3. Able to concentrate | 0.77 | 0.74 | 0.63 | 0.44
4. Play useful part in things | 0.67 | 0.72 | 0.72 | 0.42
5. Face up to problems | 1.01 | 0.49 | 0.56 | 0.50
6. Capable of making decisions | 0.79 | 0.73 | 0.56 | 0.45
7. Could not overcome difficulties | 0.67 | 0.65 | 0.59 | 0.42
8. Reasonably happy | 0.46 | 0.63 | 0.64 | 0.37
9. Enjoy day-to-day activities | 0.55 | 0.64 | 0.67 | 0.39
10. Unhappy and depressed | 0.63 | 0.67 | 0.71 | 0.41
11. Loss of self-confidence | 1.05 | 0.53 | 0.64 | 0.51
12. Thinking of self as worthless | 1.10 | 0.48 | 0.55 | 0.53

M = mean; SD = standard deviation; rit = part-whole corrected item-total correlation (related to the total score); p = response probability.

3.4. Predicting GHQ-score

As shown in Table 5, the stepwise procedure results in a regression model with only six predictors that successfully predicts the total score of the GHQ-12, accounting for 47% of the variance. However, although all six variables are significant predictors, only the BDI and the SF-36 subscale “Mental Health” contribute a relevant amount of explained variance to the criterion. These two variables alone explain 45% of the variance; the additional variance accounted for by the remaining predictors seems negligible.

Table 5
Results of the stepwise multiple regression analysis for the prediction of the global score of the GHQ-12.

Predictor | b | β | T | p | ΔR²
BDI | 0.11 | 0.33 | 13.44 | < .001 | 0.39
SF-36: Mental Health | −0.72 | −0.25 | −9.65 | < .001 | 0.06
SF-36: Social Functioning | −0.30 | −0.12 | −4.92 | < .001 | 0.01
PHQ-2 | 0.41 | 0.10 | 4.09 | < .001 | < 0.01
SF-36: Role Limitations (physical) | 0.02 | 0.10 | 4.44 | < .001 | < 0.01
SF-36: Role Limitations (emotional) | −0.02 | −0.07 | −2.80 | .005 | < 0.01

Multiple R = 0.69, R² = 0.47, corrected R² = 0.47, F = 306.13, df = 6, p < .001.

4. Discussion

Although the GHQ-12 is widely used, its dimensionality is still under debate. Our study therefore addresses the dimensionality, psychometric properties and conceptual determination of the GHQ-12 in a large-scale population-based sample. In a first step, common factorial models of the GHQ-12 were tested to determine the dimensionality of the instrument. None of the previously proposed models considered (one-, two- and three-factor solutions) shows sufficient fit indices. Given the debate about methodological influences on dimensionality, we tested a unidimensional model including response bias on the negatively worded items, which was first proposed by Hankins [22]. This model showed sufficient fit indices in the study of Hankins [22] and the best fit of all models examined in our study. This result supports the unidimensionality originally intended in the development of the GHQ-12, yet it emphasizes the importance of response bias. Furthermore, all factors of the proposed two- and three-factor models are highly intercorrelated, so the differential information assessed with the different subscales seems to be of little relevance. Finally, we examined the correlations of the GHQ-12 total score with several established scales such as the BDI, the SF-36 and the PHQ. From a content-related point of view, these findings support the unidimensional conceptualization of the GHQ-12. Not least, its good internal consistency and its medium-range response probabilities are consistent with the unidimensionality of the GHQ-12.

Even though our study is based on a large-scale population-based sample, some shortcomings and future challenges have to be mentioned. As shown in the study of Campbell et al. [16], the factorial structure depends on the scoring method. Consequently, our results hold for Likert scoring, but their validity for the alternative scoring methods of the GHQ-12 needs further investigation, although another study by Hankins shows the stability of the unidimensional solution across different scoring methods [17]. Other authors have achieved better fit with models that treat the responses as ordinal [43,44]. Fortunately, Likert scoring is very common, so our results are applicable to a large number of studies. A generalizability theory approach may give additional insights into multiple sources of variability [45]. In addition, bifactor models [46,47] may be a promising alternative approach, although these models pose challenges with identification and convergence. To our knowledge, neither approach has been tested with regard to the GHQ-12 so far.

Moreover, Kulenovic et al. [48] noted that the factorial structure of the GHQ-12 depends on the sample under study; our results apply to the general population. Further investigation is needed to clarify the stability of the structure in other populations (e.g., clinical samples).

Besides the dimensional interpretation of the GHQ-12, a categorical interpretation using cut-off scores is often applied. In the literature, different cut-off scores are suggested for different populations [49,50] and for different screening applications [51]. To support this categorical approach, a reliable cut-off score is needed. Since our study does not include clinical interviews, this remains an open issue.

Beyond psychometric issues, a conceptual localization of the GHQ-12 seems useful. Regarding its associations with several established psychometric instruments, the total score of the GHQ-12 is highly correlated with depressive symptomatology and with impaired mental health and vitality (SF-36). On the other hand, it shows only low correlations with anxiety (PHQ anxiety module) and social phobia (SPIN). In a stepwise linear regression, only the BDI and the mental health scale of the SF-36 contributed a substantial amount of explained variance. In summary, the GHQ-12 seems to be a moderately useful screening instrument for the assessment of mental distress or minor psychiatric morbidity, with a main focus on depressive symptomatology. Given the prime importance of depressive symptomatology among mental disorders, both on its own and as a comorbid condition, this seems worthwhile. Nevertheless, several short screening instruments for the different aspects of mental distress have been developed and have been shown to have better psychometric properties than the GHQ-12. We would therefore recommend preferring psychometrically sound instruments such as the different modules of the Patient Health Questionnaire [28].

References

[1] Goldberg DP, Williams P. A user's guide to the General Health Questionnaire. Basingstoke: NFER-Nelson; 1988.

[2] Werneke U, Goldberg DP, Yalcin I, Ustun BT. The stability of the factor structure of the General Health Questionnaire. Psychol Med 2000;30(4):823-9.

412 M. Romppel et al. / Comprehensive Psychiatry 54 (2013) 406–413

[3] Vanheule S, Bogaerts S. Short communication: the factorial structure of the GHQ-12. Stress Health 2005;21(4):217-22.

[4] Toyabe SI, Shioiri T, Kobayashi K, Kuwabara H, Koizumi M, Endo T, et al. Factor structure of the General Health Questionnaire (GHQ-12) in subjects who had suffered from the 2004 Niigata-Chuetsu earthquake in Japan: a community-based study. BMC Public Health 2007;7:175.

[5] Schmitz N, Kruse J, Tress W. Improving screening for mental disorders in the primary care setting by combining the GHQ-12 and SCL-90-R subscales. Compr Psychiatry 2001;42(2):166-73.

[6] Politi PL, Piccinelli M, Wilkinson G. Reliability, validity and factor structure of the 12-Item General Health Questionnaire Among Young Males in Italy. Acta Psychiatr Scand 1994;90(6):432-7.

[7] Picardi A, Abeni D, Pasquini P. Assessing psychological distress in patients with skin diseases: reliability, validity and factor structure of the GHQ-12. J Eur Acad Dermatol Venereol 2001;15(5):410-7.

[8] Kalliath TJ, O'Driscoll MP, Brough P. A confirmatory factor analysis of the General Health Questionnaire-12. Stress Health 2004;20(1):11-20.

[9] Gureje O. Reliability and the factor structure of the Yoruba Version of the 12-Item General Health Questionnaire. Acta Psychiatr Scand 1991;84(2):125-9.

[10] Campbell A, Walker J, Farrell G. Confirmatory factor analysis of the GHQ-12: can I see that again? Aust N Z J Psychiatry 2003;37(4):475-83.

[11] French DJ, Tait RJ. Measurement invariance in the General Health Questionnaire-12 in young Australian adolescents. Eur Child Adolesc Psychiatry 2004;13(1):1-7.

[12] Shevlin M, Adamson G. Alternative factor models and factorial invariance of the GHQ-12: a large sample analysis using confirmatory factor analysis. Psychol Assess 2005;17(2):231-6.

[13] Cheung YB. A confirmatory factor analysis of the 12-item General Health Questionnaire among older people. Int J Geriatr Psychiatry 2002;17(8):739-44.

[14] Graetz B. Multidimensional properties of the General Health Questionnaire. Soc Psychiatry Psychiatr Epidemiol 1991;26(3):132-8.

[15] Makikangas A, Feldt T, Kinnunen U, Tolvanen A, Kinnunen ML, Pulkkinen L. The factor structure and factorial invariance of the 12-item General Health Questionnaire (GHQ-12) across time: evidence from two community-based samples. Psychol Assess 2006;18(4):444-51.

[16] Campbell A, Knowles S. A confirmatory factor analysis of the GHQ12 using a large Australian sample. Eur J Psychol Assess 2007;23(1):2-8.

[17] Hankins M. The reliability of the twelve-item general health questionnaire (GHQ-12) under realistic assumptions. BMC Public Health 2008;8:355.

[18] Ye SQ. Factor structure of the General Health Questionnaire (GHQ-12): the role of wording effects. Pers Individ Diff 2009;46(2):197-201.

[19] Marsh HW. Positive and negative global self-esteem: a substantively meaningful distinction or artifactors? J Pers Soc Psychol 1996;70(4):810-9.

[20] Mook J, Kleijn WC, Vanderploeg HM. Positively and negatively worded items in a self-report measure of dispositional optimism. Psychol Rep 1992;71(1):275-8.

[21] Roth M, Decker O, Herzberg PY, Brahler E. Dimensionality and norms of the Rosenberg Self-esteem scale in a German general population sample. Eur J Psychol Assess 2008;24(3):190-7.

[22] Hankins M. The factor structure of the twelve item General Health Questionnaire (GHQ-12): results of negative phrasing? Clin Pract Epidemiol Ment Health 2008;4:10.

[23] Wang L, Lin WP. Wording effects and the dimensionality of the General Health Questionnaire (GHQ-12). Pers Individ Diff 2011;50(7):1056-61.

[24] Schmitz N, Kruse J, Tress W. Psychometric properties of the General Health Questionnaire (GHQ-12) in a German primary care sample. Acta Psychiatr Scand 1999;100(6):462-8.

[25] Hautzinger M, Bailer M, Worall H, Keller F. Beck-Depressions-Inventar. 2nd ed. Goettingen: Hogrefe; 1995.

[26] Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry 1961;4:561-71.

[27] Richter P, Werner J, Bastine R. Psychometrische Eigenschaften des Beck-Depression Inventars (BDI). Ein Überblick. Zeitschrift für Klinische Psychologie 1994;23:3-19.

[28] Spitzer RL, Kroenke K, Williams JBW. Validation and utility of a self-report version of PRIME-MD – the PHQ primary care study. JAMA 1999;282(18):1737-44.

[29] Löwe B, Spitzer RL, Zipfel S, Herzog W. Gesundheitsfragebogen für Patienten (PHQ-D). Manual und Testunterlagen. Karlsruhe: Pfizer; 2002.

[30] Lowe B, Spitzer RL, Grafe K, Kroenke K, Quenter A, Zipfel S, et al. Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians' diagnoses. J Affect Disord 2004;78(2):131-40.

[31] Kroenke K, Spitzer RL, Williams JBW. The Patient Health Questionnaire-2 – validity of a two-item depression screener. Med Care 2003;41(11):1284-92.

[32] Lowe B, Kroenke K, Grafe K. Detecting and monitoring depression with a two-item questionnaire (PHQ-2). J Psychosom Res 2005;58(2):163-71.

[33] Sosic Z, Gieler U, Stangier U. Screening for social phobia in medical in- and outpatients with the German version of the Social Phobia Inventory (SPIN). J Anxiety Disord 2008;22(5):849-59.

[34] Connor KM, Davidson JRT, Churchill LE, Sherwood A, Foa E, Weisler RH. Psychometric properties of the Social Phobia Inventory (SPIN) – new self-rating scale. Br J Psychiatry 2000;176:379-86.

[35] Ware JE, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36). I. Conceptual framework and item selection. Med Care 1992;30(6):473-83.

[36] Muthén LK, Muthén BO. Mplus User's Guide. 6th ed. Los Angeles: Muthén & Muthén; 2010.

[37] Brown TA. Confirmatory factor analysis for applied research. New York: Guilford Press; 2006.

[38] Little TD, Lindenberger U, Nesselroade JR. On selecting indicators for multivariate measurement and modeling with latent variables: when “good” indicators are bad and “bad” indicators are good. Psychol Methods 1999;4(2):192-211.

[39] MacCallum RC, Roznowski M, Necowitz LB. Model modifications in covariance structure analysis: the problem of capitalization on chance. Psychol Bull 1992;111(3):490-504.

[40] Fornell C, Larcker DF. Evaluating structural equation models with unobservable variables and measurement error. J Mark Res 1981;18(1):39-50.

[41] Raftery AE. Bayesian model selection in social research. Sociological Methodol 1995;25:111-63.

[42] Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equation Model 1999;6(1):1-55.

[43] Wang WC, Cunningham EG. Comparison of alternative estimation methods in confirmatory factor analyses of the general health questionnaire. Psychol Rep 2005;97(1):3-10.

[44] Aguado J, Campbell A, Ascaso C, Navarro P, Garcia-Esteve L, Luciano JV. Examining the factor structure and discriminant validity of the 12-item General Health Questionnaire among Spanish postpartum women. Assessment 2012;19(4):517-25.

[45] Llabre MM, Fitzpatrick SL. Revisiting measurement models in psychosomatic medicine research: a latent variable approach. Psychosom Med 2012;74(2):169-77.

[46] Chen FF, Hayes A, Carver CS, Laurenceau JP, Zhang ZG. Modeling general and specific variance in multifaceted constructs: a comparison of the bifactor model to other approaches. J Pers 2012;80(1):219-51.

[47] Chen FF, West SG, Sousa KH. A comparison of bifactor and second-order models of quality of life. Multivariate Behav Res 2006;41(2):189-225.

[48] Kulenovic M, Kusevic Z, Grba S. Factor analysis of the General Health Questionnaire (GHQ 12) on the sample of the unemployed and students. Coll Antropol 1995;19(2):407-11.

[49] Goldberg DP, Oldehinkel T, Ormel J. Why GHQ threshold varies from one place to another. Psychol Med 1998;28(4):915-21.

[50] Donath S. The validity of the 12-item General Health Questionnaire in Australia: a comparison between three scoring methods. Aust N Z J Psychiatry 2001;35(2):231-5.

[51] Clarke DM, Smith GC, Herrman HE. A comparative study of screening instruments for mental disorders in general hospital patients. Int J Psychiatry Med 1993;23(4):323-37.