
General Hospital Psychiatry 29 (2007) 417–424

Validity of the Hospital Anxiety and Depression Scale and Patient Health Questionnaire-9 to screen for depression in patients with coronary artery disease

Lesley Stafford, B.A. Hons, M.A. (Psych), MAPS a,⁎, Michael Berk, M.B.B.Ch., M.Med. (Psych), S.A., Ph.D., FRANZCP b,c,d, Henry J. Jackson, B.A., M.A., M.A. (Clin Psych), Ph.D., FAPS c,e

a Department of Psychology, School of Behavioural Science, University of Melbourne, Victoria 3010, Australia

b Barwon Health and The Geelong Clinic, University of Melbourne, Victoria 3010, Australia

c ORYGEN Youth Health, Parkville, Victoria 3052, Australia

d Mental Health Research Institute, Parkville, Victoria 3052, Australia

e School of Behavioural Science, University of Melbourne, Victoria 3010, Australia

Received 11 April 2007; accepted 19 June 2007

Abstract

Objective: Depression is common but frequently undetected in patients with coronary artery disease (CAD). Self-report screening instruments for assessing depression such as the Hospital Anxiety and Depression Scale (HADS) and the Patient Health Questionnaire-9 (PHQ-9) are available but their validity is typically determined in depressed patients without comorbid somatic illness. We investigated the validity of these instruments relative to a referent diagnostic standard in recently hospitalized patients with CAD.

Method: Three months post-discharge for a cardiac admission, 193 CAD patients completed the HADS and PHQ-9. The Mini International Neuropsychiatric Interview (MINI) was the criterion standard. Scale reliability was calculated using Cronbach's α. Convergent validity was computed using Pearson's intercorrelations. Sensitivity and specificity for various cut-off scores for both measures and for the PHQ-9 categorical algorithm were calculated using receiver operating characteristics (ROC). For analyses, participants were assigned to two groups, ‘major depressive disorder’ or ‘any depressive disorder’.

Results: For all calculations, α was 0.05 and tests were two-tailed. Internal consistencies for the two measures were excellent. Criterion validity for the PHQ-9 and HADS was good. We found no statistical differences between the PHQ-9 and HADS for detecting either group; however, the categorical algorithm of the PHQ-9 for diagnosing major depression had a superior LR+ when compared with the summed HADS or PHQ-9. The operating characteristics of the screening instruments for ‘any depressive disorder’ were slightly lower than for ‘major depressive disorder’. Some optimum cut-off scores were lower than the generally recommended cut-off scores, particularly when screening for major depression (e.g., ≥5/6 vs. ≥10 and ≥8 for PHQ-9 and HADS, respectively). Lowering the cut-off scores substantially improved the sensitivity of these instruments while retaining specificity, thereby improving their usefulness to screen for CAD patients with depression.

Conclusions: Both instruments have acceptable properties for detecting depression in recently hospitalized cardiac patients, and neither scale is statistically superior when summed scores are used. The categorical algorithm of the PHQ-9 for diagnosing major depression has a superior LR+ compared to the summed PHQ-9 and HADS scores. The generally recommended cut-off scores should be used with caution. In light of the adverse outcomes associated with depression in CAD, screening for depression is a clinical priority.

© 2007 Elsevier Inc. All rights reserved.

Keywords: Coronary artery disease; Depression; Diagnosis; Self-report; Validity

⁎ Corresponding author. E-mail address: [email protected] (L. Stafford).

0163-8343/$ – see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.genhosppsych.2007.06.005

1. Introduction

Depression is disproportionately common in patients with coronary artery disease (CAD): 17% to 27% evidence major depression [1–10] and 20% to 45% report subthreshold depressive symptoms [4,9–14]. Depression in patients with CAD is associated with poor health-related quality of life [15–24] and elevated risk of morbidity and mortality [2–4,9,11,25,26]. Both major and minor depressive disorders respond well to treatment with anti-depressants and/or psychotherapy [27], emphasising the imperative to diagnose and treat, yet depression is frequently undetected and untreated in clinical practice [28]. Several well-established self-report instruments for screening depression are available; however, the validity and reliability of these instruments are typically determined in depressed patients without a comorbid somatic illness such as CAD. Symptoms such as insomnia and loss of energy may be a result of a recent cardiac event rather than a consequence of depression.

Previous research has indicated that self-report questionnaires such as the Hospital Anxiety and Depression Scale (HADS) [29] and Patient Health Questionnaire (PHQ-9) [30] can identify cardiac patients with depression [31–33]. McManus et al. [32] used the Diagnostic Interview Schedule [34] as a criterion standard and found that for detecting major depression in CAD outpatients, there were no statistical differences between the operating characteristics of the Centre for Epidemiological Studies Depression Scale [35], PHQ-9, a two-item version of the PHQ [36], and a simple two-item depression instrument. Strik et al. [31] reported that relative to the Structured Clinical Interview for DSM-IV (SCID) [37], the Beck Depression Inventory [38], Symptom Checklist-90 [39], HADS and Hamilton Depression Rating Scale [40] had acceptable properties for detecting major and minor depression in 206 patients 1 month after acute myocardial infarction (AMI), but these authors did not statistically compare the operating characteristics of the respective scales. To our knowledge, no previous study has comparatively evaluated both the PHQ-9 and the HADS against a diagnostic referent standard with recently hospitalized cardiac patients.

Given that the PHQ-9 appears to be the instrument of choice for screening depression in North America while the HADS is more commonly used in Europe and Britain, a comparison of the relative validity of these measures is useful. Furthermore, the PHQ-9 and HADS differ in important ways such as the exclusion of somatic symptoms in the latter. Some evidence suggests that self-report measures that include somatic items result in a twofold increase in depression prevalence rates when used to assess depression in medically ill samples [41]. Given that diagnostic interviews for depression include somatic items, comparison of a diagnostic referent standard with the HADS is important. To further investigate this issue, the specificity and sensitivity of the HADS and PHQ-9 in detecting depression in patients with CAD were compared using the Mini International Neuropsychiatric Interview version 5 (MINI) [42,43] as the standard diagnostic tool. The MINI is a validated tool used to diagnose minor and major depression according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) [44] and is similar to the SCID [37] in operation and principle.

The aims of this study were to (1) investigate internal consistency and intercorrelations of the HADS and PHQ-9; (2) analyse the operating characteristics of the HADS and PHQ-9 according to an independent criterion standard for depressive disorders; (3) determine whether either screening instrument is superior for detecting DSM-IV depressive disorders; and (4) determine optimum cut-off scores for discriminating between patients with and without depressive disorders.

2. Method

2.1. Participants

Participants were recruited between May 2005 and March 2006 from the Geelong Hospital, a major hospital in regional Victoria, Australia. All English-speaking, consenting patients who resided permanently in Australia and were hospitalized for percutaneous transluminal coronary angioplasty (PTCA), AMI or coronary artery bypass graft surgery (CABG) during this time were eligible for participation. There were no other exclusion criteria. According to discharge diagnoses, 528 patients were treated for PTCA (n=132), AMI (n=69), CABG (n=248), AMI and CABG (n=26), PTCA and CABG (n=5) and AMI and PTCA (n=48). Participants were recruited by postal invitation and follow-up phone-call by the first author 6 weeks post-discharge. At this time, 13 patients were deceased. Sixteen patients were excluded because they did not speak English, 9 could not consent due to cognitive deficit, 3 declined due to depression, 9 declined due to physical illness and 249 refused to participate without a specific reason. Two hundred and twenty-nine patients agreed to participate in the study. The study received ethics approval from the relevant institutional review committees and all participants gave written informed consent.

2.2. Measures

The HADS consists of two 7-item self-report subscales designed to assess current depressive and anxiety symptomatology, respectively, in non-psychiatric hospital settings. While the possibility of combining these subscales into a single measure has been suggested [45], this study made use only of the seven-item depression subscale. The HADS excludes somatic symptoms thereby avoiding potential confounding by the somatic symptoms of CAD. Possible scores on the HADS range from 0 to 21. The HADS is widely used in cardiac populations [46–49] and has well-established psychometric properties [45,50]. Originally, cut-off scores of ≥11 for probable ‘caseness’ of disorder and ≥8 for possible disorder were suggested [29]. In more recent studies, an optimal balance between sensitivity and specificity was found when ‘caseness’ was defined by a cut-off score of ≥8 [50]. Consequently, a score of ≥8 is considered to indicate depression in cardiac patients [31,46,48].

The PHQ-9 is a nine-item self-report measure developed to diagnose the presence and severity of depression in primary care. It is based directly on DSM-IV diagnostic criteria for major depression. It has the potential of being a dual-purpose instrument that, with the same nine items, can establish depressive disorder diagnoses using a categorical algorithm and grade the depressive symptom severity [51]. As a severity measure, the score can range from 0 to 27. Scores of 5, 10, 15 and 20 represent thresholds demarcating the lower limits of mild, moderate, moderately severe and severe depression, respectively. If a single screening cut-off point is to be chosen, a score of ≥10 is recommended [51]. The PHQ-9 has been shown to be valid and reliable [30,51,52] and has been widely used in studies with cardiac patients [32,53,54].
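The severity grading described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (5, 10, 15 and 20) and the suggested single cut-off of ≥10; the function names and the label "minimal" for scores below 5 are assumptions made for this sketch, and the item range of 0 to 3 is inferred from nine items summing to a maximum of 27.

```python
from bisect import bisect_right

# Lower limits of the severity bands, as stated in the text.
THRESHOLDS = [5, 10, 15, 20]
# "minimal" is a label assumed here for scores below the first threshold.
LABELS = ["minimal", "mild", "moderate", "moderately severe", "severe"]


def phq9_total(item_scores):
    """Sum nine item scores, each rated 0-3, giving a total of 0-27."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected nine item scores, each between 0 and 3")
    return sum(item_scores)


def phq9_severity(total):
    """Map a summed PHQ-9 score to its severity band."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must lie between 0 and 27")
    return LABELS[bisect_right(THRESHOLDS, total)]


def screens_positive(total, cutoff=10):
    """Apply a single screening cut-off (>= cutoff counts as positive)."""
    return total >= cutoff


if __name__ == "__main__":
    total = phq9_total([1, 1, 0, 2, 0, 1, 0, 1, 0])
    print(total, phq9_severity(total), screens_positive(total))
```

Passing a different `cutoff` makes it easy to see how a given patient's screening result changes when a threshold other than ≥10 is applied.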

The HADS and PHQ-9 were validated against DSM-IV criteria assessed by the MINI, a short, public domain diagnostic structured interview that is compatible with ICD-10 and DSM-IV criteria and captures important subsyndromal variants. The MINI modules for dysthymia and major depressive disorder were used. The MINI has excellent psychometric properties and has been validated against the Structured Clinical Interview for DSM-III-R Diagnoses–Patient Version [55] and the Composite International Diagnostic Interview [42,43,56,57].

2.3. Procedures

The HADS and PHQ-9 measures were mailed to participants 3 months post-discharge as part of a larger, longitudinal study of psychosocial factors in cardiac prognosis. Participants were required to return these questionnaires to the researchers using a reply-paid envelope. All participants were interviewed telephonically by the first author 3 months post-discharge, using the dysthymia and major depressive disorder modules of the MINI. Telephonic administration of structured clinical interviews has been found to be valid and reliable [58,59]. The self-report measures and telephonic clinical interview were completed within 2 to 3 days of each other by arrangement with the participants. The first author was blinded to the questionnaire results.

Depression was assessed 3 months after index hospitalization rather than during admission to avoid potential confounding of the effects of acute illness and stress associated with hospitalization with the assessment of depression. Assessment of depression 3 months after the index event also meant that the nature of the depression assessed, by DSM-IV definition, was not a form of Adjustment Disorder.

Outcome on the depression modules of the MINI interview was considered the ‘gold standard’ by which the validity of the self-report measures was judged. Major depressive disorder was diagnosed if participants fulfilled DSM-IV criteria of at least one core criterion (depressed mood or anhedonia) and at least four additional criteria with a 2-week duration. With the use of the same module, minor depression was diagnosed if participants fulfilled at least one core criterion, and one to three additional criteria with a 2-week duration. Dysthymia was diagnosed if participants fulfilled the core DSM-IV criteria and at least two additional criteria in the past 2 years. For analyses, participants were assigned to a diagnostic group, ‘major depressive disorder’, or to a broader group, ‘any depressive disorder’. The latter comprised, in addition to participants with major depressive disorder, participants with minor depression and dysthymia.
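The grouping rules above can be written out as a small decision function. The following Python sketch is illustrative only: it takes already-counted criteria as input, does not reproduce the MINI itself, and leaves to the caller the question of how a second core symptom is counted (an assumption of this sketch, since the text does not say). All function and label names are hypothetical.

```python
def classify_episode(core_met, additional_met):
    """Classify a 2-week episode from counts of DSM-IV criteria.

    core_met: number of core criteria met (depressed mood, anhedonia).
    additional_met: number of additional criteria met.
    """
    if core_met >= 1 and additional_met >= 4:
        return "major depressive disorder"
    if core_met >= 1 and 1 <= additional_met <= 3:
        return "minor depression"
    return "none"


def analysis_groups(episode, dysthymia):
    """Return membership of the two analysis groups used in the study.

    'any depressive disorder' is the broader group: major depressive
    disorder, minor depression or dysthymia.
    """
    major = episode == "major depressive disorder"
    any_disorder = major or episode == "minor depression" or dysthymia
    return {"major depressive disorder": major,
            "any depressive disorder": any_disorder}


if __name__ == "__main__":
    episode = classify_episode(core_met=1, additional_met=2)
    print(episode, analysis_groups(episode, dysthymia=False))
```

Note that every participant in the ‘major depressive disorder’ group is also a member of the ‘any depressive disorder’ group, which is why the two sets of analyses are run separately rather than as mutually exclusive categories.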

2.4. Analysis

All data were analysed using SPSS for Windows release 13.0 [60]. For all calculations, α was 0.05 and tests were two-tailed. Reliabilities of each of the scales were measured as internal consistencies using Cronbach's α. Pearson's intercorrelations were computed as measures of convergent validity of the scales. Criterion validity was investigated by computing sensitivity and specificity for various cut-off scores on the PHQ-9 and HADS and for the categorical algorithms of the PHQ-9. MINI diagnoses of ‘major depressive disorder’ or ‘any depressive disorder’ constituted the criterion standard. Analyses were performed separately for these two groups. Cut-off scores for the PHQ-9 and the HADS were obtained using receiver operating characteristics (ROC) curves. ROC analyses show instrument sensitivity and specificity combined into a single measure for all possible cut-off scores. The ROC curves were interpreted in two ways according to the principles specified by Lowe et al. [61]. First, to obtain an optimal trade-off between sensitivity and specificity in one step, cut-off scores with a maximal Youden Index (sensitivity + specificity − 1) were used. Second, following principles of a two-stage screening of depressive disorders, cut-off scores demonstrating maximal sensitivity and specificity of ≥75% were examined. The two-stage approach is more appropriate in clinical settings where positive screening results are usually followed up with a lengthy diagnostic interview and treatment, if appropriate. The one-stage approach is more appropriate in research settings if screening results are used to estimate depression prevalence rates and do not directly inform clinical decisions. In research settings where screening is used to assess eligibility for participation in a trial, a two-stage approach is more relevant. Ultimately, the use of a one- or two-stage approach and the choice of cut-off scores will depend on the user's objectives.
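For readers unfamiliar with Cronbach's α, the coefficient can be computed directly from item-level responses. The sketch below uses the standard formula α = k/(k−1) × (1 − Σ item variances / variance of the total score); it is not the SPSS procedure used in the study, and the small data set is invented purely for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item scores.

    responses: list of rows, one per respondent, each a list of k item scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical responses of four people to a two-item scale.
    toy = [[0, 1], [1, 0], [2, 3], [3, 2]]
    print(round(cronbach_alpha(toy), 2))
```

The function is undefined when every respondent has the same total score (zero total variance), which is one reason α is only meaningful in samples with some spread of scores.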

Area under the curve (AUC), positive predictive values (PPV) and negative predictive values (NPV) [62] were measured for the suggested optimal and two-stage screening cut-off scores and also for the generally recommended cut-off scores for the HADS and PHQ-9. AUCs were compared statistically using the nonparametric method described by Hanley and McNeil [63].
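As background to the AUC standard errors reported later, the sketch below implements the widely used standard-error approximation from Hanley and McNeil's 1982 paper on the area under the ROC curve. Whether reference [63] is that paper, and whether SPSS applied exactly this formula, cannot be confirmed from the text, so both are assumptions. The sketch covers only the standard error of a single AUC, not the statistical comparison of two correlated AUCs.

```python
from math import sqrt


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley & McNeil, 1982).

    auc:   area under the ROC curve
    n_pos: number of participants with the disorder
    n_neg: number of participants without the disorder
    """
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return sqrt(var)


if __name__ == "__main__":
    # 'Any depressive disorder': AUC 0.85 with 54 cases and 139 non-cases.
    print(round(hanley_mcneil_se(0.85, 54, 139), 3))
```

With the sample sizes for ‘any depressive disorder’ this approximation gives a value of about 0.035, the same order of magnitude as the standard error of 0.03 reported in the Results.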

Fig. 1. ‘Any depressive disorder’: ROC curves for PHQ-9 and HADS 3 months after cardiac hospitalization (N=193).


3. Results

3.1. Sample characteristics

One hundred and ninety-three of the recruited patients (84.3%) completed both the structured clinical interview and the self-report questionnaires. Twenty-eight participants did not return their questionnaires for an unknown reason, 3 withdrew due to physical illness and 4 withdrew due to depression. The sample was predominantly male (n=156; 80.8%), married (n=146; 75.6%) and retired (n=117; 60.6%), with a mean of 11.12 years of formal education (S.D.=2.91; range 5–22). Mean age was 64.14 years (S.D.=10.37; range 38–91).

Thirty-five participants met diagnostic criteria for major depression (male=24; female=11), 13 for minor depression (male=10; female=3) and 6 for dysthymia (male=6; female=0), corresponding to a 3-month post-discharge depression rate of 28%. Nine (4.7%) of the 193 participants met criteria for both major depressive disorder and dysthymia, so-called “double depression”.

A prior history of depression was significantly associated with having a current depressive disorder [χ² (1, N=193)=13.12, P<.05, φ=.27]. Participants discharged following hospitalization for both AMI and CABG were significantly less likely to be diagnosed with a depressive disorder than participants with other discharge diagnoses [χ² (4, N=193)=13.88, P<.05, φ=.27]. There was a significant association between the PHQ-9 categorical algorithm for detecting any depression and the criterion measure of ‘any depressive disorder’ [χ² (1, N=193)=26.12, P<.05, φ=.38]. The algorithm identified 20 of the 54 participants identified by the criterion tool as having ‘any depressive disorder’.

Mean scores on the PHQ-9 and HADS were 4.84 (S.D.=5.69, range 0–27) and 3.82 (S.D.=3.47, range 0–15), respectively. Characteristics of the sample are displayed in Table 1.

Table 1
Sample characteristics of patients with coronary artery disease 3 months post-discharge for cardiac hospitalization (N=193)

| Characteristic | No depressive disorder (n=139) | Any depressive disorder (n=54) | P value |
|---|---|---|---|
| Male | 116 (83.5%) | 40 (74.1%) | .200 |
| Age, years (mean±S.D.) | 65.01±9.63 | 61.91±11.89 | .091 |
| Married | 106 (76.3%) | 40 (74.1%) | .896 |
| Employed | 53 (38.1%) | 23 (42.6%) | .685 |
| Years of education (mean±S.D.) | 11.13±3.03 | 11.10±2.62 | .963 |
| Discharge diagnosis | | | .008 |
| AMI | 8 (5.8%) | 5 (9.3%) | |
| CABG | 71 (51.1%) | 20 (37%) | |
| PTCA | 19 (13.7%) | 14 (25.9%) | |
| AMI and CABG ⁎ | 20 (14.4%) | 1 (1.9%) | |
| AMI and PTCA | 21 (15.1%) | 14 (25.9%) | |
| History of depression ⁎ | 37 (26.6%) | 30 (55.6%) | <.001 |
| PHQ categorical algorithm ⁎ | 9 (6.5%) | 20 (37%) | <.001 |
| PHQ-9 score ⁎ (mean±S.D.) | 2.78±3.42 | 10.15±6.85 | <.001 |
| HADS score ⁎ (mean±S.D.) | 2.58±2.49 | 7.02±3.60 | <.001 |

P values were calculated using χ² statistics for categorical data and independent sample t tests for continuous data.
⁎ Indicates statistical significance.

3.2. Internal consistencies and intercorrelations

The internal consistencies for the self-report questionnaires were excellent with Cronbach's α coefficients of 0.90 and 0.81 for the PHQ-9 and HADS, respectively. The intercorrelation between the HADS and PHQ-9 was 0.72.

3.3. Comparative validity for detecting ‘any depressive disorder’ 3 months post-discharge

The operating characteristics, NPVs, PPVs and AUCs, for the diagnosis of ‘any depressive disorder’ are shown in Fig. 1 and Table 2. For the PHQ-9, the optimum cut-off score of ≥5 (sensitivity=81.5%, specificity=80.6%; PPV=62%, NPV=91.8%) was equal to the cut-off score suggested for a two-stage screening of ‘any depressive disorder’. The PHQ-9 categorical algorithm had a sensitivity of 37% and a specificity of 93.5% (PPV=69%, NPV=79.3%). For the HADS, the cut-off scores for optimum and two-stage screening were equal.

Table 2
‘Any depressive disorder’: operating characteristics of the PHQ-9 and HADS 3 months after cardiac hospitalization (N=193)

| | Sensitivity | Specificity | PPV | NPV | AUC (SE) |
|---|---|---|---|---|---|
| PHQ-9 | | | | | 0.85 (0.03) |
| Cut-off score ≥5 a,b,c | 81.5 | 80.6 | 62.0 | 91.8 | |
| Categorical algorithm | 37.0 | 93.5 | 69.0 | 79.3 | – |
| HADS | | | | | 0.85 (0.03) |
| Cut-off score ≥5 a,b | 77.8 | 80.6 | 60.9 | 90.3 | |
| Cut-off score ≥8 c | 38.9 | 94.2 | 72.4 | 79.9 | |

a Optimal cut-off scores according to maximal Youden Index (sensitivity + specificity − 1).
b Recommended cut-off scores for a two-stage screening (maximal sensitivity and ≥75% specificity).
c Generally recommended cut-off score.

Fig. 2. ‘Major depressive disorder’: ROC curves for PHQ-9 and HADS 3 months after cardiac hospitalization (N=193).


The AUC (standard error, SE) of both the PHQ-9 and the HADS was 0.85 (0.03). Predictably, statistical comparison of these AUCs showed that differences between the PHQ-9 and HADS-D (P=.96) were not statistically significant.

3.4. Comparative validity for detecting ‘major depressivedisorder’ 3 months post-discharge

The operating characteristics, NPVs, PPVs and AUCs, for the diagnosis of ‘major depressive disorder’ are shown in Fig. 2 and Table 3. Optimum cut-off scores for the PHQ-9 and HADS were higher than for the ‘any depressive disorder’ group with scores of ≥6 (sensitivity=82.9%; specificity=78.7%; PPV=47.5%; NPV=95.4%) and ≥6 (sensitivity=80%; specificity=81.6%; PPV=49.1%; NPV=94.8%), respectively. The PHQ-9 categorical algorithm for major depression had a sensitivity and specificity of 34.3% and 96.8%, respectively (PPV=70.6%, NPV=86.9%).

The AUCs (SE) of the PHQ-9 and HADS for detecting major depressive disorder were 0.88 (0.03) and 0.87 (0.03), respectively. Statistical comparison of the AUCs showed that differences between the PHQ-9 and the HADS (P=.88) were not statistically significant.

Table 3
‘Major depressive disorder’: operating characteristics of the PHQ-9 and HADS 3 months after cardiac hospitalization (N=193)

| | Sensitivity | Specificity | PPV | NPV | AUC (SE) |
|---|---|---|---|---|---|
| PHQ-9 | | | | | 0.88 (0.03) |
| Cut-off score ≥6 a | 82.9 | 78.7 | 47.5 | 95.4 | |
| Cut-off score ≥5 b | 91.4 | 75.3 | 45.0 | 97.5 | |
| Cut-off score ≥10 c | 54.3 | 91.1 | 57.6 | 90.0 | |
| Categorical algorithm | 34.3 | 96.8 | 70.6 | 86.9 | – |
| HADS | | | | | 0.87 (0.03) |
| Cut-off score ≥6 a | 80.0 | 81.6 | 49.1 | 94.8 | |
| Cut-off score ≥5 b | 85.7 | 75.3 | 43.5 | 96.0 | |
| Cut-off score ≥8 c | 45.7 | 91.8 | 55.2 | 88.4 | |

a Optimal cut-off scores according to maximal Youden Index (sensitivity + specificity − 1).
b Recommended cut-off scores for a two-stage screening (maximal sensitivity and ≥75% specificity).
c Generally recommended cut-off score.

4. Discussion

The main aim of this study was to determine sensitivity and specificity of two self-report depression screening instruments relative to a referent diagnostic standard in recently hospitalized patients with CAD. The results demonstrated excellent internal consistencies for both instruments. The substantial intercorrelations between the PHQ-9 and HADS showed the extent to which the scales measure the same construct. Criterion validity for the PHQ-9 and HADS was good, and both instruments can be recommended to identify ‘any depressive disorder’ and ‘major depressive disorder’ in recently hospitalized patients with CAD. Diagnostic superiority of the PHQ-9 over the HADS for major depressive disorder was reported in a study of 501 outpatients [61]. This result could reasonably have been expected because the PHQ-9 was developed to match DSM-IV criteria on an item-by-item basis, while the HADS concentrates on the core criterion of anhedonia [45]. However, we found no statistical differences between these scales for detecting depression.

The operating characteristics of the screening instruments for ‘any depressive disorder’ were slightly lower than for ‘major depressive disorder’. This is perhaps because the diagnostic criteria for dysthymia and minor depression are more heterogeneous than those for major depression, representing a greater diagnostic challenge in identifying ‘any depressive disorder’. Two symptoms of dysthymia, low self-esteem and hopelessness, are not included in the PHQ-9 and this may also explain why it does not perform as well for detecting ‘any depressive disorder’. This shortcoming can easily be remedied with the inclusion of additional items. A further potential reason for the lower operating characteristics for ‘any depressive disorder’ relates to the differences in chronicity criteria for major/minor depression and dysthymia.

Our analyses of the PHQ-9 showed that a cut-off score of ≥5 was appropriate for a two-stage screening approach for ‘major depressive disorder’ and both a one- and two-stage screening approach for ‘any depressive disorder’, corresponding with the recognised threshold demarcating the lower limits of mild depression. For ‘major depressive disorder’, both the optimum (≥6) and the two-stage (≥5) cut-off scores were lower than the generally recommended cut-off score of ≥10 (sensitivity=54.3%, specificity=91.1%, PPV=57.6%, NPV=90.0%). Similar findings were reported in a study of 1024 outpatients with CAD where a PHQ-9 cut-off score of ≥10 was 54% sensitive and 90% specific [32]. Since a high sensitivity and NPV are more important for screening purposes than a high specificity and PPV, the low sensitivity of a cut-off score of ≥10 makes it inappropriate for this objective. Similarly, analysis of the PHQ-9 categorical algorithms for detecting both groups showed low sensitivity values, rendering these algorithms less useful than the suggested cut-off scores for screening purposes. However, the categorical algorithm for detecting ‘major depressive disorder’ had a positive likelihood ratio (LR+) of 10.7 such that a positive result would not require a follow-up diagnostic interview for confirmation. The superior LR+ of the diagnostic algorithm over the summed PHQ-9 score for diagnosing major depression is attributable to the fact that it is possible to score 5 on this measure without having either of the two cardinal symptoms of depression. As such, the summed score does not map perfectly onto the DSM-IV criteria for major depression. A similar comment can be made in relation to the HADS, which focuses on the core depression symptom of anhedonia and does not match DSM criteria on an item-by-item basis.
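The LR+ of 10.7 quoted above follows from the reported sensitivity and specificity of the categorical algorithm through the standard definition LR+ = sensitivity / (1 − specificity). The Python sketch below reproduces that arithmetic and, as an illustration, converts the LR+ into a post-test probability using the sample prevalence of major depression (35 of 193). The function names are chosen for this sketch only.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity), with both given as proportions."""
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    # PHQ-9 categorical algorithm for major depressive disorder (Table 3).
    lr_algorithm = positive_likelihood_ratio(0.343, 0.968)
    # Summed PHQ-9 at >=10 and HADS at >=8, for comparison (Table 3).
    lr_phq_10 = positive_likelihood_ratio(0.543, 0.911)
    lr_hads_8 = positive_likelihood_ratio(0.457, 0.918)
    print(round(lr_algorithm, 1), round(lr_phq_10, 1), round(lr_hads_8, 1))
    print(round(post_test_probability(35 / 193, lr_algorithm), 2))
```

Applied to the sample prevalence, the algorithm's LR+ gives a post-test probability of roughly 0.70, in line with the PPV of 70.6% reported for the algorithm.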

On the HADS, the optimum cut-off scores and the cut-off scores suggested for two-stage screening for detecting both ‘major depressive disorder’ and ‘any depressive disorder’ were lower than the generally recommended cut-off score (≥8). A cut-off score of ≥8 yielded lower sensitivity for both groups, resulting in higher numbers of false negatives if used for screening. The HADS has been described as unsuitable for diagnosing major depression [45] and our analysis provided support for this finding when a cut-off score of ≥8 was used to indicate depression. For instance, one can be reasonably certain that a patient who scores below the generally recommended cut-off score of ≥8 does not have major depressive illness; however, if the patient scores ≥8, then the probability of having major depressive illness is almost equal to the chance of not having the illness. Consequently, approximately half of the patients ‘diagnosed’ as having major depressive disorder would not meet criteria in a clinical interview.
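The "almost equal" probability described above is the positive predictive value. As a rough check, PPV and NPV can be recovered from sensitivity, specificity and prevalence using Bayes' rule. The sketch below applies this to the HADS cut-off of ≥8 for major depressive disorder (sensitivity 45.7%, specificity 91.8%, prevalence 35/193); it is an illustration of the arithmetic, not the procedure the authors ran in SPSS.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from proportions, via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    ppv, npv = predictive_values(0.457, 0.918, 35 / 193)
    print(f"PPV={ppv:.1%}, NPV={npv:.1%}")
```

A PPV close to 55% means that a little under half of the patients scoring ≥8 would not meet criteria at interview, while the NPV of about 88% supports the statement that a score below the cut-off makes major depression unlikely. Because both values depend on prevalence, they would differ in a population where depression is more or less common than in this sample.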

Choosing lower cut-off scores can maximize the sensitivity and NPV of the PHQ-9 and the HADS, but this would result in unacceptably low specificity values, thereby increasing the number of false positives. For instance, a cut-off score of ≥1 on the HADS produces a sensitivity of 98.1% and a specificity of 15.8% for detecting ‘any depressive disorder’.
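The trade-off described above is what the two selection rules in the Analysis section formalise: the maximal Youden Index for one-step screening, and maximal sensitivity subject to specificity of at least 75% for two-stage screening. The sketch below applies both rules to the HADS values reported in this paper. It is restricted to the few cut-offs for which sensitivity and specificity are actually reported, so it illustrates the rules and does not reproduce the full ROC analysis.

```python
# (cut-off, sensitivity %, specificity %) as reported for the HADS.
HADS_ANY = [(1, 98.1, 15.8), (5, 77.8, 80.6), (8, 38.9, 94.2)]
HADS_MAJOR = [(5, 85.7, 75.3), (6, 80.0, 81.6), (8, 45.7, 91.8)]


def youden_index(sensitivity, specificity):
    """Youden Index from percentages: sensitivity + specificity - 1."""
    return sensitivity / 100 + specificity / 100 - 1


def optimal_cutoff(rows):
    """One-step rule: the cut-off with the maximal Youden Index."""
    return max(rows, key=lambda r: youden_index(r[1], r[2]))[0]


def two_stage_cutoff(rows, min_specificity=75.0):
    """Two-stage rule: maximal sensitivity among cut-offs whose
    specificity is at least min_specificity; None if none qualify."""
    eligible = [r for r in rows if r[2] >= min_specificity]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r[1])[0]


if __name__ == "__main__":
    print(optimal_cutoff(HADS_ANY), two_stage_cutoff(HADS_ANY))
    print(optimal_cutoff(HADS_MAJOR), two_stage_cutoff(HADS_MAJOR))
```

A cut-off of ≥1 has near-perfect sensitivity but is rejected by both rules: its Youden Index is low, and its specificity falls far short of the 75% floor.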

A possible limitation of this study is that participants were required to complete two measures of depression in one questionnaire pack. Although other measures were placed between these two instruments, and the structure and content of these two instruments differ, effects of repetition or order cannot be excluded. In terms of the generalizability of these findings, this study included patients recently hospitalized for cardiac disease. It is unknown whether the results from this analysis would generalize to PHQ-9 and HADS scores among other populations or to patients with other comorbidities. It must also be recognized that the cut-off points suggested here may vary with an alternative criterion diagnostic tool.

5. Conclusions

Despite these limitations, our results showed that both the HADS and PHQ-9 have acceptable properties for screening major and subthreshold depression in patients with CAD 3 months following cardiac hospitalization. To our knowledge, this is the first study to compare both self-report measures against a diagnostic referent standard with recently hospitalized cardiac patients. The HADS and PHQ-9 were statistically equivalent for detecting either the comprehensive or diagnostic group; however, the categorical algorithm of the PHQ-9 for diagnosing major depression had a superior LR+ when compared with the summed HADS or PHQ-9. Since some optimum cut-off scores were lower than the generally recommended cut-off scores, particularly when screening for major depression, these recommended cut-off scores should be used with caution. Lowering the cut-off score substantially improved the sensitivity of these instruments while retaining specificity, thereby improving their usefulness to screen for CAD patients with depression. In light of the adverse outcomes associated with depression in CAD, detection is a clinical priority. The choice of cut-off scores, however, also depends on the user's objectives.

Acknowledgments

The authors wish to acknowledge Jeromy Anglim for his statistical advice as well as the University of Melbourne for their financial contribution towards this project.

References

[1] Forrester A, Lipsey J, Teitelbaum M, DePaulo J, Andrzejewski P. Depression following myocardial infarction. Int J Psychiatry Med 1992;22:33–46.

[2] Frasure-Smith N, Lesperance F, Talajic M. Depression following myocardial infarction: impact on 6-month survival. JAMA 1993;270:1819–25.

[3] Frasure-Smith N, Lesperance F, Talajic M. Depression and 18-month prognosis after myocardial infarction. Circulation 1995;91:999–1005.

[4] Connerney I, Shapiro P, McLaughlin J, Bagiella E, Sloan R. Relation between depression after coronary artery bypass surgery and 12-month outcome: a prospective study. Lancet 2001;358:1766–71.

[5] Lesperance F, Frasure-Smith N, Juneau M, Theroux P. Depression and 1-year prognosis in unstable angina. Arch Int Med 2000;160:1354–60.

[6] Carney RM, Rich MW, Freedland KE, et al. Major depressive disorder predicts cardiac events in patients with coronary artery disease. Psychosom Med 1988;50:627–33.

[7] Frasure-Smith N, Lesperance F, Juneau M. Differential long-term impact of in-hospital symptoms of psychological stress after non-Q-wave and Q-wave acute myocardial infarction. Am J Cardiol 1992;69:1128–34.

[8] Schleifer S, Macari-Hinson M, Coyle D, et al. The nature and course of depression following myocardial infarction. Arch Int Med 1989;149:1785–9.

[9] Rudisch B, Nemeroff CB. Epidemiology of comorbid coronary artery disease and depression. Biol Psychiatry 2003;54:227–40.

[10] Lett HS, Blumenthal JA, Babyak MA, et al. Depression as a risk factor for coronary artery disease: evidence, mechanisms and treatment. Psychosom Med 2004;66:305–15.

[11] Frasure-Smith N, Lesperance F, Juneau M, Talajic M, Bourassa MG. Gender, depression and one-year prognosis after myocardial infarction. Psychosom Med 1999;61:26–37.

L. Stafford et al. / General Hospital Psychiatry 29 (2007) 417–424

[12] Pirraglia PA, Peterson JC, Williams-Russo P, Gorkin L, Charlson ME. Depressive symptomatology in coronary artery bypass graft surgery patients. Int J Geriatr Psychiatry 1999;14:668–80.

[13] Silverstone PH. Depression increases mortality and morbidity in acute life-threatening medical illness. J Psychosom Res 1990;34:651–7.

[14] Ladwig KH, Roll G, Breithardt G, Budde T, Borggrefe M. Post-infarction depression and incomplete recovery 6 months after acute myocardial infarction. Lancet 1994;343:20–3.

[15] Burg MM, Benedetto C, Rosenberg R, Soufer R. Presurgical depression predicts medical morbidity 6 months after coronary artery bypass graft surgery. Psychosom Med 2005;65:111–8.

[16] Goyal TM, Idler EL, Krause TJ, Contrada RJ. Quality of life following cardiac surgery: impact of the severity and course of depressive symptoms. Psychosom Med 2005;67:759–65.

[17] Jenkins CD, Stanton BA, Jono RT. Quantifying and predicting recovery after heart surgery. Psychosom Med 1994;56:203–12.

[18] Fauerbach JA, Bush DE, Thombs BD, McCann UD, Fogel J, Ziegelstein RC. Depression following acute myocardial infarction: a prospective relationship with ongoing health and function. Psychosomatics 2005;46:355–61.

[19] Mallik S, Krumholz HM, Lin ZQ, et al. Patients with depressive symptoms have lower health status benefits after coronary artery bypass surgery. Circulation 2005;111:271–7.

[20] Mayou RA, Gill D, Thompson DR, et al. Depression and anxiety as predictors of outcome after myocardial infarction. Psychosom Med 2000;62:212–9.

[21] Perski A, Feleke E, Anderson G, et al. Emotional distress before coronary bypass grafting limits the benefits of surgery. Am Heart J 1998;136:510–7.

[22] Spertus JA, McDonell M, Woodman CL, Fihn SD. Association between depression and worse disease-specific functional status in outpatients with coronary artery disease. Am Heart J 2000;140:105–10.

[23] Sullivan MD, LaCroix AZ, Russo JE, Walker EA. Depression and self-reported physical health in patients with coronary disease: mediating and moderating factors. Psychosom Med 2001;63:248–56.

[24] Sullivan MD, LaCroix AZ, Baum C, Grothaus LC, Katon WJ. Functional status in coronary artery disease: a one-year prospective study of the role of anxiety and depression. Am J Med 1997;103:348–56.

[25] Welin C, Lappas G, Wilhelmsen L. Independent importance of psychosocial factors for prognosis after myocardial infarction. J Int Med 2000;247:629–39.

[26] Lesperance F, Frasure-Smith N, Talajic M, Bourassa MG. Five-year risk of cardiac mortality in relation to initial severity and one-year changes in depression symptoms after myocardial infarction. Circulation 2002;105:1049–53.

[27] Coulehan JL, Schulberg HC, Block M, Madonia MJ, Rodriguez E. Treating depressed primary care patients improves their physical, mental and social functioning. Arch Int Med 1997;157:1113–20.

[28] Gelenberg A. Depression is still underrecognized and undertreated. Arch Int Med 1999;159:1657–8.

[29] Zigmond AS, Snaith RP. The Hospital Anxiety and Depression Scale. Acta Psychiatr Scand 1983;67:361–70.

[30] Spitzer RL, Kroenke K, Williams JB. Validation and utility of a self-report version of PRIME-MD: The PHQ Primary Care study. JAMA 1999;282:1737–44.

[31] Strik JJMH, Honig A, Lousberg R, Denollet J. Sensitivity and specificity of observer and self-report questionnaires in major and minor depression following myocardial infarction. Psychosomatics 2001;42:423–8.

[32] McManus D, Pipkin SS, Whooley MS. Screening for depression in patients with coronary heart disease (data from the Heart and Soul study). Am J Cardiol 2005;96:1076–81.

[33] Bambauer KZ, Locke SE, Aupont O, Mullan MG, McLaughlin TJ. Using the Hospital Anxiety and Depression Scale to screen for depression in cardiac patients. Gen Hosp Psychiatry 2005;27:275–84.

[34] Robins L, Helzer J, Croughan J, Ratcliff K. National Institute of Mental Health Diagnostic Interview. Its history, characteristics, and validity. Arch Gen Psychiatry 1981;38:381–9.

[35] Andresen E, Malmgren J, Carter W, Patrick D. Screening for depression in well older adults: evaluation of a short form of the CES-D (Centre for Epidemiological Studies Depression Scale). Am J Prev Med 1994;10:77–84.

[36] Kroenke K, Spitzer RL, Williams J. The Patient Health Questionnaire-2: validity of a two-item depression screener. Med Care 2003;41:1284–92.

[37] First MB, Spitzer RL, Gibbon M, Williams JBW. Structured Clinical Interview for DSM-IV Axis I Disorders-Patient Edition (SCID-I/P, Version 2.0). New York: Biometrics Research Department, New York State Psychiatric Institute; 1995.

[38] Beck A, Ward C, Mendelson M. Beck Depression Inventory (BDI). Arch Gen Psychiatry 1961;4:561–71.

[39] Derogatis L, Lipman R, Rickels K, Uhlenhuth E, Covi L. The Hopkins Symptom Checklist (HSCL): A self-report inventory. Beh Science 1974;19(1):1–15.

[40] Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry 1960;23:56–62.

[41] Koenig HG, George LK, Peterson B, Pieper CF. Depression in medically ill hospitalized adults: prevalence, characteristics and course of symptoms according to six diagnostic schemes. Am J Psychiatry 1997;154:1376–83.

[42] Sheehan DV, Lecrubier Y, Sheehan KH, et al. The validity of the Mini International Neuropsychiatric Interview (M.I.N.I) according to the SCID-P and its reliability. Eur Psychiatry 1997;12:232–41.

[43] Lecrubier Y, Sheehan DV, Weiller E, et al. The Mini International Neuropsychiatric Interview (M.I.N.I.). A short diagnostic structured interview: reliability and validity according to the CIDI. Eur Psychiatry 1997;12:224–31.

[44] American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 4th ed. Washington (DC): American Psychiatric Association; 1994.

[45] Herrmann C. International experiences with the Hospital Anxiety and Depression Scale: a review of validation data and clinical results. J Psychosom Res 1997;42:17–41.

[46] Roberts SB, Bonnici DM, Mackinnon AJ, Worcester MC. Psychometric evaluation of the Hospital Anxiety and Depression Scale (HADS) among female cardiac patients. Br J Health Psychol 2001;6:373–83.

[47] Parker G, Heruc G, Hilton T, et al. Explicating links between acute coronary syndrome and depression: study design and methods. Aust N Z J Psychiatry 2006;40:245–52.

[48] Cheok F, Schrader G, Banham D, Marker J, Hordacre AL. Identification, course, and treatment of depression after admission for a cardiac condition: rationale and patient characteristics for the Identifying Depression as a Comorbid Condition (IDACC) project. Am Heart J 2003;146:978–84.

[49] Asbury EA, Creed F, Collins P. Distinct psychosocial differences between women with coronary heart disease and cardiac syndrome X. Eur Heart J 2004;25:1672–4.

[50] Bjelland I, Dahl AA, Haug TT, Neckelmann D. The validity of the Hospital Anxiety and Depression Scale: An updated literature review. J Psychosom Res 2002;52:69–77.

[51] Kroenke K, Spitzer RL. The PHQ-9: A new depression diagnostic and severity measure. Psychiatr Ann 2002;32:509–15.

[52] Kroenke K, Spitzer RL, Williams JBW. The PHQ-9: Validity of a brief depression severity measure. J Gen Int Med 2001;16:606–13.

[53] Ketterer MW, Wulsin L, Cao JJ, et al. “Major” depressive disorder, coronary heart disease, and the DSM-IV threshold problem. Psychosomatics 2006;47:50–5.

[54] Ruo B, Rumsfeld JS, Hlatky MA, Liu H, Browner WS, Whooley MA. Depressive symptoms and health-related quality of life: The Heart and Soul study. JAMA 2003;290:215–21.

[55] Spitzer RL, Williams JBW, Gibbon M, First MB. Structured Clinical Interview for DSM-III-R. Washington (DC): American Psychiatric Press; 1990.

[56] World Health Organization. The Composite International Diagnostic Interview (CIDI) Version 1.0. Geneva: World Health Organization; 1990.

[57] Sheehan DV, Lecrubier Y, Sheehan KH, et al. The Mini-International Neuropsychiatric Interview (M.I.N.I): The development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J Clin Psychiatry 1998;59:22–33.

[58] Potts MK, Daniels M, Burnam MA, Wells KB. A structured interview version of the Hamilton Depression Rating Scale: evidence of reliability and versatility of administration. J Psychiatr Res 1990;24:335–50.

[59] Simon GE, Revicki D, VonKorff M. Telephone assessment of depression severity. J Psychiatr Res 1993;27:247–52.

[60] SPSS Inc. SPSS Base 13.0 for Windows User's Guide. Chicago: SPSS Inc; 2004.

[61] Lowe B, Spitzer RL, Grafe K, et al. Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians' diagnoses. J Affect Disord 2004;78:131–40.

[62] Sackett DL. Clinical epidemiology: a basic science for clinical medicine. Boston: Little, Brown and Co; 1991.

[63] Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839–43.