Patient-informant concordance of the Structured Clinical
Interview for DSM-IV Axis II Personality Disorders (SCID-II)
and diagnostic abilities of the Personality Diagnostic
Questionnaire-4+ (PDQ-4+) in a non-forensic population of
aggressive men
Name: W.J. Pertijs (Tom)
Student number (ANR): 138960
Research supervisor: Dr. P.M.C. Mommersteeg (Paula)
Second assessor: Dr. S.Y.M. Thong (Melissa)
Geestelijke Gezondheidszorg Westelijk Noord-Brabant (GGZ-WNB)
Supervisor: Drs. C.A. van Tilburg (Carola)
Tilburg University
Faculty of Social Sciences
Department of Medical and Clinical Psychology
June 2013
Abstract
Concordance between patients’ and informants’ (partners’) interview information obtained with the
SCID-II, and agreement between SCID-II interview information and self-report information obtained
with the PDQ-4+, were examined in aggressive men participating in a non-forensic ambulant
group-based treatment program. It was expected that SCID-II patient-informant concordance would
be particularly low, that PDQ-4+ diagnostic agreement with the SCID-II would be moderate for
dimensional trait scores, and that both PDQ-4+ diagnostic agreement and diagnostic efficiency
would be poor for categorical personality disorder diagnoses. Pearson correlation coefficients and
Kappa values, reflecting dimensional and categorical SCID-II patient-informant concordance
respectively, were generally poor, especially for antisocial and borderline personality
disorder, which were the most common in the sample. In addition, informants underreported
personality disorder traits of their partners on the SCID-II interview. Intraclass correlation
coefficients and Kappa values, reflecting dimensional and categorical PDQ-4+ diagnostic agreement
respectively, were also generally poor, again especially for antisocial and borderline
personality disorder, and the PDQ-4+ yielded many false-positive diagnoses compared to the
SCID-II, except for antisocial personality disorder. Altogether, it can be concluded that one
might primarily administer the SCID-II to the patients themselves, although the SCID-II itself
also turned out to have some notable shortcomings.
Keywords: SCID-II, PDQ-4+, patient-informant concordance, diagnostic agreement,
diagnostic efficiency, non-forensic population
Introduction
Personality disorders can affect treatment outcome (Skodol et al., 2005; Tyrer & Simmons,
2003). However, a systematic review by Mulder (2002) emphasizes that the effects of
personality pathology on treatment outcome appear to depend on study design, since the rate
of personality pathology varies markedly depending on how it is measured. This underlines
the importance of reliable and valid personality disorder assessment in future treatment
outcome research.
Currently, a treatment outcome study is being conducted on an ambulant group-based
treatment program for aggressive men. Research on non-forensic populations of aggressive
men is scarce. Given the importance of reliable and valid personality disorder assessment in
future treatment outcome research, this preliminary study will examine personality disorder
assessment in this kind of population. Because antisocial personality disorder is expected to
be one of the most frequently occurring personality disorders in populations like ours, the
truthfulness of the patients’ answers and their willingness to cooperate in this study are of
particular concern, given that deceitfulness is one of the DSM-IV-TR criteria for antisocial
personality disorder (APA, 2000). A lack of truthfulness, resulting in distorted
self-descriptions, may be revealed by informant reports from close relatives (i.e., partners).
Unwillingness to cooperate may be addressed by using a self-report instrument instead of a
clinical interview, thereby reducing the time and effort asked of patients.
Patient-informant concordance of SCID-II
In the first part of this study, concordance between the patients’ own interview information
and informant interview information from their partners, both obtained with the Structured
Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II), will be examined. The
SCID-II is a widely used semi-structured diagnostic instrument for assessing all ten DSM-IV
personality disorders, as well as the two additional personality disorders described in
Appendix B (APA, 2000; First, Spitzer, Gibbon, Williams & Benjamin, 1997). Semi-structured
interviews like the SCID-II can be particularly helpful in situations in which the credibility
or validity of the assessment might be questioned, such as in forensic populations, because
they provide exhaustive evaluations (Widiger & Boyd, 2009). For this reason, the SCID-II
might also be helpful in our sample. The SCID-II is considered the gold standard
semi-structured assessment instrument for personality disorders. Because of its generally good
psychometric properties and because it is the most commonly used instrument in clinical
research (Dreessen & Arntz, 1998; Lobbestael, Leurgans & Arntz, 2011; Weertman, Arntz,
Dreessen, Van Velzen & Vertommen, 2003), the SCID-II was chosen as the gold standard in this
study as well.
No studies to date on patient-informant concordance using the SCID-II for DSM-IV could
be found, but some research using the SCID-II for DSM-III-R or its precursor, the Structured
Interview for DSM-III Personality (SIDP-III), is available. In psychiatric patients, agreement
between subject-based and informant-based diagnoses was high for the presence of any
personality disorder and for individual personality disorder clusters, as well as for
individual personality disorders (Schneider et al., 2004). However, other studies found only
poor to moderate correlations between subject-based and informant-based diagnoses, although
some of these correlations were still significant (Bernstein et al., 1997; Dreessen,
Hildebrand & Arntz, 1998; Zimmerman, Pfohl, Coryell, Stangl & Corenthal, 1988). Research
using the SCID-II Questionnaire for DSM-III-R (SCID-II-P) also found few meaningful
correlations between self-ratings and informant ratings in college samples (Ouimette & Klein,
1995; McKeeman & Erickson, 1997) and psychiatric patients (Modestin & Puhan, 2000). In sum,
patient-informant concordance of SCID-II diagnoses appears to be generally poor.
However, patient and informant evaluations represent two different assessment approaches,
each contributing some unique information, so complete agreement is not to be expected
(Klonsky, Oltmanns & Turkheimer, 2002; Modestin & Puhan, 2000).
Diagnostic agreement of PDQ-4+
In the second part of this study, agreement between subject’s interview information obtained
with the SCID-II and self-report information obtained with the Personality Diagnostic
Questionnaire-4+ (PDQ-4+) will be examined. The PDQ-4+ is a relatively brief true-false
self-report inventory for assessing all ten DSM-IV personality disorders, as well as the two
additional personality disorders described in Appendix B (APA, 2000; Bagby & Farvolden, 2004).
Self-report instruments like the PDQ-4+ can be particularly helpful in situations in which
maladaptive personality functioning might be missed due to false expectations or assumptions
(Widiger & Boyd, 2009). For this reason, the PDQ-4+ might also be helpful in our sample.
Although its psychometric properties are generally poor (Bagby & Farvolden, 2004; Bos et al.,
2005; Widiger & Boyd, 2009; Wilberg et al., 2000), the PDQ-4+ was chosen for this study
because it is one of the most commonly used self-report instruments for personality disorder
assessment in clinical research. In addition, its two validity scales make the PDQ-4+
especially suitable for use in our sample.
Criterion validity research on the PDQ-4+ against the SCID-II showed that diagnostic agreement
was generally poor (Bos, Van Velzen & Meesters, 2005; Fossati et al., 1998; De Reus, Van
den Berg & Emmelkamp, 2011; Wilberg, Dammen & Friis, 2000). However, agreement was
slight to moderate for some personality disorders when the Clinical Significance Scale was
used, a mini-interview belonging to the PDQ-4+ that assesses the clinical significance of any
of the personality disorders (Bouvard, Vuachet & Marchant, 2011). With regard to our sample,
some studies among prison populations with high prevalence rates of antisocial personality
disorder are worth noting. In these studies, agreement was moderate for most personality
disorder diagnoses (Abdin et al., 2011; Davison, Leese & Taylor, 2001). According to Davison
et al. (2001), antisocial personality disorder and borderline personality disorder showed even
better agreement than the others. In another study focusing exclusively on antisocial
personality disorder in an offender population, a strong dimensional association was found
between the antisocial personality disorder scale of the PDQ-4+ and the antisocial personality
disorder module of the SCID-II, while agreement of categorical diagnoses was limited (Guy,
Poythress, Douglas, Skeem & Edens, 2008).
Diagnostic efficiency of PDQ-4+
In the third part of this study, efficiency of categorical personality disorder diagnoses
derived from self-report information obtained by the PDQ-4+ will be examined.
Criterion validity research on the PDQ-4+ against the SCID-II showed high false-positive rates
and low false-negative rates (Bos et al., 2005; Fossati et al., 1998; De Reus et al., 2011;
Wilberg et al., 2000). Although the PDQ-4+ tended to overdiagnose in some studies among
prison populations with high prevalence rates of antisocial personality disorder, it predicted
the absence of any personality disorder with moderate agreement (Abdin et al., 2011;
Davison et al., 2001). Because the PDQ-4+ has poor psychometric properties and tends to
overdiagnose personality disorders, some researchers conclude that it is unsuitable
even as a screening instrument (Bouvard et al., 2011; Fossati et al., 1998; De Reus et al.,
2011). However, as the PDQ-4+ tends to adequately predict the absence of any personality
disorder, others conclude that it can certainly be used as a screening instrument, at least
to predict the presence or absence of any personality disorder (Abdin et al., 2011; Bos et al.,
2005; Davison et al., 2001; Wilberg et al., 2000).
Hypotheses
The aim of this study is to investigate two personality disorder assessment instruments in a
non-forensic group of aggressive men in order to develop an adequate mode of personality
disorder assessment for such groups. First, it is expected that SCID-II concordance between
subject’s interview information and informant interview information from the patients’ partners
will be particularly low in this sample, possibly indicating distorted self-descriptions.
Second, it is expected that agreement between subject’s SCID-II interview information and
PDQ-4+ self-report information will be moderate for dimensional trait scores, but poor for
categorical diagnoses. Third, it is expected that the efficiency of categorical personality
disorder diagnoses derived from PDQ-4+ self-report information will be poor, with high rates
of false positives, when compared to categorical personality disorder diagnoses derived from
subject’s SCID-II interview information, indicating that the PDQ-4+ may not be suitable as a
substitute for the SCID-II in personality disorder assessment.
Method
Participants and procedure
In total, 34 patients and their partners, if any, were approached for research participation,
and 25 patients agreed. Nineteen of these patients actually had a partner, and 12 of those
partners agreed to be approached as an informant for administration of the SCID-II interview,
yielding 12 patient-informant couples participating in the study. Two interviewed patients
declined to fill out the PDQ-4+ after the interview, yielding 23 cases for whom both interview
data and self-report data were available. All patients were consecutive patients referred to
‘Geestelijke Gezondheidszorg Westelijk Noord-Brabant’ (GGZ-WNB) outpatient department
‘Klachtgerichte Behandelingen’, located in Roosendaal and Bergen op Zoom, the
Netherlands, between September 2012 and March 2013. This is a generic outpatient clinic
covering a large area of the southern Netherlands. The sample was drawn from
patients participating in an ambulant group-based cognitive behavioral treatment program for
aggressive men. All patients participating in this program had a good command of the Dutch
language, presented no clinically important cognitive impairment and did not suffer from any
acute psychotic disorder. The same was true for all informants participating in this study.
Patients and their partners, if any, were invited for a 1.5-hour appointment intended
for administering the SCID-II interview and several questionnaires. After a complete
description of the study was provided, written informed consent was obtained from both the
patient and his partner, if present. Over the following 1.5 hours, the SCID-II was administered
to the patient while the partner waited outside the room. After completion, the patient
moved to another room to fill in the PDQ-4+, AUDIT, DUDIT and some other
questionnaires for the future treatment outcome study, while the SCID-II was administered to
the partner. SCID-II administration was done by a trained and experienced professional
or a trained and supervised trainee.
Demographic variables
Age, marital status, the presence of children at home, education level and job status were
recorded for all participating patients, and age was recorded for all participating partners
(i.e., informants). Additionally, substance abuse was assessed in 56% (n = 14) of all
participating patients. They filled out the Dutch translation of the Alcohol Use Disorders
Identification Test (AUDIT) by Schippers and Broekman (2010) and the Dutch translation of the
Drug Use Disorders Identification Test (DUDIT) by Kraanen (2008), two parallel self-report
instruments screening for alcohol-related and drug-related problems, respectively. The AUDIT
contains 10 items and the DUDIT contains 11 items, which provide information on different
aspects of alcohol or drug use. Items are rated on a 3- or 5-point interval scale (Babor,
Higgins-Biddle, Saunders & Monteiro, 2001; Berman, Bergman, Palmstierna & Schlyter,
2003).
Measures
SCID-II. Both patients and their partners (i.e., informants) were administered the Dutch
translation of the Structured Clinical Interview for DSM-IV Axis II Personality Disorders
(SCID-II) by Weertman, Arntz and Kerkhofs (2000). The 134-item SCID-II consists of
twelve personality disorder modules with one or a few items for each diagnostic criterion.
Diagnostic criteria are scored absent, questionable or present by the interviewer. Using the
SCID-II, diagnoses can be made either categorically or dimensionally (First et al., 1997). In
this study, the SCID-II modules for depressive personality disorder and passive-aggressive
personality disorder were omitted. The partners were administered the SCID-II in a slightly
different way, since the intention was to obtain informant reports on the patients. Therefore,
the questions were reworded, and it was stressed that they concerned the extent to which the
informants had observed the personality disorder traits in question in their partner (i.e.,
the patient).
PDQ-4+. Participants filled out the Dutch translation of the Personality Diagnostic
Questionnaire-4+ (PDQ-4+) by Akkerhuis, Kupka, Van Groenestijn and Nolen (1996). The
99-item PDQ-4+ consists of twelve personality disorder scales with one item for each
diagnostic criterion. It also includes a 4-item Too Good Scale to assess underreporting and a
2-item Suspect Questionnaire Scale to identify individuals who are lying, responding
randomly or not taking the questionnaire seriously. Items are scored true or false. Using PDQ-
4+, diagnoses can be made either categorically or dimensionally (Bagby & Farvolden, 2004;
Hyler, 1994).
Preliminary explorative analysis
Assumptions for parametric tests and statistics. All dimensional variables were examined
using histograms, stem-and-leaf plots, normal and detrended normal Q-Q plots and
boxplots to determine the normality of distributions, the range of scores and outliers, in
order to decide whether parametric tests and statistics could be used. Linearity,
homoscedasticity, direction and strength of relationships were assessed with scatterplots to
decide whether correlations could be calculated.
Reliability of PDQ-4+ and ability of PDQ-4+ validity scales. Before conducting the main
analyses, the reliability of the PDQ-4+ was assessed by calculating the mean inter-item
correlation for each scale, and the ability of the PDQ-4+ validity scales to predict the number
of personality disorder criteria endorsed was assessed by means of multiple linear regression
analysis.
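The mean inter-item correlation used as the reliability index is the average of all pairwise
Pearson correlations among the items of one scale; for true-false PDQ-4+ items scored 1/0 these
correlations are phi coefficients. The sketch below is a minimal illustration, not the software
used in this study: the function names are hypothetical, and skipping item pairs in which an
item has no variance is an assumption of the sketch.

```python
import itertools
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length score lists (phi for 0/1 items)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined when every respondent gives the same answer
    return sxy / math.sqrt(sxx * syy)


def mean_inter_item_correlation(items):
    """Average correlation over all item pairs of one scale.

    `items` holds one list of 0/1 scores per item, with respondents in the
    same order in every list. Pairs with an undefined correlation are
    skipped (an assumption of this sketch).
    """
    rs = [pearson_r(a, b) for a, b in itertools.combinations(items, 2)]
    rs = [r for r in rs if r is not None]
    if not rs:
        raise ValueError("no item pair with a defined correlation")
    return sum(rs) / len(rs)


# Hypothetical three-item scale answered by four respondents (1 = true).
scale = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
print(round(mean_inter_item_correlation(scale), 3))  # -0.333
```

Here the third item is answered in the opposite direction to the first two, so two of the three
pairwise correlations are -1 and the scale mean drops below zero.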
Main analysis
Patient-informant concordance SCID-II. Pearson correlation coefficients were used as a
measure of concordance between dimensional trait scores derived from subject’s SCID-II
interview information and those derived from informant SCID-II interview information from the
patients’ partners. Pearson correlations are insensitive to systematic mean differences between
the two sources, which is appropriate here, as the experiences of the patients and their
partners are not considered to be the same and thus each provides some unique information
(Klonsky et al., 2002; Modestin & Puhan, 2000). There is no absolute standard for interpreting
Pearson correlation coefficients. According to Hinkle, Wiersema and Jurs (2003), values from
.00 to .29 represent negligible concordance, values between .30 and .49 represent low
concordance, values between .50 and .69 represent moderate concordance, values between .70 and
.89 represent high concordance and values from .90 to 1.00 represent almost perfect
concordance. Although many other proposals exist, these benchmarks suggested by Hinkle et al.
(2003) were adopted in this study.
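A minimal sketch of this step, assuming plain lists of trait scores per couple: it computes
Pearson's r and maps it onto the Hinkle et al. (2003) bands as adopted above. Treating negative
coefficients as negligible concordance is an assumption, and the function names are
hypothetical. The usage lines illustrate why r tolerates systematic differences: shifting every
informant score by a constant leaves r unchanged.

```python
import math

# Lower bounds of the Hinkle, Wiersema and Jurs (2003) bands adopted in this study.
HINKLE_BANDS = [
    (0.90, "almost perfect"),
    (0.70, "high"),
    (0.50, "moderate"),
    (0.30, "low"),
]


def pearson_r(x, y):
    """Pearson correlation between patient-based and informant-based trait scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one set of scores is constant")
    return sxy / math.sqrt(sxx * syy)


def hinkle_label(r):
    """Verbal label for r; values below .30 (including negative r) are 'negligible' (assumption)."""
    for lower_bound, label in HINKLE_BANDS:
        if r >= lower_bound:
            return label
    return "negligible"


# Hypothetical couple data: informants score every patient 3 points lower.
patient = [12, 15, 9, 20, 17]
informant = [s - 3 for s in patient]
print(round(pearson_r(patient, informant), 6), hinkle_label(0.709))
```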
Kappa values (Cohen, 1960) were used as a measure of concordance between categorical
personality disorder diagnoses derived from subject’s SCID-II interview information and those
derived from informant SCID-II interview information, because Kappa corrects for chance
agreement on nominal categories. In small samples, when a diagnosis occurs at a very low base
rate, Kappa values are highly variable (Shrout, Spitzer & Fleiss, 1987). Therefore, Kappa
values were calculated only for diagnoses occurring in 5% or more of the sample, using the
SCID-II administered to the patients themselves as the criterion. Additionally, Kappa values
were not calculated if either variable was constant in the 2 × 2 table. As with Pearson
correlation coefficients, there is no absolute standard for interpreting Kappa values.
According to Spitzer and Fleiss (1974), values below 0.5 represent low concordance, those
between 0.5 and 0.7 represent moderate concordance and those greater than 0.7 represent high
concordance. Although many other proposals exist, these benchmarks suggested by Spitzer
and Fleiss (1974) were adopted in this study.
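The Kappa procedure described above can be sketched as follows, assuming each diagnosis is
coded 1 (present) or 0 (absent) per patient. The eligibility helper encodes the two rules stated
in the text (a base rate of at least 5% on the patient-based SCID-II criterion, and no constant
variable in the 2 × 2 table); all names and the example data are hypothetical.

```python
def cohens_kappa(criterion, other):
    """Cohen's (1960) kappa for two lists of 0/1 diagnoses on the same patients."""
    n = len(criterion)
    both_present = sum(1 for c, o in zip(criterion, other) if c and o)
    both_absent = sum(1 for c, o in zip(criterion, other) if not c and not o)
    p_observed = (both_present + both_absent) / n
    p_crit, p_other = sum(criterion) / n, sum(other) / n
    # Agreement expected by chance from the two marginal base rates.
    p_chance = p_crit * p_other + (1 - p_crit) * (1 - p_other)
    return (p_observed - p_chance) / (1 - p_chance)


def kappa_if_eligible(criterion, other, min_base_rate=0.05):
    """Return kappa only if the criterion base rate is >= 5% and neither variable is constant."""
    if not criterion or len(criterion) != len(other):
        raise ValueError("need two equal-length, non-empty lists")
    if sum(criterion) / len(criterion) < min_base_rate:
        return None
    if len(set(criterion)) < 2 or len(set(other)) < 2:
        return None  # a constant row or column in the 2 x 2 table
    return cohens_kappa(criterion, other)


def spitzer_fleiss_label(kappa):
    """Spitzer and Fleiss (1974): below .5 low, .5 to .7 moderate, above .7 high."""
    if kappa > 0.7:
        return "high"
    if kappa >= 0.5:
        return "moderate"
    return "low"


# Hypothetical example: 10 patients, diagnosis present (1) or absent (0).
patient_based = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
informant_based = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
k = kappa_if_eligible(patient_based, informant_based)
print(round(k, 3), spitzer_fleiss_label(k))  # 0.524 moderate
```

In this example the raw agreement is 80%, but 58% agreement is already expected by chance from
the base rates, which is why Kappa ends up considerably lower than the raw percentage.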
Diagnostic agreement PDQ-4+. Type A intraclass correlation coefficients (ICC) using an
absolute agreement definition (Cronbach, Gleser, Nanda & Rajaratnam, 1971; McGraw &
Wong, 1996; Shrout & Fleiss, 1979) were used as a measure of diagnostic agreement between
dimensional trait scores derived from subject’s SCID-II interview information and those
derived from PDQ-4+ self-report information. Systematic differences between the two measures
are considered relevant here, since the SCID-II and the PDQ-4+ are expected to provide the same
information. These intraclass correlations differ from Pearson correlations in that mean
differences between raters are classified as error, resulting in lower coefficients whenever
the two measures differ systematically (Cronbach et al., 1971). The benchmarks suggested by
Hinkle et al. (2003) were also adopted for interpreting intraclass correlation coefficients.
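For two sources of scores per patient, the absolute-agreement coefficient of McGraw and Wong
(1996) can be computed from a two-way ANOVA decomposition. The section does not say whether
single-measure or average-measure coefficients were reported, so the single-measure form
ICC(A,1) below is an assumption, as are the function and variable names. The example
illustrates the point made above: a constant shift between the two measures would leave
Pearson's r at 1, but it lowers the ICC.

```python
def icc_a1(x, y):
    """ICC(A,1): two-way model, absolute agreement, single measures (k = 2 sources).

    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k / n * (MSC - MSE)),
    with MSR the between-patients, MSC the between-sources and MSE the
    residual mean square.
    """
    n, k = len(x), 2
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length lists with at least two patients")
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k / n * (ms_cols - ms_error)
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


scid = [1, 2, 3, 4, 5]
pdq = [s + 2 for s in scid]  # hypothetical PDQ-4+ scores, systematically 2 points higher
print(round(icc_a1(scid, pdq), 3))   # 0.556, although Pearson's r would be 1.0
print(round(icc_a1(scid, scid), 3))  # 1.0
```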
Kappa values (Cohen, 1960) were used as a measure for diagnostic agreement of
categorical personality disorder diagnoses derived from either subject’s SCID-II interview
information or PDQ-4+ self-report information, for the same reason as they were chosen as a
measure for concordance of categorical personality disorder diagnoses.
Diagnostic efficiency PDQ-4+. Diagnostic efficiency was defined by sensitivity, specificity,
positive predictive power (PPP) and negative predictive power (NPP). Sensitivity refers to the
proportion of positives according to the standard (SCID-II) who are correctly identified as
such by the instrument in question (PDQ-4+). It represents the probability that someone with
a particular SCID-II diagnosis will also receive the same PDQ-4+ diagnosis. Specificity
refers to the proportion of negatives according to the standard (SCID-II) who are correctly
identified as such by the instrument in question (PDQ-4+). It represents the probability that
someone without a SCID-II diagnosis will not receive a PDQ-4+ diagnosis either. Positive
predictive power refers to the proportion of positives according to the instrument in question
(PDQ-4+) who are correctly identified as such. It represents the probability that someone with
a particular PDQ-4+ diagnosis will also have the same SCID-II diagnosis. Negative predictive
power refers to the proportion of negatives according to the instrument in question (PDQ-4+)
who are correctly identified as such. It represents the probability that someone without a
PDQ-4+ diagnosis will not have a SCID-II diagnosis either. As with Kappa values, sensitivity,
specificity, PPP and NPP were not calculated for diagnoses occurring in less than 5% of the
sample (using the SCID-II as criterion) or if either variable was constant in the 2 × 2 table.
In line with previous research (e.g., Bouvard et al., 2011), values ranging from .00 to .29
were considered low, values ranging from .30 to .69 moderate and values ranging from .70 to
1.00 high in this study.
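The four efficiency indices follow directly from the cells of the 2 × 2 table of SCID-II
(criterion) against PDQ-4+ diagnoses. The sketch below assumes 0/1 coding per patient and uses
the low/moderate/high bands stated above; the function names and the example data are
hypothetical. The example mimics an overdiagnosing questionnaire: every SCID-II case is flagged,
plus three false positives, which gives high sensitivity and NPP but only moderate specificity
and PPP.

```python
def diagnostic_efficiency(scid, pdq):
    """Sensitivity, specificity, PPP and NPP of PDQ-4+ diagnoses against SCID-II (0/1 lists)."""
    pairs = list(zip(scid, pdq))
    tp = sum(1 for s, p in pairs if s and p)          # diagnosed by both
    fn = sum(1 for s, p in pairs if s and not p)      # missed by the PDQ-4+
    fp = sum(1 for s, p in pairs if not s and p)      # false positive on the PDQ-4+
    tn = sum(1 for s, p in pairs if not s and not p)  # diagnosed by neither

    def ratio(num, den):
        return num / den if den else None  # undefined for an empty row or column

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "PPP": ratio(tp, tp + fp),
        "NPP": ratio(tn, tn + fn),
    }


def efficiency_label(value):
    """Bands used in this study: .00-.29 low, .30-.69 moderate, .70-1.00 high."""
    if value >= 0.70:
        return "high"
    if value >= 0.30:
        return "moderate"
    return "low"


# Hypothetical overdiagnosing pattern: PDQ-4+ flags all 3 SCID-II cases plus 3 others.
scid = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pdq = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
for name, value in diagnostic_efficiency(scid, pdq).items():
    print(name, round(value, 3), efficiency_label(value))
```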
Post-hoc analysis of mean differences between sources of information
After conducting the main analysis, paired-samples t-tests were conducted to evaluate the
statistical significance of the observed mean differences between subject’s interview,
informant interview and self-report information. The eta squared statistic was used as a
measure of effect size, interpreted with the benchmarks suggested by Cohen (1988): a value of
.01 indicates a small effect, a value of .06 a moderate effect and a value of .14 a large
effect.
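The section does not state which formula was used for eta squared. A common choice for a
paired-samples t-test is η² = t² / (t² + N − 1), with N the number of pairs; the sketch below
assumes that formula, and the function names are hypothetical. Applied to t(22) = -4.53 from
the post-hoc results, it gives about .483, the value reported there.

```python
import math
import statistics


def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for scores on the same people."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("t is undefined when all differences are equal")
    return statistics.mean(diffs) / se, len(diffs) - 1


def eta_squared_paired(t, n_pairs):
    """Eta squared for a paired t-test, assuming eta^2 = t^2 / (t^2 + N - 1)."""
    return t ** 2 / (t ** 2 + n_pairs - 1)


def cohen_label(eta_sq):
    """Cohen (1988): .01 small, .06 moderate, .14 large."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "moderate"
    if eta_sq >= 0.01:
        return "small"
    return "below small"  # label for values under .01 is an assumption


print(round(eta_squared_paired(-4.53, 23), 3), cohen_label(eta_squared_paired(-4.53, 23)))  # 0.483 large
print(round(eta_squared_paired(0.756, 23), 3), cohen_label(eta_squared_paired(0.756, 23)))  # 0.025 small
```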
Results
Sample characteristics
The mean age was 40.0 years (SD = 9.5) for the included patients and 42.5 years (SD = 9.5) for
the included partners. At entry into the study, 16% (n = 4) of the patients were single,
40% (n = 10) were in a relationship without being married, 36% (n = 9) were married and 8%
(n = 2) were separated or divorced. Of all included patients, 56% (n = 14) had children living
at home at the time. Most patients had a lower (40%, n = 10) or middle education (40%, n =
10), while the remaining 20% (n = 5) had only primary education or no education at all. In
total, 60% (n = 15) of all included patients had a job and were working at the time of the
study.
Substance abuse related to alcohol (28.6%, n = 4), other substances (42.9%, n = 6) or both
alcohol and other substances (28.6%, n = 4) was present in 50% (n = 7) of the 14 cases that
filled out the AUDIT and DUDIT. The mean dimensional trait scores of personality
disorders and the prevalence of categorical personality disorder diagnoses are presented in
Table 1. With SCID-II diagnoses derived from subject-based information as the criterion,
not only antisocial personality disorder (48%, n = 12) but also borderline personality
disorder (36%, n = 9) turned out to be common in this sample, whereas schizoid, schizotypal,
histrionic and narcissistic personality disorder were not diagnosed at all. The mean total
number of reported personality disorder traits on the SCID-II, as well as the mean total number
of categorical personality disorder diagnoses on the SCID-II, was higher for participants aged
31 to 40 years or 41 years and older than for younger participants, as well as for
participants with primary education only or no education at all. Unemployed participants and
employed participants not working at the time of the study also reported more personality
disorder traits and had more personality disorder diagnoses on average, as did participants in
whom substance abuse was present.
Preliminary explorative analysis
Assumptions for parametric tests and statistics. Most dimensional variables were
negatively skewed. The range was too small to calculate correlations for schizoid,
schizotypal, histrionic and narcissistic personality disorder trait scores. Linearity and
homoscedasticity were sufficient to calculate correlations for the remaining trait scores.
Reliability of PDQ-4+ and ability of PDQ-4+ validity scales. Mean inter-item correlations
for each PDQ-4+ scale are presented in Table 2. Internal consistency of the Dutch translation
of the PDQ-4+ was poor for most scales. In total, 7 cases (30.4%) had a positive score on
one or both validity scales. Multiple linear regression was used to assess the ability of the
PDQ-4+ validity scales to predict the number of items endorsed. With both scales entered,
R = .683, F(2, 19) = 8.291, p = .003. Positive scores on the Too Good Scale were significantly
related to a lower number of personality disorder criteria endorsed, b = -19.128,
t(21) = -3.123, p = .006, whereas positive scores on the Suspect Questionnaire Scale were
significantly related to a higher number of personality disorder criteria endorsed,
b = 17.538, t(21) = 2.864, p = .010. Nevertheless, no cases were excluded from further
analyses, so that agreement was examined without screening out any particular cases.
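For a multiple regression with k predictors and N cases, the overall F statistic and the
squared multiple correlation are linked by F = (R²/k) / ((1 − R²)/(N − k − 1)). The helpers
below (hypothetical names) implement this identity and its inverse, which can be used to check
that reported regression statistics are mutually consistent. With the reported F(2, 19) = 8.291,
the inverse gives R² ≈ .466, which corresponds to a multiple correlation R ≈ .683.

```python
import math


def f_from_r2(r2, k, df_residual):
    """Overall regression F from R^2, k predictors and N - k - 1 residual df."""
    return (r2 / k) / ((1 - r2) / df_residual)


def r2_from_f(f, k, df_residual):
    """Inverse of f_from_r2: R^2 = k * F / (k * F + residual df)."""
    return k * f / (k * f + df_residual)


r2 = r2_from_f(8.291, k=2, df_residual=19)
print(round(r2, 3), round(math.sqrt(r2), 3))  # 0.466 0.683
```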
Main analysis
Patient-informant concordance SCID-II. One couple was excluded from the analyses because they
were in the process of divorcing at the time of the interviews and their reports turned out to
be strikingly different, yielding 11 couples whose reports were analyzed. Pearson correlation
coefficients reflecting concordance of dimensional trait scores and Kappa values reflecting
concordance of categorical personality disorder diagnoses were calculated for these couples
and are presented in Table 3. Pearson correlations for schizoid, schizotypal, histrionic and
narcissistic trait scores were not calculated, because the range of these trait scores was too
small. Kappa values for those personality disorder diagnoses were not calculated either,
because their base rates were below 5%. Concordance was high for avoidant trait scores
(r = .709, p = .014) and moderate for borderline (r = .589, p = .057), obsessive-compulsive
(r = .585, p = .059) and cluster C trait scores (r = .657, p = .028); however, only the results
for avoidant and cluster C trait scores were significant at the 5% level. In contrast,
concordance was negligible for antisocial trait scores (r = .292, p = .384) and cluster A trait
scores, as well as for the total number of personality disorder traits (r = .285, p = .396).
Moreover, concordance of categorical diagnoses was low for paranoid (κ = -.222), antisocial
(κ = .353), borderline (κ = .377) and obsessive-compulsive (κ = .298) personality disorder.
The only personality disorder that showed better agreement was avoidant personality disorder
(κ = .542), which showed moderate agreement. Concordance of the total number of
categorical personality disorder diagnoses was also low (r = .489, p = .127).
Diagnostic agreement PDQ-4+. Intraclass correlation coefficients reflecting agreement of
dimensional trait scores and Kappa values reflecting agreement of categorical personality
disorder diagnoses were calculated for the 23 cases with both interview and self-report data
and are presented in Table 4. Intraclass correlations for schizoid, schizotypal, histrionic
and narcissistic trait scores were not calculated, because the range of these trait scores was
too small. Kappa values for those personality disorder diagnoses were not calculated either,
because their base rates were below 5%. Agreement was moderate for dependent trait scores
(ICC = .608, p < .001), as well as for the total number of C-criteria of antisocial
personality disorder (ICC = .674, p < .001); both coefficients were significant at the 1%
level. Agreement was low or negligible for all other scales, including antisocial
(ICC = .119, p = .292), which had the lowest agreement, borderline (ICC = .328, p = .055) and
the total number of personality disorder traits. Nevertheless, agreement for avoidant
(ICC = .485, p = .001), cluster B (ICC = .312, p = .032) and cluster C trait scores
(ICC = .359, p = .016), as well as for the total number of personality disorder traits
(ICC = .224, p = .048), was significant at the 5% level. For categorical diagnoses, agreement
was low for all personality disorders except dependent personality disorder (κ = .623), for
which it was moderate. The lowest agreement was found for antisocial (κ = .094), borderline
(κ = .087) and paranoid personality disorder (κ = -.039). In addition, agreement on the total
number of categorical personality disorder diagnoses was negligible (ICC = .221, p = .054).
Diagnostic efficiency PDQ-4+. Diagnostic efficiency values of categorical personality
disorder diagnoses are also presented in Table 4. Values for schizoid, schizotypal, histrionic
and narcissistic personality disorder were not calculated, because the base rates for these
personality disorders were below 5%. Sensitivity and negative predictive power were high for
avoidant (sensitivity = 1.00, NPP = 1.00), dependent (sensitivity = 1.00, NPP = 1.00) and
obsessive-compulsive personality disorder (sensitivity = .833, NPP = .900). Negative
predictive power was also high for paranoid personality disorder (NPP = .750). However,
most other diagnostic efficiency values, especially specificity and positive predictive power,
were only moderate. For antisocial personality disorder, by contrast, sensitivity and negative
predictive power were slightly lower, whereas specificity and positive predictive power were
higher.
Post-hoc analysis of mean differences between sources of information
Table 1 suggests that informants underreport personality disorder traits of their partners on
the SCID-II interview, compared to the patients themselves. To test this presumption,
paired-samples t-tests were conducted comparing the total number of traits and the number of
categorical personality disorders derived from the subject’s interview with those derived from
the informant’s interview. The results of these tests are presented in Table 5. For
dimensional trait scores, the total number derived from the informant’s interview
(M = 15.2, SD = 4.27) was significantly lower than the number derived from the subject’s
interview (M = 19.6, SD = 6.67), t(10) = 2.12, p = .030 (one-tailed). The eta squared statistic
(.313) indicated a large effect size. For categorical diagnoses, too, the total number derived
from the informant’s interview (M = 1.27, SD = .647) was significantly lower than the number
derived from the subject’s interview (M = 2.00, SD = 1.27), t(10) = 2.19, p = .027
(one-tailed). Again, the eta squared statistic (.323) indicated a large effect size.
Further examination of Table 1 suggests that patients overreport personality disorder traits on
the PDQ-4+ self-report, compared with their reports on the SCID-II interview. To test this
presumption, paired-samples t-tests were conducted comparing the total number of traits and
the number of categorical personality disorder diagnoses derived from the SCID-II subject’s
interview with those derived from the PDQ-4+ self-report. The results of these tests are also
presented in Table 5. For dimensional trait scores, the total number derived from the PDQ-4+
self-report (M = 31.0, SD = 14.0) was significantly higher than the number derived from the
SCID-II subject’s interview (M = 18.6, SD = 8.30), t (22) = -4.53, p < .001 (one-tailed). The
eta squared statistic (.483) indicated a large effect size. Also for categorical
diagnoses, the total number derived from the PDQ-4+ self-report (M = 3.70, SD = 2.44) was
significantly higher than the number derived from the SCID-II subject’s interview (M = 1.61,
SD = 1.41), t (22) = -4.36, p < .001 (one-tailed). Again, the eta squared statistic (.464)
indicated a large effect size.
A further look at Table 1 suggests that antisocial personality disorder is the only exception to
this trend. Although patients generally overreport personality disorder traits on the PDQ-4+
self-report, they seem to underreport antisocial personality disorder traits on it. To test this
presumption, paired-samples t-tests were conducted comparing the antisocial trait score and the
total number of C-criteria derived from the SCID-II subject’s interview with those derived from
the PDQ-4+ self-report. As with the previous t-tests, the results are presented in Table 5. For
antisocial trait scores, the total number derived from the PDQ-4+ self-report (M = 2.57,
SD = 1.67) was not significantly lower than the number derived from the SCID-II subject’s
interview (M = 2.91, SD = 1.65), t (22) = .756, p = .229 (one-tailed). Likewise, the eta squared
statistic (.025) indicated a small magnitude of mean difference. For the C-criteria, too, the total
number derived from the PDQ-4+ self-report (M = 3.17, SD = 3.01) was not significantly lower
than the number derived from the SCID-II subject’s interview (M = 3.96, SD = 3.21),
t (22) = 1.52, p = .071 (one-tailed). However, the
eta squared statistic (.095) indicated a moderate magnitude of mean difference.
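The effect sizes in this section can be checked against the reported t statistics with the formula eta squared = t² / (t² + df). The sketch below assumes that formula and Cohen's (1988) conventional benchmarks: .01 small, .06 moderate, .14 large. Because t is reported to only two or three decimals, recomputed values may differ from the reported ones in the third decimal. For the C-criteria comparison, the formula gives approximately .095, which fits the "moderate" label used in the text. The function names are hypothetical.

```python
from typing import List, Tuple


def eta_squared(t: float, df: int) -> float:
    """Eta squared from a paired t statistic: t^2 / (t^2 + df); the sign of t is irrelevant."""
    return t * t / (t * t + df)


def cohen_label(eta2: float) -> str:
    """Cohen's (1988) benchmarks for eta squared: .01 small, .06 moderate, .14 large."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "moderate"
    if eta2 >= 0.01:
        return "small"
    return "negligible"


# (comparison, t, df) as reported in the text.
REPORTED: List[Tuple[str, float, int]] = [
    ("Traits, informant vs subject", 2.12, 10),
    ("Diagnoses, informant vs subject", 2.19, 10),
    ("Traits, PDQ-4+ vs SCID-II", -4.53, 22),
    ("Diagnoses, PDQ-4+ vs SCID-II", -4.36, 22),
    ("Antisocial traits, PDQ-4+ vs SCID-II", 0.756, 22),
    ("C-criteria, PDQ-4+ vs SCID-II", 1.52, 22),
]

for label, t, df in REPORTED:
    e = eta_squared(t, df)
    print(f"{label}: eta squared = {e:.3f} ({cohen_label(e)})")
```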
Discussion
First, it was expected that SCID-II patient-informant concordance would be particularly low
in this sample, possibly indicating distorted self-descriptions. Indeed, concordance was poor
overall, especially for antisocial trait scores and diagnoses. Concordance of the total number
of traits and the total number of diagnoses was also poor. However, the patients generally
reported more personality disorder traits than their partners did. It therefore cannot be
concluded that the patients participating in the aggression treatment program inherently
provided distorted (i.e., dishonest) self-descriptions.
Second, it was expected that PDQ-4+ diagnostic agreement with SCID-II would be
moderate for dimensional trait scores, but poor for categorical diagnoses. Agreement was
modest for dependent trait scores and diagnoses, as well as for the total number of antisocial
C-criteria. Agreement for all other dimensional scales, including the antisocial and borderline
personality disorder scales, was poor, as was agreement for the total number of personality
disorder traits. Agreement rates for all categorical diagnoses except dependent personality
disorder were also poor, especially for antisocial, borderline and paranoid personality
disorder, as was agreement for the total number of diagnoses. In sum, PDQ-4+ agreement with
SCID-II was poor for most scales, including the antisocial and borderline trait scores. Agreement
rates for antisocial and borderline personality disorder diagnoses were among the lowest of all,
even though these two personality disorders were the most prevalent in this sample when
SCID-II diagnoses derived from subject-based information were used as the criterion.
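The abstract states that dimensional agreement with the SCID-II was expressed as intraclass correlations, but this section does not say which ICC form was used. The sketch below therefore assumes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss (1979). The function name and the toy trait counts are hypothetical. Under absolute agreement, a constant shift between sources (such as systematic overreporting on a self-report) lowers the ICC even when patients are ranked identically.

```python
from typing import List


def icc_2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1) of Shrout & Fleiss (1979): two-way random effects, absolute agreement,
    single measure. One row per patient, one column per source of information."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical trait counts; column 0 = SCID-II interview, column 1 = PDQ-4+.
close_agreement = [[2, 3], [4, 4], [6, 7], [8, 8]]
shifted_by_three = [[2, 5], [4, 7], [6, 9], [8, 11]]
print(round(icc_2_1(close_agreement), 3))
print(round(icc_2_1(shifted_by_three), 3))
```

In the second toy data set, every PDQ-4+ count is three higher than the corresponding SCID-II count. The Pearson correlation would therefore be 1, yet ICC(2,1) drops to 40/67 ≈ .60.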
Third, it was expected that PDQ-4+ diagnostic efficiency would be poor, with high rates of
false positives, indicating that the PDQ-4+ might not be suitable as a substitute for the SCID-II
in personality disorder assessment. Indeed, the PDQ-4+ yielded a high rate of false positive
diagnoses compared with the patient’s SCID-II interview, except for antisocial personality
disorder. Therefore, as expected, the PDQ-4+ is not at all suitable as a substitute for the
clinical interview in this sample. Nor is it suitable merely as a screening tool in this sample,
because sensitivity and negative predictive power were only moderate for antisocial
personality disorder. This is a substantial drawback for using the PDQ-4+ to rule out
personality disorders, since antisocial personality disorder turned out to be the most common
disorder in this sample when SCID-II diagnoses derived from subject-based information were
used as the criterion.
The current findings from the preliminary explorative analysis of the reliability of the
PDQ-4+ and of the performance of its validity scales are in line with previous research.
Internal consistency of the Dutch translation of the PDQ-4+ was poor for most scales in this
study, consistent with previous results presented by Bos et al. (2005) and Wilberg et al.
(2000). The Too Good Scale was related to underreporting and the Suspect Questionnaire
Scale was related to overreporting in this study, consistent with previous results presented by
Wilberg et al. (2000) and De Reus et al. (2011).
In general, the present findings on SCID-II patient-informant concordance are also in
line with previous research (Dreessen et al., 1998). Kappa values were somewhat higher than
those found by Zimmerman et al. (1988). However, compared with the findings of Schneider
et al. (2004), the concordance rates found in this study were very low. Two further comparisons
involve studies using the SCID-II Questionnaire for DSM-III-R (SCID-II-P). All Pearson
correlations except that for antisocial trait scores were higher than those reported by
McKeeman and Erickson (1997). All Kappa values except those for antisocial, borderline and
obsessive-compulsive personality disorder were higher than those reported by Modestin and
Puhan (2000).
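Cohen's (1960) kappa, used for the categorical comparisons, corrects the observed proportion of agreement for the agreement expected by chance from each source's base rate. Below is a minimal sketch for two binary diagnosis vectors. The function name and the ten-patient example are hypothetical. When both sources never (or always) assign a diagnosis, chance agreement equals 1 and kappa is undefined, so the sketch returns None in that case.

```python
from typing import Optional, Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> Optional[float]:
    """Cohen's (1960) kappa for two binary diagnosis vectors (1 = diagnosis present).

    Returns None when chance agreement equals 1, i.e. when both sources never
    (or always) assign the diagnosis, because kappa is then undefined."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty vectors of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected from base rates alone
    if expected >= 1:
        return None
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses for ten patients (not data from this study).
patient = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
informant = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print(round(cohen_kappa(patient, informant), 3))
```

In this toy example the two sources agree on 8 of 10 patients. Each source assigns the diagnosis to 4 of 10 patients, so chance agreement is .52 and kappa is (.80 − .52) / (1 − .52) ≈ .58.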
Some current findings on PDQ-4+ diagnostic agreement are not in line with previous
research, however. First, agreement was poor for most categorical personality disorder
diagnoses in this study, whereas Abdin et al. (2011) and Davison et al. (2001) found
moderate agreement for most personality disorder diagnoses even in prison populations.
Second, contrary to the findings of Davison et al. (2001), antisocial and borderline personality
disorder diagnoses showed the worst agreement of all in this study rather than better
agreement. Third, agreement of antisocial trait scores was particularly low in this study,
whereas findings by Guy et al. (2008) indicate a strong dimensional association between
self-report and interview-based assessments of antisocial traits.
With regard to PDQ-4+ diagnostic efficiency, the current finding that the PDQ-4+ yielded
high false-positive rates and low false-negative rates for categorical personality disorder
diagnoses compared with the patient’s SCID-II interview information is consistent with
previous findings (Bos et al., 2005; Fossati et al., 1998; De Reus et al., 2011; Wilberg et al.,
2000). However, in previous studies in which diagnostic efficiency values were calculated,
sensitivity and negative predictive power were not lower for antisocial personality disorder
(Bos et al., 2005; Fossati et al., 1998).
Although most findings are supported by previous research, two remarkable findings
must be noted. First, SCID-II patient-informant concordance for antisocial personality
disorder and, to a lesser extent, borderline personality disorder seemed to be generally worse
than in previous research. Second, PDQ-4+ agreement for antisocial and borderline
personality disorder likewise seemed to be generally worse than in previous research. These
are important findings, since antisocial and borderline personality disorder were the most
prevalent in this sample, as noted earlier.
Several explanations can be given for the first finding, on SCID-II patient-informant
concordance. One possible explanation lies in the way the SCID-II interviews were
administered, since the SCID-II still relies partly on the subjective interpretation of the
interviewer (First et al., 1997). A second explanation lies in the informant reports provided by
the partners. Overall, the informants (i.e., the partners) reported fewer personality disorder
traits than the patients in the current study. This finding is consistent with previous studies
comparing personality disorder diagnoses derived from patient and informant interview
information (Dreessen et al., 1998; McKeeman & Erickson, 1997). In another study, however,
this trend was found for most but not all personality disorders (Schneider et al., 2004). In some
other studies, the trend was not found at all (Bernstein et al., 1997) or was even reversed
(Modestin & Puhan, 2000; Zimmerman et al., 1988).
Several explanations can be given for the current finding that the informants (i.e., the
partners) reported fewer personality disorder traits than the patients. First, the willingness
to report personality disorder traits depends on the type of informant (Modestin & Puhan,
2000). Some partners in this study stood aloof, regarding their husband’s mental health
problems as something that had nothing to do with them. Second, most of the patients
themselves were highly motivated to participate in the treatment program and willing to open
up about themselves. A third explanation is cognitive dissonance. This refers to the unpleasant
state of psychological arousal that results from an inconsistency among one’s important
attitudes, beliefs or behaviors (Festinger, 1957; Kenrick, Neuberg & Cialdini, 2010). To reduce
this dissonance, women who stay in an abusive relationship may cope with physical and
emotional abuse by using cognitive strategies that help them perceive their partner in a more
positive way (Herbert, Silver & Ellard, 1991). A fourth and final explanation is traumatic
bonding. This refers to the strong emotional ties that develop between two persons when one
person intermittently harasses, beats, threatens, abuses, or intimidates the other (Dutton &
Painter, 1981). The last two explanations are only partial, however, since not all relationships
involved in this study had abusive characteristics.
Several explanations can also be given for the second finding, on PDQ-4+ agreement.
These results are nonetheless difficult to explain, especially because better agreement rates
were found even in forensic populations (Abdin et al., 2011; Davison et al., 2001). One
explanation lies in the way the patients approached the PDQ-4+, as reflected in the substantial
number of patients who scored positive on one or both PDQ-4+ validity scales. As noted above,
the Too Good Scale was related to underreporting and the Suspect Questionnaire Scale to
overreporting, consistent with previous results presented by Wilberg et al. (2000) and De Reus
et al. (2011). A second explanation lies in the self-reports on the PDQ-4+. In line with previous
research by Bos et al. (2005), Fossati et al. (1998), De Reus et al. (2011) and Wilberg et al.
(2000), patients overreported personality disorder traits. However, they reported fewer
antisocial traits on the PDQ-4+ than on the SCID-II, although this difference was not
statistically significant. This pattern contrasts with previous results (Bos et al., 2005; Fossati
et al., 1998; De Reus et al., 2011; Wilberg et al., 2000).
Several explanations can be given for the current finding that patients reported more
personality disorder traits on the PDQ-4+, except for antisocial traits. First, self-report
instruments do not provide a valid measure of personality disorder severity, because they do
not establish the maladaptivity, distress or pervasiveness of each symptom. This often results
in overdiagnosis of personality disorders (Widiger & Boyd, 2009). Second, the PDQ-4+ uses a
dichotomous (true/false) response format, whereas the SCID-II uses a 3-point rating scale.
Thus, uncertainty about an item might have resulted in a ‘questionable’ score on the SCID-II,
later rescored as ‘absent’, but in a ‘true’ score on the PDQ-4+, later rescored as ‘present’. This
difference in scoring procedures could also have contributed to the overdiagnosis of personality
disorders. The finding that patients reported fewer antisocial traits on the PDQ-4+ might be
due to a characteristic of antisocial personality disorder itself that is described in the
DSM-IV-TR (APA, 2000) and was mentioned earlier in this study: deception. Thus,
underreporting of antisocial traits could itself be a manifestation of antisocial traits (APA,
2000).
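The scoring difference described above can be made concrete with a small sketch. It assumes SCID-II item codes 1 (absent), 2 (subthreshold) and 3 (threshold), and treats the "questionable" rating as the middle code 2. Only code 3 counts as present. PDQ-4+ answers are true/false, and every "true" counts as present. The item responses and function names are hypothetical.

```python
from typing import List

# Assumed SCID-II item codes: 1 = absent, 2 = subthreshold ("questionable"), 3 = threshold.
SCID_THRESHOLD = 3


def scid_traits_present(codes: List[int]) -> int:
    """Count SCID-II traits; only threshold ratings count, so 2 is rescored as absent."""
    return sum(1 for code in codes if code == SCID_THRESHOLD)


def pdq_traits_present(answers: List[bool]) -> int:
    """Count PDQ-4+ traits; with a true/false format every 'true' counts as present."""
    return sum(1 for answer in answers if answer)


# Hypothetical patient, unsure about items 2, 3, 6 and 7: coded 2 in the interview,
# answered 'true' on the questionnaire.
scid_codes = [3, 2, 2, 1, 3, 2, 2]
pdq_answers = [True, True, True, False, True, True, True]
print(scid_traits_present(scid_codes), pdq_traits_present(pdq_answers))
```

The same hypothetical patient has 2 traits present on the SCID-II and 6 on the PDQ-4+. Suppose these were the seven paranoid criteria, with a DSM-IV threshold of four. Then only the PDQ-4+ count would reach a diagnosis.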
Two important caveats must be made. The first concerns the current results on SCID-II
patient-informant concordance, and the second concerns the current results on PDQ-4+
diagnostic agreement. First, the subjective view of the patient and the ‘pseudo-objective’
view of the informant (i.e., the partner in this study) reflect two different approaches to
assessing personality: the more experiential on the one hand and the more observational on
the other. Therefore, perfect SCID-II patient-informant concordance is not to be expected.
Second, as mentioned before, the PDQ-4+ does not provide a valid measure of personality
disorder severity, has a different scoring procedure and is susceptible to distortion, especially
for antisocial traits. Therefore, perfect PDQ-4+ diagnostic agreement with the SCID-II is not
to be expected either.
Previous findings in various clinical and non-clinical populations indicate that patient
and informant reports can each make unique contributions to the assessment of personality
disorders. They provide strong support for using both patient and informant reports in the
assessment of personality disorders (Klonsky et al., 2002; Zimmerman, 1994). In addition,
self-report instruments are currently receiving considerable research attention, because they
can also make valuable contributions to the assessment of personality disorders (Widiger &
Boyd, 2009). Perfect SCID-II patient-informant concordance and perfect PDQ-4+ diagnostic
agreement are not to be expected. Even so, both turned out to be exceptionally low in this
non-forensic population of aggressive men, for the reasons discussed earlier. Therefore,
informant reports and self-report instruments do not seem to contribute to a more reliable and
valid personality disorder assessment in this non-forensic population of aggressive men.
Apart from the low contribution of informant reports and self-report instruments, it
may be questioned whether even the SCID-II interview itself is suitable for personality disorder
assessment in this non-forensic population of aggressive men. It was expected that only
antisocial personality disorder would have a high prevalence in this sample. However, with
SCID-II diagnoses derived from subject-based information as the criterion, the prevalence of
several other personality disorders, especially borderline personality disorder, was
unexpectedly high, even compared with forensic populations (Abdin et al., 2011; Davison et
al., 2001; Ullrich & Marneros, 2004). It is therefore likely that personality disorders were
overdiagnosed in this study, as well as in other studies that used the SCID-II in the same way.
One possible explanation for overdiagnosis is that personality disorder diagnoses cannot be
made solely on the basis of a short structured clinical interview. The criteria for personality
disorders in particular require much more inference on the part of the observer (APA, 2000;
Zimmerman, 1994). In addition, the DSM-IV lists general diagnostic criteria for a personality
disorder, which must be met in addition to the specific criteria for a particular named
personality disorder (APA, 2000). However, these general criteria were not explicitly examined
in the diagnostic process, which could also have contributed to the overdiagnosis of
personality disorders in this study.
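The sketch below contrasts applying only the specific criteria with also requiring the general personality disorder criteria. The thresholds are the DSM-IV minimum numbers of specific criteria for each disorder. Antisocial personality disorder is left out because it has further requirements: an age of at least 18 and evidence of conduct disorder with onset before age 15. The function name and the `general_criteria_met` flag are hypothetical. Passing None mimics a procedure that, as described in this study, does not check the general criteria explicitly.

```python
from typing import Dict, Optional, Tuple

# DSM-IV minimum number of specific criteria (required, available) per disorder.
# Antisocial personality disorder is omitted because it has additional criteria.
THRESHOLDS: Dict[str, Tuple[int, int]] = {
    "paranoid": (4, 7),
    "schizoid": (4, 7),
    "schizotypal": (5, 9),
    "borderline": (5, 9),
    "histrionic": (5, 8),
    "narcissistic": (5, 9),
    "avoidant": (4, 7),
    "dependent": (5, 8),
    "obsessive-compulsive": (4, 8),
}


def meets_diagnosis(disorder: str, criteria_met: int,
                    general_criteria_met: Optional[bool] = None) -> bool:
    """Categorical diagnosis from a count of specific criteria.

    general_criteria_met=None applies the specific threshold only (the general
    criteria are not checked); True/False additionally requires the general criteria."""
    required, available = THRESHOLDS[disorder]
    if not 0 <= criteria_met <= available:
        raise ValueError(f"{disorder} has {available} specific criteria")
    specific = criteria_met >= required
    if general_criteria_met is None:
        return specific
    return specific and general_criteria_met


# A hypothetical patient meeting 5 borderline criteria but not the general criteria.
print(meets_diagnosis("borderline", 5),
      meets_diagnosis("borderline", 5, general_criteria_met=False))
```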
This study has several notable strengths, including the concomitant use of interviews
and a self-report instrument to assess personality disorders, independent evaluations from
both patients and informants, and the use of both categorical and dimensional approaches to
assessing personality disorders. However, this study also has some shortcomings that should
be noted. First, the sample size was small, raising the possibility of Type II errors. Second, a
large number of analyses were conducted, increasing the risk of Type I errors. Third, the
Clinical Significance Scale was omitted in this study. This is a short interview accompanying
the PDQ-4+ that assesses whether meeting the threshold for a specific personality disorder is
also clinically significant. In previous research, the use of the Clinical Significance Scale was
found to improve diagnostic agreement and diagnostic efficiency between the interview and the
self-report, although indices remained only modest to moderate with this scale (Bouvard et al.,
2011; De Reus et al., 2011). However, the scale was omitted in this study because it would
reduce the time saved by administering the PDQ-4+ instead of the SCID-II. A fourth and final
limitation is the lack of agreement on a gold-standard assessment instrument for personality
disorders (Zimmerman, 1994), which limits the generalizability of the current findings.
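The concern about Type I errors can be quantified with the familywise error rate. For k independent tests at level α, the probability of at least one false positive is 1 − (1 − α)^k. The tests in this study were not independent, since they share the same patients, so the sketch below is only a rough illustration. The function name and the values of k are hypothetical and do not count the analyses actually performed.

```python
def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error over k independent tests at level alpha."""
    if k < 1 or not 0 < alpha < 1:
        raise ValueError("k must be >= 1 and alpha must lie between 0 and 1")
    return 1 - (1 - alpha) ** k


# Hypothetical numbers of tests.
for k in (1, 10, 30):
    print(f"{k:>2} tests: P(at least one false positive) = {familywise_error(k):.3f}")
```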
In summary, the present findings indicate that SCID-II informant interviews add little to the
SCID-II interviews with the patients themselves. The informants underreport the personality
disorder traits of their partners compared with the patients. The patients, for their part, do not
appear to inherently provide distorted (i.e., dishonest) self-descriptions on the SCID-II
interview. Therefore, the SCID-II informant interview can be omitted in future treatment outcome
research in this non-forensic population of aggressive men. In addition, the search for ways to
reduce the time and effort asked of patients participating in the aggression treatment program
has shown that the PDQ-4+ does not appear to be an adequate instrument for assessing
personality disorders, even for screening purposes: the PDQ-4+ has poor psychometric
properties and overdiagnoses most personality disorders. Therefore, the PDQ-4+ should not be
considered as a substitute for the SCID-II in future treatment outcome research in this
non-forensic population of aggressive men. Altogether, it can be concluded that it may suffice
to administer the SCID-II to the patients only. However, concerns have been raised about the
suitability of the SCID-II itself for personality disorder assessment in this non-forensic
population of aggressive men. There is therefore still a long way to go toward personality
disorder assessment that is reliable and valid on the one hand, yet time-saving and
cost-efficient on the other.
References
Abdin, E., Koh, K. G. W. W., Subramaniam, M., Guo, M. E., Leo, T., Teo, C., Tan, E. E., & Chong,
S. A. (2011). Validity of the Personality Diagnostic Questionnaire—4 (PDQ-4+) among
Mentally Ill Prison Inmates in Singapore. Journal of Personality Disorders, 25, 834-841.
Akkerhuis, G. W., Kupka, R. W., Groenestijn, M. A. C. van, & Nolen, W. A. (1996). PDQ 4+
vragenlijst voor persoonlijkheidskenmerken: experimentele versie. Lisse: Swets & Zeitlinger.
American Psychiatric Association (1994/2000). Diagnostic and Statistical Manual of Mental
Disorders (4th ed.). Washington, D.C.: American Psychiatric Association.
Babor, T., Higgins-Biddle, J. C., Saunders, J., & Monteiro, M. G. (2001). The Alcohol Use Disorders
Identification Test: Guidelines for Use in Primary Care. Second Edition. World Health
Organization.
Bagby, R. M., & Farvolden, P. (2004). The Personality Diagnostic Questionnaire-4 (PDQ-4). In M. J.
Hilsenroth, & D. L. Segal (Eds.), Comprehensive Handbook of Psychological Assessment 2.
Hoboken, NJ: John Wiley & Sons, Inc.
Bartko, J. J., & Carpenter, W. T. (1976). On the methods and theory of reliability. Journal of Nervous and Mental
Disease, 163, 307-317.
Berman, A. H., Bergman, H., Palmstierna, T., & Schlyter, F. (2002). Evaluation of the Drug Use
Disorders Identification Test (DUDIT) in criminal justice and detoxification settings and in a
Swedish population sample. European Addiction Research, 11, 22-31.
Berman, A. H., Bergman, H., Palmstierna, T., & Schlyter, F. (2003). DUDIT: The Drug Use Disorders
Identification Test. Version 1.0. Stockholm: Karolinska Institutet, Department of Clinical
Neuroscience Section for Alcohol and Drug Dependence Research.
Bernstein, D. P., Kasapis, C., Bergman, A., Weld, E., Mitropoulou, V., Horvath, T., Klar, H. M., &
Silverman, J. (1997). Assessing Axis II disorders by informant interview. Journal of
Personality Disorders, 11, 158-167.
Bos, J. H., Velzen, C. J. M. van, & Meesters, Y. (2005). The assessment of personality disorders.
PDQ-4+ versus SCID-II. Nederlands Tijdschrift voor de Psychologie, 60, 107-115.
Bouvard, M., Vuachet, M., & Marchant, C. (2011). Examination of the Screening Properties of the
Personality Diagnostic Questionnaire 4+(PDQ-4+) in a non-clinical sample. Clinical
Neuropsychiatry, 8, 151-158.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological
Measurement, 20, 37-46.
Cohen, J. W. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1971). The dependability of behavioral
measurements. New York: Wiley.
Davison, S., Leese, M., & Taylor, P. J. (2001). Examination of the screening properties of the
Personality Diagnostic Questionnaire–4+ (PDQ-4+) in a prison population. Journal of
Personality Disorders, 15, 180-194.
Dreessen, L., & Arntz, A. (1998). Short-interval test–retest interrater reliability of the Structured
Clinical Interview for DSM-III-R Personality Disorders (SCID-II) in outpatients. Journal of
Personality Disorders, 12, 138-148.
Dreessen, L., Hildebrand, M., & Arntz, A. (1998). Patient-informant concordance on the Structured
Clinical Interview for DSM-III-R Personality Disorders (SCID-II). Journal of Personality
Disorders, 12, 149-161.
Dutton, D. G., & Painter, S. L. (1981). Traumatic Bonding: the development of emotional attachments
in battered women and other relationships of intermittent abuse. Victimology: An International
Journal, 6, 139-155.
Festinger, L. (1957). A theory of cognitive dissonance. Evanston, IL: Row Peterson.
First, M. B., Gibbon, M., Spitzer, R. L., Williams, J. B. W., & Benjamin, L. S. (1997). User’s Guide for
the Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID II).
Washington, D.C.: American Psychiatric Press.
Fossati, A., Maffei, C., Bagnato, M., Donati, D., Donini, M., Fiorilli, M., Novella, L., & Ansoldi, M.
(1998). Brief communication: Criterion validity of the Personality Diagnostic Questionnaire-
4+ (PDQ-4+) in a mixed psychiatric sample. Journal of Personality Disorders, 12, 172-178.
Guy, L. S., Poythress, N. G., Douglas, K. S., Skeem, J. L., & Edens, J. F. (2008). Correspondence
Between Self-Report and Interview-Based Assessments of Antisocial Personality Disorder.
Psychological Assessment, 20, 47-54.
Herbert, T. B., Silver, R. C., & Ellard, J. H. (1991). Coping with an Abusive Relationship: I. How and
Why Do Women Stay? Journal of Marriage and Family, 53, 311-325.
Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied Statistics for the Behavioral Sciences (5th
ed.). Boston: Houghton Mifflin.
Hyler, S. E. (1994). PDQ-4+ Personality Diagnostic Questionnaire. New York: New York State
Psychiatric Institute.
Klonsky, E. D., Oltmanns, T. F., & Turkheimer, E. (2002). Informant-Reports of Personality Disorder:
Relation to Self-Reports and Future Research Directions. Clinical Psychology: Science and
Practice, 9, 300-311.
Kraanen, F. L. (2008). Drug Use Disorders Identification Test Authorized Dutch Translation.
Amsterdam: University of Amsterdam, Department of Clinical Psychology.
Lobbestael, J., Leurgans, M., & Arntz, A. (2011). Inter-Rater Reliability of the Structured Clinical
Interview for DSM-IV Axis I Disorders (SCID I) and Axis II Disorders (SCID II). Clinical
Psychology & Psychotherapy, 18, 75-79.
McKeeman, J. L., & Erickson, M. T. (1997). Self and informant ratings of SCID-II personality
disorder items for nonreferred college women: effects of item and participant characteristics.
Journal of Clinical Psychology, 53, 523-533.
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation
coefficients. Psychological Methods, 1, 30-46.
Modestin, J., & Puhan, A. (2000). Comparison of assessment of personality disorder by patients and
informants. Psychopathology, 33, 265-270.
Mulder, R. T. (2002). Personality Pathology and Treatment Outcome in Major Depression: A Review.
American Journal of Psychiatry, 159, 359-371.
Ouimette, P. C., & Klein, D. N. (1995). Test–retest stability, mood-state dependence, and informant-
subject concordance of the SCID-Axis II questionnaire in a non-clinical sample. Journal of
Personality Disorders, 9, 105-111.
Reus, R. J. M. de, Berg, J. F. van den, & Emmelkamp, P. M. G. (2011). Personality Diagnostic
Questionnaire 4+ is not Useful as a Screener in Clinical Practice. Clinical Psychology & Psychotherapy. DOI: 10.1002/cpp.766.
Schippers, G. M., & Broekman, T. G. (2010). De AUDIT. Nederlandse vertaling van de Alcohol Use
Disorders Identification Test. Available from: http://www.mateinfo.nl/audit/audit-nl.pdf.
Schneider, B., Maurer, K., Sargk, D., Heiskel, H., Weber, B., Frölich, L., Georgi, K., Fritze, J., &
Seidler, A. (2004). Concordance of DSM-IV Axis I and II diagnoses by personal and
informant’s interview. Psychiatry Research, 127, 121-136.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlation: uses in assessing rater reliability.
Psychological Bulletin, 86, 420-428.
Shrout, P. E., Spitzer, R. L., & Fleiss, J. L. (1987). Quantification of agreement in psychiatric diagnosis revisited.
Archives of General Psychiatry, 44, 172-177.
Skodol, A. E., Pagano, M. E., Bender, D. S., Shea, M. T., Gunderson, J. G., Yen, S., et al. (2005).
Stability of functional impairment in patients with schizotypal, borderline, avoidant, or
obsessive-compulsive personality disorder over two years. Psychological Medicine, 35, 443-
451.
Spitzer, R. L., & Fleiss, J. L. (1974). A re-analysis of the reliability of psychiatric diagnosis. British
Journal of Psychiatry, 125, 341-347.
Tyrer, P., & Simmons, S. (2003). Treatment models for those with severe mental illness and comorbid
personality disorder. British Journal of Psychiatry, 182, 15-18.
Ullrich, S., & Marneros, A. (2004). Dimensions of personality disorders in offenders. Criminal
Behaviour and Mental Health, 14, 202-213.
Weertman, A., Arntz, A., Dreessen, L., Velzen, C. van, & Vertommen, S. (2003). Short-interval test-
retest interrater reliability of the Dutch version of the Structured Clinical Interview for DSM-
IV Personality Disorders (SCID-II). Journal of Personality Disorders, 17, 562-567.
Weertman, A., Arntz, A., & Kerkhofs, L. M. (2000). Gestructureerd Klinisch Interview voor DSM-IV
As-II Persoonlijkheidsstoornissen. Amsterdam: Harcourt Assessment B.V.
Widiger, T. A., & Boyd, S. E. (2009). Personality disorders assessment instruments. In J. N. Butcher
(Ed.), Oxford Handbook of Personality Assessment. New York: Oxford University Press, Inc.
Wilberg, T., Dammen, T., & Friis, S. (2000). Comparing Personality Diagnostic Questionnaire-4+
with Longitudinal, Expert, All Data (LEAD) standard diagnoses in a sample with a high
prevalence of axis I and axis II disorders. Comprehensive Psychiatry, 41, 295-302.
Zimmerman, M. (1994). Diagnosing personality disorders: A review of issues and research methods.
Archives of General Psychiatry, 51, 225-245.
Zimmerman, M., Pfohl, B., Coryell, W., Stangl, D., & Corenthal, C. (1988). Diagnosing personality
disorder in depressed patients: A comparison of patient and informant interviews. Archives of
General Psychiatry, 45, 733-737.
Table 1. Sample characteristics: mean dimensional trait scores of personality disorders and prevalence of categorical personality disorder diagnoses
Personality disorder/scale | SCID-II patient (n = 25) M (SD) | Prevalence (%) | SCID-II partner (n = 12) M (SD) | Prevalence (%) | PDQ-4+ (n = 23) M (SD) | Prevalence (%)
Paranoid | 2.44 (1.26) | 20.0 | 2.50 (1.38) | 25.0 | 4.48 (2.15) | 47.8
Schizoid | .960 (.978) | 0.00 | .830 (.937) | 0.00 | 2.43 (1.97) | 34.8
Schizotypal | 1.20 (1.19) | 0.00 | .670 (.651) | 0.00 | 3.87 (2.42) | 39.1
Antisocial A-criteria | 2.92 (1.58) | | 1.83 (1.40) | | 2.57 (1.67) |
Antisocial C-criteria | 3.96 (3.09) | | | | 3.17 (3.01) |
Antisocial | | 48.0 | | 41.7 | | 34.8
Borderline | 4.12 (2.59) | 36.0 | 3.50 (2.32) | 33.3 | 5.04 (2.12) | 60.9
Histrionic | .600 (1.00) | 0.00 | .750 (1.29) | 0.00 | 1.65 (1.47) | 4.30
Narcissistic | .520 (.918) | 0.00 | 1.58 (2.31) | 0.00 | 2.35 (1.30) | 8.70
Avoidant | 1.84 (1.72) | 20.0 | 1.50 (1.83) | 25.0 | 3.35 (2.25) | 47.8
Dependent | 1.36 (1.63) | 8.00 | 1.67 (1.72) | 0.00 | 2.00 (1.95) | 17.4
Obsessive-compulsive | 2.64 (1.75) | 28.0 | 2.25 (1.29) | 8.30 | 3.52 (1.15) | 56.5
Cluster A | 4.60 (12.8) | | 4.00 (1.54) | | 10.5 (5.62) |
Cluster B | 8.16 (4.66) | | 7.67 (5.80) | | 11.6 (4.95) |
Cluster C | 5.84 (3.52) | | 5.42 (3.03) | | 8.87 (5.13) |
Total number of traits | 18.6 (7.95) | | 17.1 (7.48) | | 31.0 (14.0) |
Total number of diagnoses | 1.60 (1.53) | | 1.50 (1.00) | | 3.70 (2.44) |
Cluster A (odd, eccentric) includes paranoid, schizoid and schizotypal; Cluster B (dramatic, emotional, erratic) includes antisocial,
borderline, histrionic and narcissistic; Cluster C (anxious) includes avoidant, dependent and obsessive–compulsive; Total number of traits
includes any personality disorder. M: Mean, SD: standard deviation.
Table 2. Preliminary explorative analysis: mean inter-item correlations of PDQ-4+ scales (n = 23)
Scale (N of items) Mean Variance Minimum Maximum Range
Paranoid (7) .347 .055 -.146 .742 .889
Schizoid (7) .254 .051 -.066 .677 .743
Schizotypal (9) .223 .052 -.178 .649 .826
Antisocial A-criteria (7) .197 .024 -.182 .467 .649
Antisocial C-criteria (13) .215 .052 -.204 .697 .901
Borderline (9) .138 .072 -.420 .636 1.06
Histrionic (8) .064 .048 -.331 .528 .859
Narcissistic (9) .056 .039 -.289 .611 .900
Avoidant (7) .337 .047 -.071 .763 .833
Dependent (8) .213 .060 -.163 1.00 1.16
Obsessive-compulsive (8) .240 .034 -.178 .572 .750
Too Good Scale (4) .154 .048 -.087 .549 .636
Suspect Questionnaire (1) - - - - -
Cluster A (22) .256 .052 -.311 .840 1.15
Cluster B (33) .090 .054 -.456 .792 1.25
Cluster C (23) .193 .050 -.322 1.00 1.32
Total number of traits (77) .143 .060 -.574 1.00 1.57
Cluster A (odd, eccentric) includes paranoid, schizoid and schizotypal; Cluster B
(dramatic, emotional, erratic) includes antisocial, borderline, histrionic and narcissistic;
Cluster C (anxious) includes avoidant, dependent and obsessive–compulsive; Total number of
traits includes any personality disorder. Two of the antisocial C-criteria component variables had zero variance and were
removed from the scale; One of the Total component variables had zero variance and was removed from the scale.
-: One of the Suspect Questionnaire variables had zero variance and was removed from the scale, leaving too few items to perform the analysis.
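The mean inter-item correlations in Table 2 summarize all pairwise correlations among the items of a scale. Below is a minimal sketch of that computation. It assumes binary (0/1) item scores and Pearson correlations, and it drops zero-variance items first, as described in the table note. The function names and the four-item toy scale are hypothetical. The variance column of Table 2 is not reproduced because its exact definition is not stated.

```python
import itertools
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long score lists (both must vary)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def inter_item_summary(items: List[List[int]]) -> Dict[str, float]:
    """Mean, minimum, maximum and range of all pairwise inter-item correlations.

    `items` holds one list of 0/1 scores per item, over the same respondents.
    Zero-variance items are removed first because their correlations are undefined."""
    kept = [item for item in items if len(set(item)) > 1]
    if len(kept) < 2:
        raise ValueError("fewer than two items with non-zero variance")
    rs = [pearson(a, b) for a, b in itertools.combinations(kept, 2)]
    return {
        "n_items": len(kept),
        "mean": sum(rs) / len(rs),
        "min": min(rs),
        "max": max(rs),
        "range": max(rs) - min(rs),
    }


# Hypothetical four-item scale answered by six respondents; item 4 never varies.
scale = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(inter_item_summary(scale))
```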
Table 3. Patient-informant concordance of the SCID-II: correlations between dimensional personality disorder trait scores from the patient’s and the informant’s SCID-II interview, and Kappa values of categorical personality disorder diagnoses from the patient’s and the informant’s SCID-II interview (n = 11)
Personality disorder/scale | Patient’s interview M (SD) | Prev. (%) | Informant’s interview M (SD) | Prev. (%) | r | κ (SE)
Paranoid | 2.36 (1.12) | 18.2 | 2.27 (1.19) | 18.2 | .368 | -.222 (.108)
Schizoid | 1.00 (.775) | 0.00 | .910 (.944) | 0.00 | - | -
Schizotypal | .910 (.831) | 0.00 | .550 (.522) | 0.00 | - | -
Antisocial A-criteria | 3.45 (1.29) | | 1.73 (1.42) | | .292 |
Antisocial | | 72.7 | | 36.4 | | .353 (.197)
Borderline | 4.09 (2.47) | 36.7 | 3.09 (1.92) | 27.3 | .589 | .377 (.291)
Histrionic | .640 (.809) | 0.00 | .450 (.820) | 0.00 | - | -
Narcissistic | .820 (1.25) | 0.00 | 1.00 (1.18) | 0.00 | - | -
Avoidant | 1.91 (1.87) | 27.3 | 1.64 (1.86) | 27.3 | .709* | .542 (.285)
Dependent | 1.64 (1.75) | 9.09 | 1.45 (1.64) | 0.00 | .379 | -
Obsessive-compulsive | 2.82 (1.72) | 36.4 | 2.18 (1.33) | 9.09 | .585 | .298 (.246)
Cluster A | 4.27 (2.05) | | 3.73 (1.27) | | -.122 |
Cluster B | 9.00 (3.77) | | 6.27 (3.38) | | .479 |
Cluster C | 6.36 (3.98) | | 5.27 (3.13) | | .657* |
Total number of traits | 19.6 (6.67) | | 15.3 (4.27) | | .285 |
Total number of diagnoses | 2.00 (1.27) | | 1.27 (.647) | | .489 |
Cluster A (odd, eccentric) includes paranoid, schizoid and schizotypal; Cluster B
(dramatic, emotional, erratic) includes antisocial, borderline, histrionic and narcissistic;
Cluster C (anxious) includes avoidant, dependent and obsessive–compulsive; Total number of
traits includes any personality disorder.
M: mean; SD: standard deviation; Prev.: prevalence; r: Pearson correlation coefficient;
κ: Cohen’s Kappa value; SE: standard error of Cohen’s Kappa value.
-: No calculation of correlation measures because the range was too small to calculate
correlations/because one variable was a constant in the 2-ways table or where 5% or less of
the sample were diagnosed of having the personality disorder in question.
*: p < 0.05; **: p < 0.01; ***: p < 0.001.
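Cohen’s κ compares the observed proportion of agreement p_o between the two interviews with the agreement p_e expected by chance from each source’s own prevalence: κ = (p_o - p_e) / (1 - p_e). The sketch below reproduces the κ values of Table 3 from 2 × 2 tables. The cell counts are not reported in the thesis; they are reconstructed here from the reported prevalences (n = 11) as counts that reproduce the reported κ, so treat them as illustrative. The function name `cohens_kappa` is hypothetical.

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2 x 2 table of two binary ratings.

    a: positive by both sources      b: positive by patient only
    c: positive by informant only    d: negative by both sources
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed agreement
    # Chance agreement: product of the marginal proportions, for positives plus negatives.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if p_e == 1:  # both sources constant, e.g. a disorder nobody was diagnosed with
        return None
    return (p_o - p_e) / (1 - p_e)


# Cell counts (a, b, c, d) reconstructed from the Table 3 prevalences, n = 11 (illustrative).
tables = {
    "Paranoid": (0, 2, 2, 7),
    "Antisocial": (4, 4, 0, 3),
    "Borderline": (2, 2, 1, 6),
    "Avoidant": (2, 1, 1, 7),
    "Obsessive-compulsive": (1, 3, 0, 7),
}
kappas = {name: cohens_kappa(*cells) for name, cells in tables.items()}
for name, kappa in kappas.items():
    print(f"{name}: {kappa:.3f}")
```

With these counts, the negative κ for paranoid personality disorder arises because the two patients classified by their own interview and the two classified by their partner’s interview do not overlap, so observed agreement (7 of 11) falls below chance agreement.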
Table 4. Diagnostic agreement and diagnostic efficiency of the PDQ-4+: intraclass correlations between dimensional personality disorder trait
scores from the patient’s SCID-II interview and the PDQ-4+ self-report, Kappa values of categorical personality disorder diagnoses from the
SCID-II interview and the PDQ-4+ self-report, and diagnostic efficiency values of categorical PDQ-4+ diagnoses (n = 23)
                             SCID-II       PDQ-4+        SCID-II    PDQ-4+
Personality disorder/scale   M (SD)        M (SD)        Prev. (%)  Prev. (%)  ICC       κ (SE)         Sensitivity  Specificity  PPP    NPP
Paranoid                     2.43 (1.31)   4.48 (2.15)   21.7       65.2       .204      -.039 (.143)   .600         .333         .200   .750
Schizoid                     .960 (.976)   2.43 (1.97)   0.00       34.8       -         -              -            -            -      -
Schizotypal                  1.26 (1.21)   3.87 (2.42)   0.00       39.1       -         -              -            -            -      -
Antisocial A-criteria        2.91 (1.65)   2.57 (1.67)                         .119
Antisocial C-criteria        3.96 (3.21)   3.17 (3.01)                         .674***
Antisocial                                               43.5       34.8                 .094 (.206)    .400         .692         .500   .600
Borderline                   4.30 (2.62)   5.04 (2.12)   39.1       60.9       .328      .087 (.187)    .667         .429         .429   .667
Histrionic                   .480 (.947)   1.65 (1.47)   0.00       4.35       -         -              -            -            -      -
Narcissistic                 .430 (.788)   2.35 (1.30)   0.00       8.70       -         -              -            -            -      -
Avoidant                     1.83 (1.78)   3.35 (2.25)   21.7       47.8       .485*     .465 (.159)    1.00         .667         .455   1.00
Dependent                    1.39 (1.67)   2.00 (1.95)   8.70       17.4       .608***   .623 (.236)    1.00         .905         .500   1.00
Obsessive-compulsive         2.57 (1.75)   3.52 (2.15)   26.1       56.5       .209      .263 (.160)    .833         .529         .385   .900
Cluster A                    4.65 (2.90)   10.5 (5.62)                         .135
Cluster B                    8.13 (4.85)   11.6 (4.95)                         .312*
Cluster C                    5.78 (3.52)   8.87 (5.13)                         .359*
Total number of traits       18.6 (8.30)   31.0 (14.0)                         .224*
Total number of diagnoses    1.61 (1.41)   3.70 (2.44)                         .221
Cluster A (odd, eccentric) includes paranoid, schizoid and schizotypal; Cluster B (dramatic, emotional, erratic) includes antisocial,
borderline, histrionic and narcissistic; Cluster C (anxious) includes avoidant, dependent and obsessive–compulsive; Total number of traits
includes the traits of all personality disorders.
M: mean; SD: standard deviation; Prev.: prevalence; ICC: intraclass correlation coefficient; κ: Cohen’s Kappa value; SE: standard error of
Cohen’s Kappa value; PPP: positive predictive power; NPP: negative predictive power. Sensitivity, specificity, PPP and NPP describe the
PDQ-4+ diagnoses with the SCID-II interview diagnoses as the reference standard.
-: Not calculated, because the range was too small to calculate a correlation, because one variable was a constant in the 2 × 2 table,
or because 5% or less of the sample was diagnosed with the personality disorder in question. Blank cell: not applicable.
*: p < 0.05; **: p < 0.01; ***: p < 0.001.
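Sensitivity, specificity, PPP and NPP in Table 4 are proportions from a 2 × 2 table that crosses the PDQ-4+ diagnosis with the SCID-II interview diagnosis. The sketch below computes them, together with Cohen’s κ, from cell counts that are not reported in the thesis: they are reconstructed from the reported prevalences and efficiency values (n = 23) and should be read as illustrative. The function name `diagnostic_efficiency` is hypothetical.

```python
def diagnostic_efficiency(tp, fp, fn, tn):
    """Efficiency of a screening diagnosis against a reference diagnosis.

    tp / fn: reference-positive cases the screen did / did not flag
    fp / tn: reference-negative cases the screen did / did not flag
    """
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2  # chance agreement
    return {
        "kappa": (p_o - p_e) / (1 - p_e),
        "sensitivity": tp / (tp + fn),  # share of reference cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
        "PPP": tp / (tp + fp),  # share of screen-positives that are cases
        "NPP": tn / (tn + fn),  # share of screen-negatives that are non-cases
    }


# (tp, fp, fn, tn), PDQ-4+ against SCID-II, reconstructed from Table 4 (n = 23); illustrative only.
tables = {
    "Paranoid": (3, 12, 2, 6),
    "Antisocial": (4, 4, 6, 9),
    "Borderline": (6, 8, 3, 6),
    "Avoidant": (5, 6, 0, 12),
    "Dependent": (2, 2, 0, 19),
    "Obsessive-compulsive": (5, 8, 1, 9),
}
results = {name: diagnostic_efficiency(*cells) for name, cells in tables.items()}
for name, values in results.items():
    print(name, {key: round(value, 3) for key, value in values.items()})
```

The paranoid row shows how κ can fall below zero: the PDQ-4+ flags 15 of 23 patients, so 12 of its 15 positives are false and observed agreement (9 of 23) ends up lower than chance agreement.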
Table 5. Post-hoc analysis: results of paired-samples t-tests for mean differences between the subject’s interview, the informant’s
interview and self-report information
Paired-samples t-test                          M      SD     Dif.    SE (Dif.)  t       df   p       Eta sqrd.
- Variable 1 (source of information)
- Variable 2 (source of information)
Paired-samples t-test 1
- Number of traits (SCID-II subject)           19.6   6.67   4.36    2.06       2.12    10   .030    .313
- Number of traits (SCID-II informant)         15.3   4.27
Paired-samples t-test 2
- Number of diagnoses (SCID-II subject)        2.00   1.27   .727    .333       2.19    10   .027    .323
- Number of diagnoses (SCID-II informant)      1.27   .647
Paired-samples t-test 3
- Number of traits (SCID-II subject)           18.6   8.30   -12.4   2.75       -4.53   22   < .001  .483
- Number of traits (PDQ-4+ self-report)        31.0   14.0
Paired-samples t-test 4
- Number of diagnoses (SCID-II subject)        1.61   1.41   -2.09   .478       -4.36   22   < .001  .464
- Number of diagnoses (PDQ-4+ self-report)     3.70   2.44
Paired-samples t-test 5
- Antisocial A-criteria (SCID-II subject)      2.91   1.65   .348    .460       .756    22   .229    .025
- Antisocial A-criteria (PDQ-4+ self-report)   2.57   1.67
Paired-samples t-test 6
- Antisocial C-criteria (SCID-II subject)      3.96   3.21   .783    .514       1.52    22   .071    .095
- Antisocial C-criteria (PDQ-4+ self-report)   3.17   3.01
M: mean; SD: standard deviation; Dif.: mean difference (variable 1 minus variable 2); SE (Dif.): standard error of the mean difference;
t: test statistic; df: degrees of freedom; p: p-value; Eta sqrd.: effect size (eta squared).
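The effect sizes in Table 5 agree, to within about .003, with eta squared computed from each test statistic and its degrees of freedom, η² = t² / (t² + df), and the t statistic itself equals the mean difference divided by its standard error. The short check below uses only values reported in Table 5; the helper name `eta_squared` is hypothetical, and the formula is inferred from this agreement rather than stated in the thesis.

```python
def eta_squared(t, df):
    """Eta squared for a paired-samples t-test: t^2 / (t^2 + df)."""
    return t**2 / (t**2 + df)


# (t, df, reported eta squared) for paired-samples t-tests 1 to 6 in Table 5.
reported = [
    (2.12, 10, 0.313),
    (2.19, 10, 0.323),
    (-4.53, 22, 0.483),
    (-4.36, 22, 0.464),
    (0.756, 22, 0.025),
    (1.52, 22, 0.095),
]
computed = [eta_squared(t, df) for t, df, _ in reported]
for (t, df, eta_reported), eta in zip(reported, computed):
    print(f"t({df}) = {t}: eta squared {eta:.3f} (reported {eta_reported})")

# The t statistic is the mean difference over its standard error, e.g. test 1:
t_test_1 = 4.36 / 2.06  # about 2.12
```

Because t is squared, the sign of the mean difference does not affect the effect size: tests 3 and 4, where the PDQ-4+ scores exceed the SCID-II scores, still yield the largest η².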