Computerized Adaptive Testing in Clinical Substance Abuse Practice: Issues and Strategies
Barth Riley
Lighthouse Institute, Chestnut Health Systems
Overview
• CAT Basics
• CAT in Clinical Assessment
  • Triage of individuals to support clinical decision making
  • Measuring Multiple Dimensions
  • Identifying Persons with Atypical Presentation of Symptoms
Evidence-Based Practice
• Requires accurate diagnosis, treatment placement, and outcomes monitoring
• Assessment over a wide range of domains
• The cost of evidence-based assessment is:
  • Time
  • Respondent burden
  • Increased staff resources (including training)
Improving Efficiency
• The use of screeners and short-form instruments has significantly improved the efficiency of the assessment process
  • Can help determine whether a full assessment is warranted
• But not a substitute for a full assessment
  • Lack of precision
  • Floor and ceiling effects
  • Limited content validity
CAT Basics
CAT Process
[Figure: typical pattern of responses in a CAT. Items range from decreased through middle to increased difficulty, and each response is marked correct or incorrect. After each response the score is calculated and the next best item is selected based on item difficulty. A band of +/- 1 standard error is shown around the score.]
Logical Components of CAT
• Start Rule
• Item Selection
• Measure Estimation
• Stop Rule(s)
The Start Rule
• Used to select first item
• What measure is assigned to the respondent prior to selecting the first item?
• Can be an arbitrary value (0 on the logit scale) or can be based on previously gathered information.
Item Information
[Figure: probability and item information plotted against trait level (x-axis from -4 to 3.7, y-axis from 0 to 1) for an item with difficulty = 0.5. Away from the item's difficulty the item is either too difficult or too easy; information is at its maximum at trait level = 0.5.]
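The relationship on this slide can be checked numerically. The sketch below assumes a dichotomous Rasch model (the slides report measures in logits but do not name the model), and the function names are illustrative only.

```python
import math

def prob_endorse(theta, difficulty):
    """Rasch probability of endorsing an item at trait level theta (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def item_information(theta, difficulty):
    """Fisher information of a dichotomous Rasch item: P * (1 - P)."""
    p = prob_endorse(theta, difficulty)
    return p * (1.0 - p)

# Item from the slide: difficulty = 0.5 logits.
difficulty = 0.5

# Evaluate information on a grid of trait levels from -4.0 to 4.0.
trait_levels = [round(-4.0 + 0.5 * i, 1) for i in range(17)]
best_trait_level = max(trait_levels, key=lambda t: item_information(t, difficulty))
```

Under this assumed model the information peaks where the trait level equals the item difficulty, which is why a CAT prefers items located near the respondent's current measure.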
CAT in Clinical Assessment
Clinical Decision Making
• How severe are the symptoms?
• What type of treatment is most appropriate?
• Can CAT be used to answer these questions more efficiently?
Strategy
• Starting Rules
  • Using screener measures to set the initial measure and select the first item
• Variable Stop Rules
  • Tight precision around cut points
  • Less precision away from cut points
Riley, Conrad, Dennis & Bezruczko, 2007
• Used CAT to place persons into low, moderate and high levels of substance abuse and dependency.
• Substance Problem Scale (SPS) is a 16-item instrument measuring recency of substance use.
  • When was the last time you drank alcohol?
Defining Cut Points
• Cut points can be established by examining where persons with different levels of severity fall onto the measurement continuum.
The Start Rules
•Random: Randomly select an item between -0.5 and 0.5 logits of severity.
•Screener: Select most informative item relative to measure on a previously administered screener (SDScr).
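The two start rules can be sketched as follows. This is an illustration, not the study's code: it assumes a dichotomous Rasch model (so the most informative item is the one closest in difficulty to the provisional measure), assumes the screener measure is already on the same logit scale as the item bank, and uses a made-up item bank.

```python
import random

def random_start(item_difficulties, rng, low=-0.5, high=0.5):
    """Random start: pick any item whose severity lies in [low, high] logits."""
    candidates = [i for i, b in enumerate(item_difficulties) if low <= b <= high]
    return rng.choice(candidates)

def screener_start(item_difficulties, screener_measure):
    """Screener start: pick the item closest to the screener-based measure,
    which is the most informative item under the assumed Rasch model."""
    return min(range(len(item_difficulties)),
               key=lambda i: abs(item_difficulties[i] - screener_measure))

# Hypothetical item bank in logits (not the SPS calibrations).
bank = [-2.0, -1.2, -0.4, 0.1, 0.3, 0.9, 1.6, 2.4]

first_random = random_start(bank, random.Random(0))
first_screener = screener_start(bank, screener_measure=1.5)
```

With a screener measure of 1.5 the first item is the one at 1.6 logits, whereas the random rule can only start from the three items between -0.5 and 0.5.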
The Variable Stop Rule
• Stop rules set for low, mid and high range of severity.
• Mid range stop rule was set to SE=0.35 for all simulations.
• Low and High range stop rule: SE=0.5 to 0.75
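A variable stop rule of this kind might look like the sketch below. The SE targets come from the slide (0.35 in the mid range, 0.5 as one of the values tried in the low and high ranges); the mid-range boundaries and the 16-item cap are placeholders, since the slides do not give the cut points.

```python
def se_target(measure, mid_low=-0.5, mid_high=0.5,
              low_se=0.5, mid_se=0.35, high_se=0.5):
    """Return the standard error the CAT must reach at this measure.
    mid_low and mid_high are hypothetical cut points in logits."""
    if measure < mid_low:
        return low_se
    if measure > mid_high:
        return high_se
    return mid_se

def should_stop(measure, se, n_administered, max_items=16):
    """Stop when precision is sufficient for the current range,
    or when the item bank (assumed to hold 16 items) is exhausted."""
    return se <= se_target(measure) or n_administered >= max_items
```

For example, an SE of 0.45 is not enough to stop at a measure of 0.0 (mid range) but is enough at a measure of 1.8 (high range).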
CAT Standard Error
[Figure: CAT standard error across the range of severity.]
• Middle range: where decisions are made and precision is controlled
• High and low ranges: where there is little impact on clinical decisions and precision is allowed to vary more
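CAT Algorithm
[Flowchart]
1. Start rule using screener
2. Select item
3. Administer item
4. Estimate measure and SE
5. High range? If yes, apply the high range stop rule. If no: mid range? If yes, apply the mid range stop rule; if no, apply the low range stop rule.
6. Stop? If yes, end test. If no, return to step 2.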
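The loop in this flowchart can be sketched end to end. Everything below is an illustrative assumption rather than the authors' implementation: a dichotomous Rasch model, an EAP estimate on a grid with a standard normal prior, hypothetical cut points at -0.5 and 0.5 logits, and a made-up 16-item bank with a deterministic simulated respondent.

```python
import math

def p_endorse(theta, b):
    """Rasch probability of endorsing an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

GRID = [g / 10.0 for g in range(-40, 41)]  # -4.0 .. 4.0 logits

def estimate(responses):
    """EAP measure and posterior SD (used as SE), standard normal prior.
    responses is a list of (item difficulty, 0/1 response) pairs."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for b, x in responses:
            p = p_endorse(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def se_target(measure, cut_low=-0.5, cut_high=0.5):
    """Variable stop rule: tighter SE in the (hypothetical) mid range."""
    if measure > cut_high:
        return 0.5   # high range stop rule
    if measure >= cut_low:
        return 0.35  # mid range stop rule
    return 0.5       # low range stop rule

def run_cat(bank, answer, start_measure=0.0):
    """Select, administer, estimate, and check the stop rule until done."""
    measure, se = start_measure, float("inf")
    remaining = list(range(len(bank)))
    responses = []
    while remaining:
        # Most informative remaining item under the Rasch assumption.
        item = min(remaining, key=lambda i: abs(bank[i] - measure))
        remaining.remove(item)
        responses.append((bank[item], answer(item)))
        measure, se = estimate(responses)
        if se <= se_target(measure):
            break
    return measure, se, len(responses)

# Hypothetical bank and a respondent who endorses every item up to 1.0 logits.
bank = [-3.0 + 0.4 * i for i in range(16)]
measure, se, n_items = run_cat(bank, lambda i: 1 if bank[i] <= 1.0 else 0,
                               start_measure=0.5)
```

The start measure stands in for a screener-based value. With a short dichotomous bank the SE targets may never be reached, in which case the loop simply ends when the bank is exhausted.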
Results
• Screener starting rule improved CAT efficiency by 7 percent
• CAT reduced the number of required items by 13 to 66%
• CAT to full-measure correlations ranged from .87 to .99
• Classification of persons into treatment groups based on CAT and full measure (kappa coefficients) ranged from .66 to .71.
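For readers unfamiliar with kappa, the agreement between CAT-based and full-measure placement can be computed as below. The ten classifications are made up for illustration and are not data from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two classifications, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical placements of ten persons by the full measure and by the CAT.
full_measure = ["low", "low", "moderate", "moderate", "high",
                "high", "high", "low", "moderate", "high"]
cat_based = ["low", "low", "moderate", "high", "high",
             "high", "moderate", "low", "moderate", "high"]

kappa = cohen_kappa(full_measure, cat_based)
```

Here eight of ten placements agree, and after correcting for chance agreement the kappa is about .70, in the same range as the values reported on the slide.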
Results
• Variable stop rules improved efficiency by 15-38%
  • Efficiency depended on definition of the mid range of severity
• Screener start rule and variable stop rules resulted in accurate and efficient estimation of substance abuse severity.
Measuring Multiple Dimensions
Assessment on Multiple Dimensions
• Instruments often measure multiple constructs
• In CAT, treating a multidimensional item bank as unidimensional is problematic:
  • Some subdimensions may not be adequately measured
  • Particularly if subdimensions are not highly correlated with each other
Strategy: Content Balancing
• Set an item “quota” for each subscale
  • Maximum number of subscale items to administer during the CAT
• An item is selected if:
  • Its subscale quota has not been met
  • It provides maximum information
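The selection rule above can be sketched as a filter followed by an information search. The bank, quotas, and names here are hypothetical, and "maximum information" is approximated by the item closest to the current measure, which holds under an assumed dichotomous Rasch model.

```python
def select_item(bank, administered, counts, quotas, measure):
    """Pick the most informative unused item whose subscale quota is not yet met.
    bank is a list of (subscale, difficulty) pairs; returns an index or None."""
    eligible = [
        i for i, (scale, _) in enumerate(bank)
        if i not in administered and counts.get(scale, 0) < quotas[scale]
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda i: abs(bank[i][1] - measure))

# Hypothetical four-item bank with a quota of one item per subscale.
bank = [("depression", 0.0), ("depression", 0.2),
        ("anxiety", 0.4), ("homicidal_suicidal", 2.5)]
quotas = {"depression": 1, "anxiety": 1, "homicidal_suicidal": 1}

# One depression item has already been given; the current measure is 0.1.
next_item = select_item(bank, {0}, {"depression": 1}, quotas, measure=0.1)
last_item = select_item(bank, {0, 2}, {"depression": 1, "anxiety": 1},
                        quotas, measure=0.1)
```

Without the quota the second depression item (0.2 logits) would be chosen next. With it, the anxiety item is selected, and the severe homicidal/suicidal item is eventually administered even though it lies far from the current measure.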
Internal Mental Distress Scale
• The IMDS consists of the following subscales:
  • Depression Symptom Scale
  • Anxiety/Fear Symptom Scale
  • Traumatic Distress Scale
  • Homicidal/Suicidal Scale
Variations of Content Balancing
• Screener: Administers screener items first; no further content balancing.
• Mixed: Administers screener items, then uses content balancing for remaining items.
• Full: Uses content balancing throughout CAT session.
Variations of Content Balancing
• In mixed and full content balancing, the following target number of items is administered from the IMDS subscales:
  • Depression: 5
  • Anxiety: 5
  • Trauma: 5
  • Homicidal/Suicidal: 3
Content Balancing Results
Scale                 N Items   None     Screener   Mixed   Full
Depression            ≥ 1       99.1%    100%       100%    100%
Depression            ≥ 3       79.1%    76.7%      100%    100%
Homicidal/Suicidal    ≥ 1       20.5%    100%       100%    100%
Homicidal/Suicidal    ≥ 3       8.2%     7.8%       100%    100%
Anxiety               ≥ 1       100%     100%       100%    100%
Anxiety               ≥ 3       100%     100%       100%    100%
Trauma                ≥ 1       100%     100%       100%    100%
Trauma                ≥ 3       99.7%    100%       100%    100%
CAT to Full-Scale Correlations
Scale                 None    Screener   Mixed   Full
IMDS                  0.982   0.982      0.978   0.971
Depression            0.957   0.937      0.956   0.956
Homicidal/Suicidal    0.599   0.828      0.964   0.945
Anxiety               0.962   0.947      0.956   0.957
Trauma                0.968   0.974      0.972   0.969
Average r             0.894   0.934      0.965   0.960
Placement into Triage Groups
Measure               None    Screener   Mixed   Full
IMDS                  .867    .871       .863    .841
Depression            .909    .911       .753    .749
Homicidal/Suicidal    .312    .067       .917    .902
Anxiety               .803    .759       .811    .790
Trauma                .836    .850       .847    .837
Average Kappa         .745    .692       .838    .824
Results
• Content balancing had the greatest impact on the homicidal/suicidal scale.
• Mixed content balancing provided the best overall results.
Identifying Persons with Atypical Presentation of Symptoms
Implications
• Implications: Clients sometimes endorse severe clinical symptoms that are not reflected by overall scores on standard assessments.
• Misfit in clinical assessment can reflect:
  • Difficulty understanding the assessment
  • Cross-cultural effects
  • Differential effects of treatment on some symptoms but not others
  • Unusual symptom profiles
Clinical Implications
• Results reveal subgroups who endorse severe symptoms without endorsement of milder symptoms.
  • Atypical suicide profile
  • Substance dependence symptoms with abuse symptoms
  • Persons who commit serious crimes (murder, rape) who have not committed less serious criminal offenses.
Person Fit Statistics
• Person fit statistics are the most common means of detecting atypical responders.
• Here is a typical (predicted by IRT) pattern of responding:
11111101000000000
• Here is an example of an atypical response pattern:
110111110100000111
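One common person fit statistic, the outfit mean-square, can be computed for the two patterns above. This sketch makes several assumptions that are not in the slides: a dichotomous Rasch model, items ordered from easiest to hardest with evenly spaced difficulties between -3 and 3 logits, and a person measure fixed at 0 rather than estimated from the responses.

```python
import math

def outfit(pattern, theta=0.0, spread=3.0):
    """Outfit mean-square: the average squared standardized residual.
    pattern is a string of 0/1 responses, easiest item first."""
    n = len(pattern)
    step = 2.0 * spread / (n - 1)
    total = 0.0
    for k, ch in enumerate(pattern):
        b = -spread + k * step                    # hypothetical item difficulty
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # expected response
        x = int(ch)                               # observed response
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / n

typical_outfit = outfit("11111101000000000")
atypical_outfit = outfit("110111110100000111")
```

Under these assumptions the typical pattern yields an outfit well below 1, while the atypical pattern, with endorsements of the three hardest items, yields an outfit well above 1.33. Unexpected responses to items far from the person's measure dominate the statistic.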
Fit Statistics in CAT
• Become less sensitive as the number of administered items decreases.
• In CAT, items are usually selected so that each possible response to the item is equally likely.
• Items for which unusual responses are given may not be administered by the CAT.
Outfit by Number of Items
                Outfit Categories
Admin. Items    < 0.75 (Proto Typical)   0.75-1.33 (Typical)   > 1.33 (Atypical)
16              30.2%                    48.1%                 21.7%
12              34.3%                    51.1%                 14.6%
8               38.4%                    53.2%                 8.4%
4               58.2%                    40.0%                 1.8%
Strategies
• Item selection strategies
• Unidimensional Approach
  • Examine response patterns for items representing a second-order construct, such as internal mental distress
  • Fit statistics: detects all atypical symptom patterns
• Multidimensional Approach
  • Compare subdimension measures
  • Detection of a specific response pattern
  • Is the person's level of suicide ideation greater than their level of depression?
  • How big a difference in measures?
• Combination of the above
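One way to approach the "how big a difference" question is a standardized difference between two subscale measures. This is a sketch of a generic approach, not the method used in the study: it treats the two measurement errors as independent (only approximately true for correlated subscales), and the 1.96 threshold is an illustrative choice.

```python
import math

def measure_difference_z(measure_a, se_a, measure_b, se_b):
    """Standardized difference between two measures with independent errors."""
    return (measure_a - measure_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def flag_suicide_depression_profile(hs_measure, hs_se, dep_measure, dep_se,
                                    z_cut=1.96):
    """Flag persons whose homicidal/suicidal measure clearly exceeds
    their depression measure. z_cut is a hypothetical threshold."""
    return measure_difference_z(hs_measure, hs_se, dep_measure, dep_se) > z_cut

# Hypothetical measures in logits.
flagged = flag_suicide_depression_profile(1.5, 0.5, -0.5, 0.4)
not_flagged = flag_suicide_depression_profile(0.2, 0.5, 0.0, 0.4)
```

Because the comparison depends on the SE of each subscale, it only works if the CAT has administered enough items from both subscales, which is one reason content balancing matters for this strategy.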
Does Item Selection Matter?
Atypicalness Category   None    Screener   Mixed   Full    Full IMDS
Proto Typical           26.7%   34.6%      48.3%   50.5%   49.2%
Typical                 69.0%   58.7%      40.8%   38.9%   38.4%
Atypical                4.3%    6.5%       10.9%   10.6%   12.4%
Kappa                   .27     .32        .48     .50     --
CAT to Full-Measure Person Fit
CAT* Statistic                     Full-Measure Outfit
Outfit                             r = .73
Ei                                 r = .31
Homicidal/Suicidal – Depression    r = .08
Logistic Regression Correct %      91.6%
* Using full content balancing
Suicide-Depression Profile
CAT* Statistic         Full Measure H/S(a) - Depression
Outfit                 r = .11
Ei                     r = -.54
H/S - Depression       r = .92
Multiple Regression    R² = .86
* Using full content balancing
(a) Homicidal-Suicidal Scale measure
Conclusions
• Fit statistics and examination of subscale scores appear to capture different response patterns.
• Using effective item selection methods in conjunction with multiple measures of person fit improves our ability to detect atypical symptom patterns.
Potential of CAT in Clinical Practice
• Reduce respondent burden
• Reduce staff resources
• Reduce data fragmentation
• Streamline complex assessment procedures
• Assist in clinical decision making
• Identify persons with atypical profiles
Contact Information
• A copy of this presentation will be at: www.chestnut.org/li/posters
• For information on this method and a paper on it, please contact Barth Riley at [email protected]