Computerized Adaptive Testing in Clinical Substance Abuse Practice: Issues and Strategies

Barth Riley
Lighthouse Institute, Chestnut Health Systems

Overview

• CAT Basics
• CAT in Clinical Assessment
  • Triage of individuals to support clinical decision making
  • Measuring Multiple Dimensions
  • Identifying Persons with Atypical Presentation of Symptoms

Evidence-Based Practice

• Requires accurate diagnosis, treatment placement, and outcomes monitoring
• Assessment over a wide range of domains
• The cost of evidence-based assessment is:
  • Time
  • Respondent burden
  • Increased staff resources (including training)

Improving Efficiency

• The use of screeners and short-form instruments has significantly improved the efficiency of the assessment process
• Can help determine whether a full assessment is warranted
• But not a substitute for a full assessment
  • Lack of precision
  • Floor and ceiling effects
  • Limited content validity

CAT Basics

CAT Process

[Figure: CAT process. Items of decreased, middle, and increased difficulty are shown along a typical pattern of correct and incorrect responses. After each response the score is calculated and the next best item is selected based on item difficulty; the score is shown +/- 1 Std. Error.]

Logical Components of CAT

• Start Rule

• Item Selection

• Measure Estimation

• Stop Rule(s)

The Start Rule

• Used to select first item

• What measure is assigned to the respondent prior to selecting the first item?

• Can be an arbitrary value (0 on the logit scale) or can be based on previously gathered information.

Item Information

[Figure: Probability and information curves plotted against trait level (x-axis "Trait", about -4 to 3.7; y-axis "Probability/Information", 0 to 1) for an item with difficulty = 0.5. The two sides of the plot are labeled "too difficult" and "too easy"; information is at its maximum where trait level = 0.5.]
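The item information curve for the item above (difficulty = 0.5) can be sketched as follows. This is a minimal illustration that assumes a dichotomous Rasch model, which the slides do not name explicitly; the function names are illustrative only.

```python
import math

def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of endorsing an item under a dichotomous Rasch model (assumed)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def item_information(theta: float, difficulty: float) -> float:
    """Fisher information of a Rasch item at trait level theta: P * (1 - P)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)

difficulty = 0.5  # item difficulty taken from the slide
for theta in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0, 3.0):
    p = rasch_probability(theta, difficulty)
    info = item_information(theta, difficulty)
    print(f"trait={theta:5.1f}  prob={p:.3f}  information={info:.3f}")
```

Under this assumed model the information peaks where the trait level equals the item difficulty, which is why a CAT prefers items close to the current measure.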

CAT in Clinical Assessment

Clinical Decision Making

• How severe are the symptoms?

• What type of treatment is most appropriate?

• Can CAT be used to answer these questions more efficiently?

Strategy

• Starting Rules
  • Using screener measures to set the initial measure and select the first item
• Variable Stop Rules
  • Tight precision around cut points
  • Less precision away from cut points

Riley, Conrad, Dennis & Bezruczko, 2007

• Used CAT to place persons into low, moderate and high levels of substance abuse and dependency.

• Substance Problem Scale (SPS) is a 16-item instrument measuring recency of substance use.
  • When was the last time you drank alcohol?

Defining Cut Points

• Cut points can be established by examining where persons with different levels of severity fall onto the measurement continuum.

The Start Rules

• Random: Randomly select an item between -0.5 and 0.5 logits of severity.

• Screener: Select most informative item relative to measure on a previously administered screener (SDScr).
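The two start rules can be sketched as below. This is an illustration under stated assumptions: a dichotomous Rasch information function, and a hypothetical list of item difficulties named `bank` that is not the SPS calibration. The function names are likewise illustrative.

```python
import math
import random

def item_information(theta, difficulty):
    """Rasch item information P * (1 - P) (model assumed, not stated in the slides)."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)

def random_start(difficulties, rng, low=-0.5, high=0.5):
    """Random rule: pick any item whose difficulty lies between low and high logits."""
    candidates = [i for i, b in enumerate(difficulties) if low <= b <= high]
    if not candidates:
        raise ValueError("no item falls inside the start window")
    return rng.choice(candidates)

def screener_start(difficulties, screener_measure):
    """Screener rule: pick the item that is most informative at the screener measure."""
    return max(range(len(difficulties)),
               key=lambda i: item_information(screener_measure, difficulties[i]))

# Hypothetical item difficulties in logits.
bank = [-2.0, -1.2, -0.4, 0.1, 0.45, 1.3, 2.2]
print("random start item:", random_start(bank, random.Random(0)))
print("screener start item:", screener_start(bank, screener_measure=1.1))
```

The screener rule places the first item near the respondent's likely severity, so fewer early items are spent locating the respondent on the scale.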

The Variable Stop Rule

• Stop rules set for low, mid and high range of severity.

• Mid range stop rule was set to SE=0.35 for all simulations.

• Low and High range stop rule: SE=0.5 to 0.75
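A variable stop rule of this kind might look like the sketch below. The SE targets (0.35 in the mid range, 0.5 to 0.75 outside it) come from the slide; the mid-range bounds of -1.0 and 1.0 logits are hypothetical placeholders, since the actual cut points are not given here.

```python
def se_target(measure, mid_low, mid_high, mid_se=0.35, outer_se=0.5):
    """Return the standard error the CAT must reach at the current measure."""
    if mid_low <= measure <= mid_high:
        return mid_se      # tight precision where clinical decisions are made
    return outer_se        # looser precision in the low and high ranges

def should_stop(measure, se, mid_low=-1.0, mid_high=1.0, mid_se=0.35, outer_se=0.5):
    """Stop once the current SE is at or below the target for this severity range."""
    return se <= se_target(measure, mid_low, mid_high, mid_se, outer_se)

print(should_stop(measure=0.0, se=0.40))   # mid range: keep testing
print(should_stop(measure=2.5, se=0.40))   # high range: precise enough
```

How wide the mid range is drawn determines how many respondents are held to the tighter target.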

CAT Standard Error

[Figure: CAT standard error across the severity continuum. In the middle range, where decisions are made, precision is controlled. In the high and low ranges, where there is little impact on clinical decisions, precision is allowed to vary more.]

CAT Algorithm

[Flowchart: Start rule using screener -> Select item -> Administer item -> Estimate measure, SE -> High range? (Yes: high range stop rule; No: Mid range? (Yes: mid range stop rule; No: low range stop rule)) -> Stop? (Yes: end test; No: select next item).]
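The loop in the flowchart (select item, administer, estimate measure and SE, apply the range-specific stop rule) can be sketched as follows. Several pieces are assumptions rather than the authors' method: a dichotomous Rasch model, an EAP estimate on a grid with a normal prior (swapped in because it stays defined for all-yes or all-no patterns), hypothetical mid-range bounds, and a made-up item bank.

```python
import math

GRID = [-6.0 + 0.05 * k for k in range(241)]  # quadrature points in logits

def prob(theta, b):
    """Rasch probability of endorsing an item of difficulty b (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate(administered, prior_sd=2.0):
    """EAP measure and posterior SD from (difficulty, response) pairs (assumed estimator)."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * (t / prior_sd) ** 2)
        for b, x in administered:
            p = prob(t, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, start_measure, mid_range=(-1.0, 1.0), mid_se=0.35, outer_se=0.5):
    """Run one CAT session; answer(i) returns the 0/1 response to item i."""
    measure, se = start_measure, float("inf")
    remaining = list(range(len(bank)))
    administered = []
    while remaining:
        # Under the Rasch model the most informative item is the one closest in difficulty.
        item = min(remaining, key=lambda i: abs(bank[i] - measure))
        remaining.remove(item)
        administered.append((bank[item], answer(item)))
        measure, se = estimate(administered)
        target = mid_se if mid_range[0] <= measure <= mid_range[1] else outer_se
        if se <= target:
            break
    return measure, se, len(administered)

# Hypothetical bank and a deterministic respondent who endorses every item at or below 2.05 logits.
bank = [-3.0 + 0.1 * i for i in range(61)]
measure, se, n_items = run_cat(bank, lambda i: 1 if bank[i] <= 2.05 else 0, start_measure=0.0)
print(f"measure={measure:.2f}  SE={se:.2f}  items={n_items}")
```

In this sketch a respondent far from the mid range stops at the looser SE target and so needs fewer items than one whose measure stays near the cut points.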

Results

• Screener starting rule improved CAT efficiency by 7 percent

• CAT reduced the number of required items by 13 to 66%

• CAT to full-measure correlations ranged from .87 to .99

• Classification of persons into treatment groups based on CAT and full measure (kappa coefficients) ranged from .66 to .71.

Results

• Variable stop rules improved efficiency by 15-38%
  • Efficiency depended on definition of the mid range of severity

• Screener start rule and variable stop rules resulted in accurate and efficient estimation of substance abuse severity.

Measuring Multiple Dimensions

Assessment on Multiple Dimensions

• Instruments often measure multiple constructs

• In CAT, treating a multidimensional item bank as unidimensional is problematic:
  • Some subdimensions may not be adequately measured
  • Particularly if subdimensions are not highly correlated with each other

Strategy: Content Balancing

• Set an item “quota” for each subscale
  • Maximum number of subscale items to administer during the CAT

• An item is selected if:
  • Its subscale quota has not been met
  • It provides maximum information
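The selection rule above can be sketched as follows. The item bank, subscale labels, and quota values in this example are hypothetical, and the information function assumes a dichotomous Rasch model.

```python
import math

def information(theta, difficulty):
    """Rasch item information P * (1 - P) (model assumed)."""
    p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
    return p * (1.0 - p)

def select_item(theta, bank, administered, quotas):
    """Pick the most informative unused item whose subscale quota is not yet met.

    bank: list of (item_id, subscale, difficulty) tuples
    administered: set of item ids already given
    quotas: dict mapping subscale -> maximum number of items to administer
    """
    counts = {}
    for item_id, subscale, _ in bank:
        if item_id in administered:
            counts[subscale] = counts.get(subscale, 0) + 1
    eligible = [item for item in bank
                if item[0] not in administered
                and counts.get(item[1], 0) < quotas[item[1]]]
    if not eligible:
        return None  # every quota is met or the bank is exhausted
    return max(eligible, key=lambda item: information(theta, item[2]))

# Hypothetical three-item bank with a quota of one item per subscale.
bank = [("d1", "depression", 0.0), ("d2", "depression", 0.1), ("h1", "homicidal_suicidal", 2.0)]
quotas = {"depression": 1, "homicidal_suicidal": 1}
print(select_item(0.0, bank, set(), quotas))
print(select_item(0.0, bank, {"d1"}, quotas))
```

In the second call the remaining depression item is more informative at the current measure, but its subscale quota is already met, so the homicidal/suicidal item is chosen instead.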

Internal Mental Distress Scale

• The IMDS consists of the following subscales:
  • Depression Symptom Scale
  • Anxiety/Fear Symptom Scale
  • Traumatic Distress Scale
  • Homicidal/Suicidal Scale

Variations of Content Balancing

• Screener: Administers screener items first; no further content balancing.

• Mixed: Administers screener items, then uses content balancing for remaining items.

• Full: Uses content balancing throughout CAT session.

Variations of Content Balancing

• In mixed and full content balancing, the following target number of items is administered from the IMDS subscales:
  • Depression: 5
  • Anxiety: 5
  • Trauma: 5
  • Homicidal/Suicidal: 3

Content Balancing Results

Scale               N Items  None    Screener  Mixed  Full
Depression          ≥ 1      99.1%   100%      100%   100%
Depression          ≥ 3      79.1%   76.7%     100%   100%
Homicidal/Suicidal  ≥ 1      20.5%   100%      100%   100%
Homicidal/Suicidal  ≥ 3      8.2%    7.8%      100%   100%
Anxiety             ≥ 1      100%    100%      100%   100%
Anxiety             ≥ 3      100%    100%      100%   100%
Trauma              ≥ 1      100%    100%      100%   100%
Trauma              ≥ 3      99.7%   100%      100%   100%

CAT to Full-Scale Correlations

Scale               None   Screener  Mixed  Full
IMDS                0.982  0.982     0.978  0.971
Depression          0.957  0.937     0.956  0.956
Homicidal/Suicidal  0.599  0.828     0.964  0.945
Anxiety             0.962  0.947     0.956  0.957
Trauma              0.968  0.974     0.972  0.969
Average r           0.894  0.934     0.965  0.960

Placement into Triage Groups

Measure             None  Screener  Mixed  Full
IMDS                .867  .871      .863   .841
Depression          .909  .911      .753   .749
Homicidal/Suicidal  .312  .067      .917   .902
Anxiety             .803  .759      .811   .790
Trauma              .836  .850      .847   .837
Average Kappa       .745  .692      .838   .824
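Agreement between CAT-based and full-measure triage placement is summarized above with kappa. The sketch below assumes unweighted Cohen's kappa (the slides say only "kappa") and uses made-up placements for six hypothetical clients.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical triage placements.
full_measure = ["low", "low", "mid", "mid", "high", "high"]
cat_based = ["low", "mid", "mid", "mid", "high", "high"]
print(round(cohens_kappa(full_measure, cat_based), 3))
```

Kappa corrects raw agreement for the agreement expected by chance. It is undefined when chance agreement equals 1, for example when both methods place everyone in the same single group.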

Results

• Content balancing had the greatest impact on homicidal/suicidal scale.

• Mixed content balancing provided best overall results

Identifying Persons with Atypical Presentation of Symptoms

Implications

• Implications: Clients sometimes endorse severe clinical symptoms that are not reflected by overall scores on standard assessments.

• Misfit in clinical assessment can reflect:
  • Difficulty understanding the assessment
  • Cross-cultural effects
  • Differential effects of treatment on some symptoms but not others
  • Unusual symptom profiles

Clinical Implications

• Results reveal subgroups who endorse severe symptoms without endorsement of milder symptoms.
  • Atypical suicide profile
  • Substance dependence symptoms without abuse symptoms
  • Persons who commit serious crimes (murder, rape) who have not committed less serious criminal offenses.

Person Fit Statistics

• Person fit statistics are the most common means of detecting atypical responders.

• Here is a typical (predicted by IRT) pattern of responding:
  11111101000000000

• Here is an example of an atypical response pattern:
  110111110100000111

Fit Statistics in CAT

• Become less sensitive as the number of administered items decreases.

• In CAT, items are usually selected so that each possible response to the item is about equally likely.

• Items for which unusual responses are given may not be administered by the CAT.

Outfit by Number of Items

              Outfit Categories
Admin. Items  < 0.75 (Proto Typical)  0.75-1.33 (Typical)  > 1.33 (Atypical)
16            30.2%                   48.1%                21.7%
12            34.3%                   51.1%                14.6%
8             38.4%                   53.2%                8.4%
4             58.2%                   40.0%                1.8%
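The outfit categories in the table can be illustrated with the two response patterns shown earlier under Person Fit Statistics. This sketch rests on several assumptions: a dichotomous Rasch model, a maximum-likelihood person measure, response strings ordered from easiest to hardest item, and hypothetical item difficulties spaced evenly from -4 to 4 logits.

```python
import math

def rasch_p(theta, b):
    """Probability of endorsing an item of difficulty b (Rasch model, assumed)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ml_measure(responses, difficulties, max_iter=50):
    """Newton-Raphson ML person measure (undefined for all-0 or all-1 patterns)."""
    theta, raw = 0.0, sum(responses)
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        step = (raw - sum(probs)) / sum(p * (1.0 - p) for p in probs)
        theta += max(-1.0, min(1.0, step))  # damp large steps
        if abs(step) < 1e-8:
            break
    return theta

def outfit(responses, difficulties):
    """Outfit mean square: average squared standardized residual."""
    theta = ml_measure(responses, difficulties)
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_p(theta, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

def category(value):
    """Map an outfit value onto the category labels used in the table."""
    if value < 0.75:
        return "Proto Typical"
    if value > 1.33:
        return "Atypical"
    return "Typical"

def evenly_spaced(n, low=-4.0, high=4.0):
    """Hypothetical item difficulties, easiest first."""
    return [low + (high - low) * i / (n - 1) for i in range(n)]

typical = [int(c) for c in "11111101000000000"]
atypical = [int(c) for c in "110111110100000111"]
for name, pattern in (("typical", typical), ("atypical", atypical)):
    value = outfit(pattern, evenly_spaced(len(pattern)))
    print(name, round(value, 2), category(value))
```

Outfit is dominated by unexpected responses to items far from the person's measure, such as the three endorsed items at the hard end of the second pattern. A CAT that administers only well-targeted items therefore gives the statistic little to work with.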

Strategies

• Item selection strategies

• Unidimensional Approach
  • Examine response patterns for items representing a second-order construct, such as internal mental distress
  • Fit statistics: detects all atypical symptom patterns

• Multidimensional Approach
  • Compare subdimension measures
  • Detection of a specific response pattern
    • Is the person's level of suicide ideation greater than their level of depression?
    • How big a difference in measures?

• Combination of the above
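One way to ask "how big a difference in measures?" is to scale the difference between two subscale measures by their combined standard error. The sketch below is an assumption rather than the method used in the study: it treats the two measurement errors as independent, and the 1.96 threshold is a hypothetical choice.

```python
import math

def standardized_difference(measure_a, se_a, measure_b, se_b):
    """Difference between two measures in units of its (assumed independent) standard error."""
    return (measure_a - measure_b) / math.sqrt(se_a ** 2 + se_b ** 2)

def flag_profile(hs_measure, hs_se, dep_measure, dep_se, threshold=1.96):
    """Flag a person whose homicidal/suicidal measure clearly exceeds the depression measure."""
    return standardized_difference(hs_measure, hs_se, dep_measure, dep_se) > threshold

# Hypothetical subscale measures (logits) and standard errors for one person.
print(standardized_difference(1.5, 0.4, 0.0, 0.3))
print(flag_profile(1.5, 0.4, 0.0, 0.3))
```

Because CAT standard errors grow as fewer subscale items are administered, the same raw difference is harder to flag in a short CAT than on the full measure.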

Does Item Selection Matter?

Atypicalness Category  None   Screener  Mixed  Full   Full IMDS
Proto Typical          26.7%  34.6%     48.3%  50.5%  49.2%
Typical                69.0%  58.7%     40.8%  38.9%  38.4%
Atypical               4.3%   6.5%      10.9%  10.6%  12.4%
Kappa                  .27    .32       .48    .50    --

CAT to Full-Measure Person Fit

CAT* Statistic                   Full-Measure Outfit
Outfit                           r = .73
Ei                               r = .31
Homicidal/Suicidal – Depression  r = .08
Logistic Regression Correct %    91.6%

* Using full content balancing

Suicide-Depression Profile

CAT* Statistic       Full Measure H/S(a) - Depression
Outfit               r = .11
Ei                   r = -.54
H/S - Depression     r = .92
Multiple Regression  R² = .86

* Using full content balancing
(a) Homicidal-Suicidal Scale measure

Conclusions

• Fit statistics and examination of subscale scores appear to capture different response patterns.

• Using effective item selection methods in conjunction with multiple measures of person fit improves our ability to detect atypical symptom patterns.

Potential of CAT in Clinical Practice

• Reduce respondent burden

• Reduce staff resources

• Reduce data fragmentation

• Streamline complex assessment procedures

• Assist in clinical decision making

• Identify persons with atypical profiles

Contact Information

• A copy of this presentation will be at: www.chestnut.org/li/posters

• For information on this method and a paper on it, please contact Barth Riley at [email protected]
