TRANSCRIPT
1
Research Methods
Fall 2011
2
Science Helps Avoid Bias
Biases confound our judgment
• Overconfidence
• Confirmation bias
• Self-fulfilling prophecies
• Belief perseverance
• Illusory correlations
• Availability heuristic
• Conjunction fallacy
• Seeing patterns when there are none
3
1. Our clinical judgment is biased
– biases in perception and interpretation are pervasive (e.g., blinding, grading, replication effect sizes lower)
– when figuring out causes of problems
– when figuring out what is effective for individuals
– change occurs for many reasons so difficult to detect specific causes with the naked eye
– hard to see subtle, delayed, or slow change resulting from specific causes
– “common sense” is often wrong
– harmful treatments can seem helpful
Is a Research Class Necessary?
4
Early in therapy, my BPD client got worse.
• Caused by therapy?
• What could be other causes?
After 8 months, her behavior improved.
• …soon after her parents threatened to stop paying for therapy
• What could be other causes?
BPD Case Study
5
2. Lists of ESTs are overly simplistic
– research does not tell us what is effective NOW for my particular client in this particular setting
– largely ignores moderators
– minimally addresses principles of change (mediators)
– too many treatments on the list
Is a Research Class Necessary?
6
Numerous therapists practice unvalidated and sometimes discredited methods
• Astrology, Tarot cards, palm reading
• Homeopathic remedies
• Primal Scream Therapy
• Sensory deprivation therapy
• Rebirthing Therapy
• Thought Field Therapy
• Facilitated Communication (autism)
• EMDR (SDPA article)
Pseudo-Science is Easy to Believe
7
Science provides a systematic and (relatively) reliable approach to figuring out causes of important problems and change:
Why Do I Care about Research?
8
Guidelines for physical health:
• Exercise, omega-3, anti-oxidants, vitamin D3
• Red wine and alcohol healthy
• Flossing could make you live longer
• Breast cancer - soy and estrogen replacement
• Amalgam fillings
• Low calorie diets
• Cholesterol
Would you get surgery from a doctor whose practice wasn’t based on reliable evidence?
Why Do I Care about Research?
9
1. Dissertation (quicker, easier)
2. Research is fun!
3. Better understand disorders (causes)
4. Prediction of clinical outcomes
5. Improve effectiveness with clients
– evidence-based practice
– what needs to change (causes)
– how to change it (causes)
6. Publish (to get internships & postdocs)
Why a Research Class?
10
1. Cannot please everyone!
– Testing is a drag!
– Try to balance pace
2. Some students feel overwhelmed whereas others have said they have learned nothing new
– need solid grasp of stats
3. Labeling/terminology is important
This Class is Demanding!
11
Plaque on teeth correlated with plaque in arteries, therefore flossing could make you live longer (on TV news show)
how to choose a topic - bring in an article
the ideal experiment - a time machine
RG's mom said she has been worse since the start of therapy
RG better because of coercion
"low calorie" foods can lead to weight gain (like a placebo effect)
Garlic causes insomnia
12
15
16
17
It is problematic to ignore research
Research findings can easily be misinterpreted or misused
(be careful relying on experts or over-valuing statistical significance)
Not all research is created equal
18
1. Many sources of bias (many subtle)
2. Confounds make results ambiguous
3. Results do not generalize
Research Findings Can Easily Be Misinterpreted
19
Ways to Determine What Works
• Clinical observation and intuition
• Treatment research can reduce bias and ambiguity
20
Three basic designs
• Observational/correlational studies
• Non-randomized manipulations
• Randomized experiments
Correlational studies never randomize!!
Not all Research Designs are Equally Persuasive
21
Too many causes to untangle
Hard to isolate a specific cause
(poor internal validity)
Non-randomized Studies Often Yield Ambiguous Answers
Regarding Cause and Effect
22
1. Have a clear causal theory (if possible)
2. Causes must precede effects
“cause” = independent variable (IV)
“effect” = dependent variable (DV)
OR
the IV precedes (and predicts) the DV
Studying Cause and Effect
23
Internal validity improves when you rule out confounds; for example, you improve internal validity when:
– you include gender as covariate
– you exclude men
– you match non-randomized groups on gender
– the supposed cause precedes and correlates with the effect
Internal Validity of Non-Randomized Studies
24
1. Cross-sectional (one time data collection)
– correlations among current events, experiences, behaviors, and constructs
– retrospective: some measures rely on memory for prior events, behaviors, etc.
2. Prospective (longitudinal)
Correlational Studies
25
PUT IN SLIDE ON PROSPECTIVE STUDY OF SHAME PREDICTING SUBSEQUENT SELF-INJURY, WHILE COVARYING BASELINE SELF-INJURY
Prospective Correlational Studies
26
Percent Eventual Suicide of Persons at High Risk for Suicide Who Obtain Treatment vs. Refuse Treatment
Motto (1976)
27
Percent Suicide for Contacted vs. Non-Contacted High Suicide Risk Persons
Who Refuse Further Treatment
Motto (1976)
* = p < .05
28
If you believe that non-randomized studies are sufficient to evaluate
treatment efficacy, then you have to admit that treatment-as-usual increases suicide among high risk individuals. If
you don’t want to make that conclusion, then you need experimental research.
29
• suicide treatment study
• estrogen therapy study
• critical incident stress debriefing
• cholesterol studies yield consistent findings
Non-randomized Studies Often Yield Different Findings than RCTs
30
What conclusions can be made with what degree of confidence?
1. Is the IV really the IV? DV?
2. Is I.V. really a true cause of D.V.? (internal validity)
– Alternative interpretations of findings?
– Does the intervention work?
3. Why did the “cause” lead to the effect?
– How does the I.V. cause the D.V.?
– Why does intervention work (mediators)
4. For whom are the causes truly causes (what populations and settings)?
Research Validity
31
Are the results “confused”?
Is the I.V. confounded with another variable, and could this third variable be the main cause of the I.V. and the D.V.?
Is the IV-DV relationship spurious?
Does the I.V. cause the D.V. for the specific hypothesized reasons or are supposedly non-essential parts the primary causes?
Confound = Confuse
32
Tell your Grandma!
1. State the IV-DV relationship and why you think the relationship exists
2. Identify a confound
• However, the IV-DV relationship “could be due to ___”
2a. State the IV-confound relationship
2b. State the confound-DV relationship
2c. Therefore, the IV-DV relationship may simply be due to the confound.
What are Confounds?
33
Example: Why is hot weather associated with ice cream sales? Causal link or spurious correlation?
Example: If darker-skinned people commit more crimes, what could be the reason? Causal link or spurious correlation?
Example: Why did CBT group end up with less depression than group who got supportive counseling? Causal link? Or is outcome difference due to some other difference between the groups?
Example: Why do hairier players score more goals? Causal link or spurious correlation?
Confounds
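The hairier-players example can be checked with a small simulation. The sketch below is only an illustration with made-up numbers, not data from the course: gender (the confound) is assumed to drive both hairiness and goals, and hairiness has no direct effect on goals. The names `hairiness`, `goals`, and `male`, the coefficients, and the use of a partial correlation to adjust for the confound are all assumptions chosen for this example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


random.seed(1)  # fixed seed so the illustration is repeatable
n = 4000

# Hypothetical data: gender drives both variables;
# hairiness has NO direct effect on goals in this simulation.
male = [random.randint(0, 1) for _ in range(n)]
hairiness = [2.0 * m + random.gauss(0, 1) for m in male]
goals = [1.5 * m + random.gauss(0, 1) for m in male]

r_hair_goals = pearson_r(hairiness, goals)
r_hair_male = pearson_r(hairiness, male)
r_goals_male = pearson_r(goals, male)
r_adjusted = partial_r(r_hair_goals, r_hair_male, r_goals_male)

print(f"raw hairiness-goals r:        {r_hair_goals:.2f}")
print(f"r after adjusting for gender: {r_adjusted:.2f}")
```

In this setup the raw correlation is clearly positive while the adjusted one sits near zero, which is the "IV-confound, confound-DV, therefore spurious" chain in numeric form.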
34
Mediation
7. Treatment leads to changes in outcome (direct effect)
8. The direct effect for treatment diminishes (in magnitude and significance) when the mediator is entered in the analysis. Treatment is effective because it changes ME.
ME = stressful events or negative cognitions
35
Time confounds
Maturation confound
Cognitive training to young people
Cognitive training to old people– natural cognitive decline could mask benefit
Internal Validity
36
Hairiness is associated with scoring more goals on the mixed-gender soccer team because having more hairs keeps muscles warmer.
However, that may be due to gender, strength, and speed.
Men are more often stronger and faster players.
Stronger and faster players can score more.
Therefore, hairiness per se may not cause the more effective performance.
Psych medications are likely to be an important internal validity confound in a study of bipolar disorder. If the control group does not have equal amounts and types of meds, then differences between bipolar participants and participants in the control group could simply be caused by the difference in medications. Bipolar participants may have worse memory simply because they have more toxic medications in their bodies that cause memory impairment through brain atrophy or heavy sedation. We have to acknowledge this possibility, and hopefully we can rule it out by controlling for these variables.
Homework #1
37
TWO MOODLE SUBMISSIONS ARE REQUIRED for this assignment.
http://www.dbtsandiego.com/form11
1) Identify one specific plausible mediator. Describe the mediator clearly and completely.
2) Identify one specific plausible threat to internal validity. Describe the confound clearly and completely, including the direction of the effect.
3) Identify one specific plausible moderator of the relationship between the independent and dependent variables, and explain the moderation effect clearly and completely, including the direction of the associations.
Study 1: Consider a single-group correlational study to assess the effects of level of childhood sexual abuse on adult interpersonal violent behavior, both measured as continuous variables. All participants had at least some history of abuse during childhood, and level of abuse was a combination of frequency and severity of prior abuse episodes, ranging from a single instance of an older child touching the participant's genitals over clothing, up to multiple rapes involving intercourse with an adult stranger and threats of violence. Level of violence was a combination of frequency and severity, ranging from a single instance of verbal cruelty or destroying the property of others, up to multiple physical assaults or murder.
Study 2: Consider a two-group study designed to test the hypothesis that having a history of childhood sexual abuse (binary independent variable: none versus any) increases risk of physically assaulting another person (binary dependent variable: never versus at least one). The percentage of adult participants who have physically assaulted another person at least once will be compared between the abused and non-abused groups.
Homework #1
38
Confounds vs. Mediators
• The label depends on your theory of the causal process
– intrinsic or extrinsic part of your I.V.?
• If there is no main effect of the IV:
– mediators cannot be examined (overall)
– there can be moderator effects
  • crossing regression lines
  • mediators can be examined for subgroups
39
Internal Validity:
• Is the I.V. really a true cause of the D.V.?
Construct Validity:
• Why did the “cause” lead to the effect?
Depends on your theory
• Is explanatory variable an intrinsic part of I.V.?
Internal vs. Construct Validity
40
Construct Validity
Finding out:
• Why does an I.V. lead to a D.V.?
• Why does a manipulation cause a change?
• Why is an intervention effective?
41
Two Types of Construct Validity
• Construct validity of IV
– some intrinsic part of the IV accounts for the IV-DV correlation or group difference
• Causal sequence (mediation)
– the IV => mediator => DV
– the mediator occurs after the IV
– the mediator occurs before the DV
42
Construct Validity
Examples:
• Ice Cream
43
Mediation
7. Treatment leads to changes in outcome (direct effect)
8. The direct effect for treatment diminishes (in magnitude and significance) when the mediator is entered in the analysis. Treatment is effective because it changes ME.
ME = stressful events or negative cognitions
44
First state your specific theory of why the IV is related to the DV (intermediate cause)
IV => M => DV
Mediation requires:
1. IV is correlated with mediator
2. mediator is correlated with DV
3. The IV-DV correlation is reduced when the mediator is added into the prediction equation
4. A mediation test shows the IV-DV correlation reduction is statistically significant
Mediation
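The first three mediation requirements can be illustrated with simulated data. The sketch below is a hedged illustration, not the course's analysis: it generates hypothetical data in which the IV affects the DV only through the mediator, then compares the raw IV-DV correlation with the standardized IV coefficient once the mediator is in the equation. The coefficients (0.7), the sample size, and the helper names are assumptions. Requirement 4 (a significance test of the reduction) is not implemented here.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def direct_effect_beta(r_xy, r_xm, r_my):
    """Standardized IV coefficient when the mediator is also a predictor."""
    return (r_xy - r_xm * r_my) / (1 - r_xm ** 2)


random.seed(2)  # fixed seed so the illustration is repeatable
n = 5000

# Hypothetical full mediation: IV => M => DV, with no direct IV => DV path.
iv = [random.gauss(0, 1) for _ in range(n)]
mediator = [0.7 * x + random.gauss(0, 1) for x in iv]
dv = [0.7 * m + random.gauss(0, 1) for m in mediator]

r_iv_m = pearson_r(iv, mediator)    # requirement 1
r_m_dv = pearson_r(mediator, dv)    # requirement 2
r_iv_dv = pearson_r(iv, dv)         # IV-DV association before adjustment
beta_direct = direct_effect_beta(r_iv_dv, r_iv_m, r_m_dv)  # requirement 3

print(f"IV-mediator r: {r_iv_m:.2f}")
print(f"mediator-DV r: {r_m_dv:.2f}")
print(f"IV-DV r:       {r_iv_dv:.2f}")
print(f"IV beta with mediator in the equation: {beta_direct:.2f}")
```

Because the data are cross-sectional by construction, the same numbers would also fit other causal orderings, so a pattern like this supports mediation only alongside a theory and temporal ordering.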
45
Correlational mediation tests do not prove causal direction
All pathways could be true:
IV => M => DV
M => IV => DV
DV => M => IV
DV => IV => M
Mediation
46
Construct Validity Confounds
1. State your specific theory of why the IV is related to the DV (mediator)
2. Identify a competing theory of the IV-DV relationship (C.V. confound), e.g., how the relationship could be due to other intrinsic parts of the IV or DV (which may be more generic or broader characteristics of the IV or the DV).
3. Identify variables that share the confounds
– include the variable(s) as covariate(s)
– have those characteristics in control group
47
Validity vs. Confounds
48
Confound Examples
• Do BPD patients have more shame than normal controls?
• Does shame lead to self-injury?
• Does CBT reduce depression (compared to wait list control condition)?
• Does HRV biofeedback plus exposure lead to more reduction in fear than exposure alone?
What are the IV and DV construct validity confounds?
49
Confound Examples
Independent variables:
• Chinese (human, race, culture, Asian)
• shame (emotion, depression, anxiety, guilt)
• BPD (disorder, personality disorder, history of abuse, suicidality)
• CBT (support, hope, commitment, payment, problem-solving, new thinking, beh. activation)
50
Designing a Control Group
1. Figure out all the variables (characteristics or experiences) that are likely to influence your dependent variable other than your independent variable (confounds)
2. Include most important confounds in the control group
3. Figure out all the components of your independent variable (population, disorder, manipulation, treatment, etc.)
4. Decide what part of the independent variable you want to evaluate (broad or specific?) and include all other part(s) of the I.V. in the control group
51
Example: If people who get CBT truly get improved outcomes and placebo effects fully explain why, is that evidence that “CBT” works?
Example: If people who get CBT truly get improved outcomes and number of stressful events during therapy fully explains why, is that evidence that “CBT” works?
Internal vs. Construct Validity
52
Example: If people with BPD have worse suicidal behavior than normals and poor coping fully explains why, is that evidence that “BPD” per se is the cause?
Example: If people with BPD have worse suicidal behavior than normals and number of stressful events during therapy fully explains why, is that evidence that “BPD” per se is the cause?
Example: If people with BPD have worse suicidal behavior than normals and psych meds fully explains why, is that evidence that “BPD” per se is the cause?
Internal vs. Construct Validity
53
Merge this with other slides
• Use a control group that has the characteristics that you want to rule out
• Adjust for independent variable confounds with covariates
– Ex: Tangney "shame-free guilt" and "guilt-free shame"
• Include a DV that measures some non-essential aspect of the primary DV to show there is a stronger IV-DV association for the primary DV
54
Moderators
Moderators are third variables that moderate or change the magnitude
or direction of the relationship between two other variables
55
Moderation
5. Treatment is less effective for [men].
6. Coping skills reduce the impact of stressors
6. Mindfulness skills reduce the impact of negative thoughts
ME = stressful events or negative cognitions
56
Confounds vs. Moderators
• Internal validity confounds and moderators are separate issues
• A variable can be both a confound and a moderator, but these are still separate issues
– cannot enter as a covariate
• If there is no main effect of the IV:
– mediators cannot be examined (overall)
– there can be moderator effects
  • crossing regression lines
  • mediators can be examined for subgroups
57
Moderator Examples
• Resiliency (reduce causality)
– Does abuse cause violence?
• Risk factors (increase causality)
– Does smoking cause cancer?
• Do opiates cause euphoria? (Naltrexone)
• Does alcohol cause euphoria? (Antabuse)
• MBCT to prevent depression relapse
• BA best for severe depression
• Psychotropic meds depend on race (Kazdin)
• Acculturation
Mindfulness-Based CT
Study 1
# epis.   MBCT   TAU
1-2       54%    31%
>2        37%    66%
Study 2
# epis.   MBCT   TAU
1-2       50%    20%
>2        36%    78%
59
Moderator Example
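The Mindfulness-Based CT percentages show what a moderator looks like in numbers: the direction of the MBCT vs. TAU difference flips with the number of prior episodes. The short sketch below simply does that subtraction for both studies. The percentages are copied from the table; treating them as relapse rates is an assumption, since the table does not label them, and the dictionary layout and function name are illustrative only.

```python
# Percentages copied from the MBCT table; assumed here to be relapse rates.
relapse = {
    "Study 1": {"1-2": {"MBCT": 54, "TAU": 31}, ">2": {"MBCT": 37, "TAU": 66}},
    "Study 2": {"1-2": {"MBCT": 50, "TAU": 20}, ">2": {"MBCT": 36, "TAU": 78}},
}


def treatment_difference(cell):
    """MBCT minus TAU in percentage points (negative = lower rate with MBCT)."""
    return cell["MBCT"] - cell["TAU"]


differences = {
    study: {episodes: treatment_difference(cell) for episodes, cell in rows.items()}
    for study, rows in relapse.items()
}

for study, rows in differences.items():
    for episodes, diff in rows.items():
        print(f"{study}, {episodes} episodes: MBCT - TAU = {diff:+d} points")
```

The sign change between the two rows of each study is the "crossing regression lines" pattern: averaging across episode groups would hide an effect that differs in direction by subgroup.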
60
Depends on your question:
• Why? – construct validity
• Generalize? – external validity
Kazdin, pg 63
External vs. Construct Validity
61
Passage of time
Testing – desensitizing to shame questions
Testing – diary cards
Instrumentation
– July 4th arrests, suicides
– what is “good” or “bad” to raters is relative to what has already been seen
Confound Examples
62
Sources of Bias
• Hawthorne effects
• Demand effects
• Placebo effects
– active placebo
– placebo surgery
• Interpersonal expectancy effects (Rosenthal)
63
Participant Reactivity (Bias)
A = Hawthorne; B = Demand effects
C = Expectancy/Placebo; D = Rosenthal
64
Limiting Bias
• Forms of bias
– Hawthorne effects
– expectancy effects (e.g., placebo)
– demand effects
• Bias protection
– naive experimenters (therapists vs. assessors)
  • no knowledge of experimental condition
  • no knowledge of hypotheses
– automated and standardized procedures
65
Reduce Bias
• Rosenthal Effect
– naïve experimenters and interviewers
  • check knowledge of subjects and hypotheses
– scripted procedures (standardized)
  • written or recorded instructions
– balance bias: therapist allegiance in RCT
– balance bias: plausible rationale for control group
– measure expectancies
• Demand effects
– withhold hypotheses
  • check subjects’ beliefs about purpose of study (debriefing)
– have indirect measures (not self-report)
66
Example: Mediators and Moderators
Rosenthal Effect among Teachers
Mediator:
• students persist more
• teachers provide more help (persist)
• teachers more effectively reinforce students
• teachers’ biased evaluations of students
Moderator:
• student-teacher similarity
• student attractiveness and kindness
• teacher burnout
67
Example: Mediators and Moderators
Rosenthal Effect among Researchers
Mediator:
• subjects/patients persist more
• experimenter/therapist provides more prompting
• experimenter/therapist reinforces target behavior
• experimenter biased evaluations of subjects
Moderator:
• subject-experimenter similarity (patient-therapist)
• patients’ fear of failure/success
68
1. ____ is a serious problem
2. Currently we do not adequately understand the problem (insufficient data)
3. Current treatments are insufficient because:
4. It is plausible that a missing piece is…
5. No study has yet tested…
6. This study is needed to address…
Study Justification
69
1. Do not state null hypotheses. Instead:
– show differences in correlations
– test moderator effects
– specify effect size and/or confidence intervals
2. State basic (non-jargon) idea/theory
3. Operationally define how you will test the theory (measures)
4. State direction of effects clearly
5. Not too many, not too few
Research Hypotheses
70
Dealing with I.V. Confounds
• Measure all plausible internal validity confounds and evaluate their role in the results
• Prevent confounds if possible
– make subjects homogeneous on confound variable
  • narrow inclusion criteria
– use a control group that is equivalent on participant baseline characteristics
  • randomize (stratified) to groups
  • match subjects (case-control, quasi-experimental study)
  • within-subjects design
– standardize/yoke procedures/scripts
– naive experimenters/interviewers to reduce bias
  • check the blind
• Adjust for confounds with covariates
71
Within-Subjects Designs
Within the same person:
• comparing multiple things
• comparing the effect of two manipulations
Advantages:
• virtually eliminates person variable confounds
• increases statistical power
• can yield truer correlations
Disadvantages:
• time confounds
• order and carry-over effects (manipulations)
72
Within-Subjects Designs
Multiple-treatment designs
• multiple manipulations
Single-subject designs
• ABABAB
• multiple-baseline
Repeated measures designs
• change over time
• covariation over time
Small sample diary studies (N < 30) (Caspi, 1987)
73
Reasons for Parasuicide
Method of Analysis
                     Between-Ss       Within-Ss
Reason               NS    SA         NS    SA
Feeling Generation   54    21**       59    15***
Self-punishment      63    38*        59    51
Anger expression     63    24***      54    28**
74
Correlational Methods
Research Question: What is the association between shame and suicide ideation?
1. Between-subjects: each subject has one shame score and one ideation score
2. Within-subjects: each subject has multiple pairs of shame and ideation scores
– correlations per person (HLM)
– small sample diary study (Caspi method)
75
Correlation: shame is correlated with SI
• BS: people with higher shame have more SI
• WS: when the shame (of individuals) increases, their SI also increases
Experiment: increasing shame increases SI
• BS: people who get a shame induction increase their SI more than people who do not
• WS: SI increases more when people get a shame induction than when they (same individuals) do something else
Shame and Suicide Ideation
76
Within-Subjects Correlations
77
Within-Subjects Correlations: HLM
1. Regression lines for each subject
2. Compute the average regression line
78
Within-Subjects Correlations: HLM
1. One regression line for all subjects
2. Many IV-DV pairs per subject
79
Caspi (1987) Correlation Method
Stress ratings perfectly predict concurrent pain
r = 1.0
80
Caspi (1987) Correlation Method
Which correlation is larger?
81
Caspi (1987) Correlation Method
1. Collect many frequent measures of IVs and DVs
– e.g., daily scores for at least several weeks
2. Data for all subjects in one regression equation
3. Remove between-subjects effect:
– dummy code each participant and enter all dummy coded variables into regression
4. Test if today’s IV score predicts tomorrow’s DV score better than it predicts yesterday’s DV
5. Test if today’s IV score predicts tomorrow’s DV score when covarying today’s DV score
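Step 3 (removing the between-subjects effect) is the part that most changes the answer, so a small numeric illustration may help. The sketch below uses invented diary data for three subjects and compares the pooled correlation with the correlation after each person's scores are centered on that person's own mean. Centering is used here as a shortcut that should give the same within-person slope as entering the participant dummy codes; that substitution, the data, and the function names are assumptions for illustration. The lagged tests in steps 4 and 5 are not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def person_center(values_by_subject):
    """Subtract each subject's own mean, then pool all subjects' scores."""
    centered = []
    for values in values_by_subject:
        mean = sum(values) / len(values)
        centered.extend(v - mean for v in values)
    return centered


# Hypothetical diary data: 3 subjects x 5 days.
# Within each person, stress and pain rise together day by day,
# but people with higher average stress have LOWER average pain.
stress = [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24]]
pain = [[20, 21, 22, 23, 24], [10, 11, 12, 13, 14], [0, 1, 2, 3, 4]]

pooled_stress = [v for subject in stress for v in subject]
pooled_pain = [v for subject in pain for v in subject]

r_pooled = pearson_r(pooled_stress, pooled_pain)
r_within = pearson_r(person_center(stress), person_center(pain))

print(f"pooled r (between and within mixed): {r_pooled:.2f}")
print(f"within-subjects r (person-centered): {r_within:.2f}")
```

With these invented numbers the pooled correlation is strongly negative even though stress and pain move together perfectly inside every person, which is why the between-subjects effect has to be removed before interpreting day-to-day covariation.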
82
Randomized Experiments
Step 1: Select an intervention or manipulation that simulates a cause in the natural world (independent variable)
Step 2: Select a randomization method and verify that groups are comparable on confounds
Step 3: Verify that the intended independent variable actually occurred sufficiently (manipulation check or treatment adherence)
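Step 2 can be sketched in code. The example below is one simple way to do stratified randomization, offered as an assumption rather than the procedure used in any study mentioned here: participants are grouped by a stratum (a hypothetical severity label), shuffled within the stratum, and assigned to conditions in alternation so the two arms stay balanced on that variable. The function name, the condition labels, and the participant tuples are all illustrative.

```python
import random
from collections import defaultdict


def stratified_randomize(participants, stratum_of, seed=None):
    """Randomly assign participants to 'treatment' or 'control' within strata.

    Within each stratum the order is shuffled and conditions alternate,
    so the two groups differ in size by at most one per stratum.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for person in participants:
        by_stratum[stratum_of(person)].append(person)

    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        # Randomize which condition comes first so odd-sized strata
        # do not always give the extra person to the same arm.
        conditions = ["treatment", "control"]
        rng.shuffle(conditions)
        for i, person in enumerate(members):
            assignment[person] = conditions[i % 2]
    return assignment


# Hypothetical participants: (id, severity stratum)
participants = [(i, "high" if i % 3 == 0 else "low") for i in range(20)]
assignment = stratified_randomize(participants, stratum_of=lambda p: p[1], seed=42)

for person in participants:
    print(person, assignment[person])
```

Balance on the stratifying variable is guaranteed by construction, but other baseline variables still need to be compared across groups after assignment.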
83
Randomized Experiments
Analog studies = simulations
Independent Variables
Emotions             mood induction
Social exclusion     computer simulation
Attribution bias     ambiguous aggression scenarios
Jury decisions       vignettes of criminal trials
Malingering          instructions to fake malingering
Suppression          suppression instructions
Worry                worry instructions
Self-Injury          enduring cold-pressor pain
84
Randomized Experiments
Analog studies = simulations
Dependent Variables (samples of behavior)
Aggression           electric shock (self-harm studies too)
Aggression           point subtraction penalties
Self-harm            cold pressor task, electric shock
Stigma attitude      electric shock (to patients)
Persistence          time with unsolvable anagrams
Impulsivity          gambling games
Binge eating         snack food left in room
85
Dealing with I.V. Confounds
• Measure all plausible internal validity confounds and evaluate their role in the results
• Prevent confounds if possible
– make subjects homogeneous on confound variable
  • narrow inclusion criteria
– use a control group that is equivalent on participant baseline characteristics
  • randomize (stratified) to groups
  • match subjects (case-control, quasi-experimental study)
  • within-subjects design
– standardize/yoke procedures/scripts
– naive experimenters/interviewers to reduce bias
  • check the blind
• Adjust for confounds with covariates
86
1. The subjects differ (selection bias) at different levels of the I.V.
– baseline levels of the D.V. (severity)
– demographics (gender, age, ethnicity, SES)
– differential drop outs
2. Subjects’ experiences in study differ
– demand/expectancy effects
  • experimental group more hopeful
  • control group demoralized or competitive
– amount of treatment received (or practice)
Internal Validity Confounds
87
Randomization Failure
Probability that at least one confound will occur due to chance:
• 22.6% if 5 confounds are tested
• 40.1% if 10 confounds are tested
• 64.2% if 20 confounds are tested
(assumes p<.05 per confound and that all confounds are independent of each other)
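These percentages follow from the complement rule: if each of k independent tests has a chance alpha of a false positive, the chance of at least one is 1 - (1 - alpha)^k. The short sketch below recomputes them; the function name is illustrative, and the independence assumption is the same one stated on the slide.

```python
def prob_at_least_one(k, alpha=0.05):
    """Chance that at least one of k independent tests is significant by chance."""
    return 1 - (1 - alpha) ** k


for k in (5, 10, 20):
    print(f"{k:2d} tests: {prob_at_least_one(k):.4f}")
# prints 0.2262, 0.4013, and 0.6415 for 5, 10, and 20 tests
```

The same calculation applies to any family of independent significance tests, not only baseline comparisons after randomization.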
88
Must Check if Randomization Worked!
• Not rare that baseline differences on at least one variable emerge due to chance!
• A stratum (level) may be too big
• Subjects do not fill all strata (levels)
– In a DBT study, stratified randomization failed because only “medium” severity subjects entered the study. The more severe of these “medium” severity subjects ended up in the control group.
89
Effect Sizes
Pearson r indicates the magnitude of association between two continuous variables
Cohen’s d indicates the magnitude of association between a binary variable and a continuous variable
90
Effect Sizes
          correlation        t-test
          Pearson r (r²)     Cohen’s d
Small     .10 (.01)          .20
Medium    .25 (.08)          .50
Large     >.38               >.80
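The two effect-size columns can be related with a standard conversion. The sketch below converts Cohen's d to a point-biserial r under the assumption of two equal-sized groups, using r = d / sqrt(d² + 4). Whether the slide's r values were derived this way is not stated, so treat the comparison as a plausibility check only; the function name is illustrative.

```python
import math


def d_to_r(d):
    """Convert Cohen's d to a point-biserial r, assuming equal group sizes."""
    return d / math.sqrt(d ** 2 + 4)


for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    r = d_to_r(d)
    print(f"{label:6s} d = {d:.2f}  ->  r = {r:.3f}, r^2 = {r * r:.3f}")
```

Under this conversion d = .20, .50, and .80 correspond to r of roughly .10, .24, and .37, which is close to the r column in the table and lower than the .10/.30/.50 benchmarks often quoted for correlations on their own.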
91
Effect Sizes
92
Problems of Multiple Tests: Inflated Type I error rate
If alpha level is set at .05, the chance of finding at least one Type I error is:
• 22.6% if five statistical tests are done
• 40.1% if ten statistical tests are done
• 64.2% if 20 statistical tests are done
93
Test which cars can get you from San Diego to Los Angeles the fastest: red or blue?
Why (mediator)?
• (knowledge of) faster route
• push gas harder (no fear of police or crash)
• car accelerates faster
• car has more horse power
When (moderator)?
• number of stops (because of acceleration)
• number of hills (because of horse power)
Example
94
Nonspecific Predictors of Outcome
95
Passive recruiting (e.g., posting flyers) will result in a very biased sample. People who respond to ads are different than the general public.
Instead go to places in person and approach people.
Go to a mall and give people 5 dollars in advance.
External Validity
96
Problems with Self-Report
measure inequivalence contaminates static group comparison studies
• questions mean different things to different people (concept inequivalence)
– intimacy example NEED BETTER EXAMPLE
• people use number scales differently (metric inequivalence)
– Italians vs. Irish pain ratings
– BPD more emotional than APD?
Solutions:
• randomization balances out the differences
• within-group correlations
97
Best Self-Report Methods
Current observable states
– to avoid memory bias
– to avoid unnecessary inferences
…measured in relevant contexts to ensure activation of schema
– mood induction
– priming procedures
– experience sampling in natural contexts
Interviews
98
Are Women More Emotional?
YES, when comparing global retrospective self-reports of women vs. men
NO, when comparing average emotional states measured by experience sampling
Barrett et al. (1998). Are women the "more emotional" sex?
99
[Figure: survival curves for the low-shame and high-shame groups. x-axis: number of days to first parasuicide (0 to 400); y-axis scale: 0.00 to 1.00]
Survival Plot for Shame
100
Why does trait shame not predict self-injury?
Shame Variability
101
[Figure: two panels, Client #3 and Client #5. x-axis: Week (starting at Baseline); y-axis: SSGS Shame (0 to 25)]
Shame Variability
102
Avoid the Problems of Shared Method Variance
and Socially Desirable Responding
103
Alternatives to Self-Report
Your research proposal should include at least one:
• informants (e.g., spouse or significant others)
• behavioral samples
– Behavioral Approach/Avoidance Test
– observational coding (e.g., FACS)
– unobtrusive behaviors (e.g., Bargh studies)
– Davison ATSS
• psychophysiology
• performance tests (reaction time measures)
– semantic priming
– Stroop
– IAT
104
Alternatives to Self-Report
PASAT
Exclusion computer program
PSAP
Electric shock
concentration
105
Implicit Association Test
106
Implicit Association Test
107
Implicit Association Test
108
Implicit Association Test
109
Implicit Association Test
110
Semantic Priming
The driver stepped on the… GAS
The driver stepped on the… BRIDGE
They said it was the… BRIDGE
111
Semantic Priming
He was less stressed when he had the…
BEER
They said it was the… BEER
112
Semantic Priming
1. I deserve…PUNISHMENT
2. I deserve…WATER
3. I deserve…PRAISE
4. A criminal deserves…PUNISHMENT
5. I injure myself for…PUNISHMENT
6. They said it was for…PUNISHMENT
RT: 1<2<3, 1=4<6, 4=5<6
113
Semantic Priming
I injure myself for…RELIEF
I injure myself for…EXPERIENCE
Aspirin can provide…RELIEF
114
Schema Activation
Schema: “Black people are dangerous”
Situation: police officer sees person standing up from behind an object in an alley
Motor response: ??
115
Schema Activation
Schema: “Old people are slow and sickly”
Priming: see “old” words
Motor response: slower walking down hall
Schema: “Interrupting is rude, helping is nice”
Situation: describing a nice friend
Motor response: offer help to someone else
116
Valid Coding/Interviewing
Training
• select extra material not to be analyzed
• talk through examples
• code separately and confirm inter-rater reliability with Kappa, ICC, or Pearson
Have primary rater be naïve to hypotheses/subjects
Verify reliability of coded analysis variables
• inter-rater reliability (>20% of data)
• intra-rater reliability
Re-train if necessary
• use material that will not be analyzed
• do not reveal discrepancies for real data that must be re-coded
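For categorical codes, the Kappa check mentioned above can be computed by hand. The sketch below is a minimal implementation of Cohen's kappa for two raters, offered as an illustration rather than the course's required procedure; the example ratings are invented, and ICC or Pearson would be the alternatives for continuous ratings.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    n = len(rater_a)

    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2

    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for 10 segments (1 = behavior present, 0 = absent).
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1]

kappa = cohen_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 items, but half of that agreement is expected by chance, so kappa comes out lower than the raw 80% agreement.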
117
Manipulation Checks
Verify that the experimental manipulation worked (as intended) to know mediation
• subjects paid attention and retained important information
• subjects actually had the targeted emotional experience
• subjects complied with instructions
• the intervention was delivered correctly (integrity / adherence / fidelity)
118
Statistical Issues
• Missing data
• Type I versus Type II errors– data snooping (fishing)
• Power analyses
• Maximizing power
119
Type II Errors are Ubiquitous
• Most studies are underpowered to detect anything but large effect sizes
• A statistically non-significant result does not mean no correlation or no difference
– medium-sized effects are often not statistically significant
– most studies cannot detect small effects
120
Ways to Increase Power
• increase sample size
• increase group differences (effect size)
• within-subjects (use both pre- and post-tests, control stimuli)
• increase alpha (e.g., .10, a 10% Type I error rate)
• one-tailed tests (for a priori hypotheses)
• be parsimonious
– in primary hypotheses and analyses
– only have two groups
• decrease variability
– standardize procedures and use scripts
– do not counterbalance unless necessary
– homogeneous sample (narrow inclusion)
– use reliable measures
– clean your data
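Several of these levers can be seen in a rough sample-size calculation. The sketch below uses the common normal-approximation formula for comparing two group means, n per group ≈ 2 × ((z_alpha + z_power) / d)²; it is an approximation (exact t-based calculations give slightly larger numbers), and the function name and defaults are assumptions for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate participants needed per group (normal approximation)."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = z(1 - tail_alpha)
    z_power = z(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label:6s} effect (d = {d}): about {n_per_group(d)} per group")

print("medium effect, one-tailed:", n_per_group(0.5, two_tailed=False))
print("medium effect, alpha = .10:", n_per_group(0.5, alpha=0.10))
```

The required sample shrinks sharply as the effect size grows, and more modestly when alpha is raised or a one-tailed test is justified, which matches the ordering of the levers in the list.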
121
Controversy
Position 1: Dodo Bird (e.g., Wampold)
• all therapy benefit due to common factors
• specific techniques make no difference
Position 2: RCTs are irrelevant to real world• Poor external validity (too many exclusions)
Position 3: RCT methods do not identify best treatments (because they compare to TAU or waitlist). Other research strategies better for figuring out active treatment ingredients (Sprenkle, Davis, & Lebow)
122
Controversy: The Dodo Bird
In a diverse literature it is vital to consider these specific issues for specific studies
1. Credibility of the results (or opinion)
2. What specific problem?
3. What specific treatments?
123
Controversy: The Dodo Bird
All therapy benefit due to common factors and specific techniques make no difference.
• Based on metaanalyses that inappropriately average across studies with diverse disorders, treatment comparisons, and methodologies.
• Analogy: Ask all San Diegans “Do pills effectively treat sore throat, cough, & nasal congestion?"
124
Goals of Treatment Research
To find out most effective treatments:
• The treatment DID cause change (efficacy)
• The treatment causes change (specificity)
• The treatment causes meaningful and lasting change
• The treatment works in the real world– for whom?
• The reasons why treatment works
125
Treatment Research
efficacious – when a treatment is better than no treatment or is comparable to another treatment with established efficacy
specific – when a treatment is better than a placebo control condition or another credible treatment
effective – when a treatment is shown to improve outcomes in real-world settings
126
NIMH Stage Model
1. treatment development, single-subject and single-group designs, predictors of treatment success, small pilot RCTs
2. RCTs with sufficient power showing
– 2a) efficacy (high internal validity)
– 2b) specificity (more than generic therapy)
3. RCTs with high external validity (may lose some internal validity)
- may be quasi-experimental studies
4. Mechanisms and mediators
127
Kazdin Stage Model
1. treatment development and small pilot RCTs
2. RCTs with sufficient power and mediation correlational analyses
– 2a) efficacy (high internal validity)
– 2b) specificity (more than generic therapy)
– 2c) component analysis studies
3. RCTs with high external validity
128
Levels of Empirical Support
Level 5: Efficacious and Specific
Level 4: Efficacious
Level 3: Probably Efficacious
Level 2: Possibly Efficacious
Level 1: Not empirically supported
129
Effectiveness
When treatments are shown to be:
1. effective for common patient populations
– high compliance
2. effective when delivered by common therapists in common settings
– high acceptability and compliance
– easy to disseminate and train
3. cost-effective
130
What Does Work and Why? External Validity
EXTERNAL VALIDITY
• True experiments with high variability and common people (few exclusions)
• True experiments in common settings with common therapists
• True experiments with various subgroups of patients
• Quasi-experimental designs
131
What Does Work and Why? Construct Validity
CONSTRUCT VALIDITY
• Rigorous control groups
– rule out that therapy was received
– rule out amount of therapy received
– rule out placebo/expectancy/demand effects
– rule out therapist characteristics/differences
– do manipulation checks
• Dismantling studies
– identify active treatment components
– (experimental) psychopathology research
132
Treatment Research: Gold Standard
Randomized Controlled Trial
133
Two Uses of Control Groups
• Internal validity
– have equivalent groups treated equally EXCEPT the intervention
• Construct validity
– have equivalent groups treated equally EXCEPT the most important active ingredients of an intervention
134
Control Group Hierarchy
Control group is/includes:
• components of active treatment (e.g., behavioral activation but not cognitive restructuring)
• comparable morale/confidence/allegiance
• comparable "quality" of treatment (e.g., experience/expertise)
• attention placebo (comparable amount of treatment, modes, and relationship)
• treatment as usual
• only measures (no treatment or wait list)
135
Why Did the Treatment Work?
Whatever parts of the primary treatment are not in the control group must be acknowledged as possible reasons why the treatment improved outcomes.
136
Did the Treatment Work?
You cannot conclude that a treatment is effective unless the change in the treatment group is better than the change in a no treatment group
You cannot conclude that a treatment is effective if it is only compared to another treatment
Sometimes plausible treatments interfere with natural recovery or are iatrogenic
– Ex: CISD, BPD process groups
137
Control Groups
138
Control Groups
139
Control Groups
If favored treatment showed significant change, while control group showed nonsignificant change, DO NOT conclude that treatment is superior to control!
140
RCT Analyses
Analyses must show between-group differences in change
H1: CBT will improve depression more than will TAU.
H1a: The treatment-by-time interaction effect in HLM will show that the CBT group will have larger reductions in BDI scores than will the TAU group.
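A minimal sketch of why the test must be on between-group differences in change. The BDI scores below are made up for illustration, and a t statistic on change scores stands in for the HLM treatment-by-time interaction named in H1a, so this is a simplification, not the analysis from the slides.

```python
from math import sqrt
from statistics import mean, variance

def paired_t(pre, post):
    """Within-group t statistic for pre-to-post change."""
    change = [a - b for a, b in zip(pre, post)]
    return mean(change) / sqrt(variance(change) / len(change))

def change_difference_t(pre1, post1, pre2, post2):
    """Welch-style t statistic for the between-group difference in change."""
    c1 = [a - b for a, b in zip(pre1, post1)]
    c2 = [a - b for a, b in zip(pre2, post2)]
    se = sqrt(variance(c1) / len(c1) + variance(c2) / len(c2))
    return (mean(c1) - mean(c2)) / se

# Hypothetical BDI scores, 8 participants per group
cbt_pre, cbt_post = [30, 28, 32, 25, 27, 29, 31, 26], [25, 27, 26, 23, 23, 28, 26, 26]
tau_pre, tau_post = [29, 27, 33, 26, 28, 30, 31, 25], [25, 28, 28, 26, 25, 32, 27, 26]

print(round(paired_t(cbt_pre, cbt_post), 2))   # CBT change alone
print(round(paired_t(tau_pre, tau_post), 2))   # TAU change alone
print(round(change_difference_t(cbt_pre, cbt_post, tau_pre, tau_post), 2))
```

With these numbers the CBT change exceeds the two-tailed .05 critical value for df = 7 (2.365) and the TAU change does not, yet the between-group difference in change is well below its critical value (roughly 2.15), so the data do not show CBT to be superior.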
141
RCT Analyses
This is WRONG
H1: CBT will have lower depression scores at post treatment than TAU.
142
These Results don’t Test H1
143
RCT Longitudinal Analyses
Repeated Measures ANOVA cannot be used with subjects with missing data (dropouts)
– missing data for dropouts can be imputed (e.g., LOCF method), but results can be misleading.
HLM handles missing data better because it uses all available observations instead of imputing, although differential dropouts can still bias results.
– test dropout-by-treatment interaction effect
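A small sketch of last observation carried forward (LOCF), to show what the imputation assumes. The function name and scores are hypothetical.

```python
def locf(scores):
    """Replace each missing score (None) with the most recent observed score."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)   # stays None if nothing has been observed yet
    return filled

# A participant who was improving and then dropped out after the second assessment
print(locf([30, 24, None, None]))
```

LOCF freezes the dropout at the last observed score, so it assumes no further improvement or relapse after dropout. If dropouts differ by treatment group, that assumption can distort the group comparison.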
144
Influence of Dropouts
True and complete data (if there were no dropouts)
145
Influence of Dropouts
146
Influence of Dropouts
147
Influence of Dropouts
Observed data (missing data due to dropouts)
148
Influence of Dropouts
149
FIX THIS GRAPH
Observed data (missing data due to dropouts)
150
External Validity
• Analyze attrition: number of inquiries, appointments, inclusion criteria met, started study, completed study
• Report reasons for exclusion and dropout.
• Compare dropouts to completers for each treatment group
• Analyze treatment-by-time-by-completer three-way interaction effect
• Prevent study attrition despite tx drop-outs
– do Intent-To-Treat analyses
151
Clarkin et al. 2007
IV: DBT vs. Transference-Focused PsychoTx vs. dynamic supportive tx
Sample: 30 BPD patients in each group
“no differences between groups in demographics or psychopathology”
No statistically significant outcome differences
Large difference would not be statistically significant: d=0.52, r=.25, 10% vs. 40%
ITT analyses “did not show different results”
Therapists were monitored and rated for adherence.
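A back-of-the-envelope check on the power problem above, assuming two groups of 30 and a two-tailed alpha of .05. The conversion from d to r assumes equal group sizes, and the power figure uses a normal approximation, so treat both as rough.

```python
from math import sqrt
from statistics import NormalDist

def d_to_r(d):
    """Convert Cohen's d to a correlation r (equal group sizes assumed)."""
    return d / sqrt(d ** 2 + 4)

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power for a two-group comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction
    return z.cdf(d * sqrt(n_per_group / 2) - z_crit)

print(round(d_to_r(0.52), 2))            # 0.25
print(round(approx_power(0.52, 30), 2))  # about one chance in two
```

So with 30 patients per group, a true difference of d = 0.52 would be detected only about half the time, and any smaller true difference would usually go undetected.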
152
Clarkin et al. 2007
OASM             Group   Pre    Post   Decrease
Irritability     TFP     1.92   1.83   0.09
                 DBT     1.61   1.58   0.03
Anger            TFP     1.74   1.56   0.18
                 DBT     1.52   1.43   0.09
Verbal Assault   TFP     1.80   1.68   0.12
                 DBT     1.55   1.49   0.06
Direct Assault   TFP     0.82   0.76   0.06
                 DBT     0.73   0.72   0.01
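A short sketch that recomputes the decreases in the table and the between-group difference in decrease for each scale. The pre/post values are copied from the slide; treating them as group means is an assumption.

```python
# (pre, post) values copied from the slide, by scale and treatment group
oasm = {
    "Irritability":   {"TFP": (1.92, 1.83), "DBT": (1.61, 1.58)},
    "Anger":          {"TFP": (1.74, 1.56), "DBT": (1.52, 1.43)},
    "Verbal Assault": {"TFP": (1.80, 1.68), "DBT": (1.55, 1.49)},
    "Direct Assault": {"TFP": (0.82, 0.76), "DBT": (0.73, 0.72)},
}

def decrease(pre_post):
    """Pre minus post, rounded to the two decimals reported on the slide."""
    pre, post = pre_post
    return round(pre - post, 2)

def difference_in_decrease(scores):
    """How much more TFP decreased than DBT on one scale."""
    return round(decrease(scores["TFP"]) - decrease(scores["DBT"]), 2)

diffs = {scale: difference_in_decrease(s) for scale, s in oasm.items()}
print(diffs)
```

The quantity relevant to a treatment comparison is this difference in decrease, not either group's decrease on its own.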
153
Clarkin et al. 2007
154
Clinical Significance
Does the IV really make a meaningful difference in people’s lives?
Studies need to show if findings are large or clinically meaningful (vs. small or trivial)
– functioning rather than just symptoms
– statistical effect size
– IV and DV scores need to indicate whether “severe” or “good” or “large” vs. “small” (i.e., binary)
155
Clinical Significance
• Correlational studies need to show:
– at least some clinically severe scores on the IV and the DV (overall, a wide range of scores is best)
– the correlation represented as percentages of severe outcomes (DV) for severe and non-severe IV groups
• Group comparison studies need to ensure that clinical groups are severe (e.g., DSM diagnosis)
• Treatment studies need to show that participants start off as severe and end in good shape
156
Clinical Significance
Most studies fail to show that findings are large or meaningful or the problems are severe.
Examples:
• mood induction manipulation check
• Rorschach correlations
• restricted emotionality (E. Rogers)
• Williams Syndrome
157
Clinical Significance
To examine the C.S. of a correlation between continuous variables, make the variables binary in a meaningful way.
Example: Rorschach aggression scale (RAS) and overt aggressive behavior (OAB).
            never assault   prior assault
High-RAS    10%             90%
Low-RAS     90%             10%
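One way to build a table like the one above from continuous scores: split both variables at clinically meaningful cutoffs and report the percentage of severe outcomes in each IV group. The scores and cutoffs here are invented for illustration.

```python
def percent_severe(iv, dv, iv_cut, dv_cut):
    """Percent of severe outcomes (dv >= dv_cut) among high-IV and low-IV cases.

    Assumes each IV group contains at least one case.
    """
    groups = (("high", lambda x: x >= iv_cut), ("low", lambda x: x < iv_cut))
    result = {}
    for label, in_group in groups:
        outcomes = [y for x, y in zip(iv, dv) if in_group(x)]
        result[label] = 100 * sum(y >= dv_cut for y in outcomes) / len(outcomes)
    return result

# Hypothetical RAS scores and number of prior assaults for 8 people
ras = [1, 2, 3, 4, 6, 7, 8, 9]
assaults = [0, 0, 0, 1, 1, 2, 0, 3]
print(percent_severe(ras, assaults, iv_cut=5, dv_cut=1))
```

The cutoffs carry the clinical meaning, so they should come from norms or diagnostic criteria rather than from a median split chosen after looking at the data.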
158
Clinical Significance
IV: insult (anger) vs. neutral
DV: pre-frontal brain activity
Manipulation check: “…indicate to what extent they felt each feeling during the experiment (1 = not at all; 5 = extremely)”
“subjects in the insult condition reported more anger (M = 2.0) than did subjects in the no-insult condition (M = 1.4), (p < .01)”
159
Clinical Significance
IV: Williams Syndrome vs. normal controls
DV: sociability (hyper-sociability?)
Manipulation check:
Results of WS subjects from the approachability test will be compared to those of the two normal control groups by computing a clinically significant cutoff score for the WS group. The cutoff score will be obtained by the Jacobson et al. formulas. Also, two independent raters, who are blind to the study objectives and identities of subjects, will code responses of the Sociability Questionnaire into four categories: shy, social (highest social and high social), and in-between (tested with Chi square). If more than 50% of WS subjects and less than 25% of the NC group are classified in the most social category, based on a previous study (Doyle, et al., 2004), then sociability of WS subjects is considered to be clinically significant for this study.
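The proposal above does not say which formula it means. Assuming it refers to the Jacobson and Truax cutoff "c" (the point between the clinical and normal distributions, weighted by their standard deviations), a sketch with hypothetical means and SDs looks like this:

```python
def jt_cutoff_c(mean_clinical, sd_clinical, mean_normal, sd_normal):
    """Jacobson-Truax cutoff c between a clinical and a normal distribution."""
    return ((sd_normal * mean_clinical + sd_clinical * mean_normal)
            / (sd_normal + sd_clinical))

# Hypothetical sociability scores: WS group vs. normal controls
print(jt_cutoff_c(80, 10, 50, 10))   # equal SDs: midpoint of the two means
print(jt_cutoff_c(80, 5, 50, 10))    # unequal SDs: cutoff moves toward the tighter group
```

With equal SDs the cutoff is simply the midpoint of the two means; with unequal SDs it shifts toward the mean of the group with less spread.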
160
Limits to Methods for Confounds
• Randomization is sometimes artificial (external validity)
– not choosing a treatment is artificial
– some I.V.s (e.g., emotions) must be simulated (artificial)
• Standardized/yoked procedures/scripts are artificial
• Stratified randomization
– may not end up with even distribution among levels, which can prevent stratification from working
• Non-randomized matching is limited:
– can only match on a couple variables
– regression to different means: matching sometimes creates this new confound
• Use of covariates is limited:
– homogeneity of regression: the regression lines must be parallel for different levels of the confound variable
– reduces statistical power
161
Confounds vs. Moderators
• Internal validity confounds and moderators are separate issues
• A variable can be both a confound and a moderator, but these are still separate issues
– cannot enter as a covariate
• If there is no main effect of the IV:
– there can be confounds to covary
– there can be moderator effects
• crossing regression lines
162
Confounds vs. Mediators
• The label depends on your theory of the causal process
– intrinsic or extrinsic part of your I.V.?
• If there is no main effect of the IV:
– there can be confounds to covary
– overall mediators cannot be examined
– mediators can be examined for subgroups
163
Internal vs. External Validity
Increase internal validity:
• homogeneous sample
• rigid interventions (e.g., session duration)
• scripted interactions (to reduce bias)
• no choice of intervention (random)
Increase external validity:
• heterogeneous sample (few exclusions)
• flexible interventions
• natural interactions in natural settings
• choice of intervention
164
NIMH Stage Model
1. treatment development and small pilot RCTs
2. RCTs with sufficient power showing
– efficacy (high internal validity)
– specificity (more than generic therapy)
3. RCTs with high external validity (may lose some internal validity)
- may be quasi-experimental studies
4. Mechanisms and mediators
165
Kazdin Stage Model
1. treatment development and small pilot RCTs
2a. RCTs with sufficient power and high internal validity and test of mediation
2b. RCT component analysis studies
3. RCTs with high external validity
166
Why Emphasize Mechanisms of Change?
• to distill interventions to their most potent components (maximally efficient treatments)
• to facilitate treatment matching (i.e., moderators that are baseline characteristics of the mediator variable)
• to facilitate implementation in normal clinical contexts (generalization) by highlighting ways that the form can be adapted while maintaining the key change processes (function).
– should not rigidly apply manuals!! (CBT vs. IPT)
167
Evidence for Mechanisms of Change
• Strong association
• Gradient (more change ingredient, more change occurs)
– dose-response relation
• Specificity (other plausible variables do not show mediation)
• Experiment (try to manipulate change mechanisms)
– component analysis studies
• Temporal relation (rarely established)
• Replication
• Plausibility and coherence (credible change process)
168
Evidence for Mechanisms of Change
Component analyses have more internal validity than mediation correlations
Examples of misleading severity confounds
• In DBT study, amount of therapy in DBT condition was not correlated with outcomes
• It is possible that in CT study amount of BA vs. CR would not be related to outcome
169
Component Analysis Studies
Also called dismantling studies
• Test if a component is necessary by comparing the full treatment to the treatment when that component is removed
• Add common factors to the reduced condition to rule them out as an explanation for why the component is necessary
• Confirm precise mediational causal process in correlational analyses
170
Component Analysis Studies
• Mediational (shared variance) analyses such as the Sobel test typically test causal processes that may account for treatment group differences
• Shared causal processes may be shown by:
– within-group mediational analyses
– simple correlational analyses
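For reference, a sketch of the Sobel test statistic for an indirect (mediated) effect. Here a is the path from treatment to mediator and b is the path from mediator to outcome controlling for treatment, each with its standard error; the coefficient values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sobel_z(a, se_a, b, se_b):
    """Sobel z for the indirect effect a * b."""
    return (a * b) / sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)

def two_tailed_p(z):
    """Two-tailed p value from a standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical path coefficients and standard errors
z = sobel_z(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(z, 2), round(two_tailed_p(z), 4))
```

A significant z shows shared variance consistent with mediation. It does not by itself establish the temporal order or rule out confounds, which is why component analyses carry more internal validity.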
171
Component Analysis Studies
172
Component Analysis Studies
173
Component Analysis Studies
Also called dismantling studies
• 2 groups needed to answer the question if one component is necessary
• 3 groups needed to answer the question if two components are necessary
• 4 groups needed to answer the question if two components are necessary and how much they matter
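One reading of the four-group case is a 2 x 2 factorial in which each component is either present or absent. That reading is an assumption, and the sketch below simply enumerates the cells; the component labels are placeholders.

```python
from itertools import product

def dismantling_conditions(components):
    """List every present/absent combination of the given components."""
    conditions = []
    for flags in product([True, False], repeat=len(components)):
        included = [name for name, present in zip(components, flags) if present]
        conditions.append(" + ".join(included) if included else "control (no components)")
    return conditions

print(dismantling_conditions(["BA"]))        # 2 groups for one component
print(dismantling_conditions(["BA", "CR"]))  # 4 groups for two components
```

Dropping the no-component control cell from the two-component design leaves the three-group version: the full treatment and each component alone.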
174
HRV Biofeedback Component Analysis Study
Two-group RCT
1. Slow breathing + (fake feedback)
2. Slow breathing + HRV visual feedback
Can test if the HRV feedback is useless
175
EMDR Component Analysis Study
Two-group RCT
• EMDR vs. EMDR without eye movements (EM) (11 out of 13 studies)
– exposure + eye movements = exposure
• PE vs. EMDR
Power and effect size?
176
First DBT Study
• Validity in the first DBT study was criticized because DBT patients received many more hours of therapy than TAU.
• Follow-up data analyses indicated that there was no correlation between number of hours and outcome
• When treatment hours were entered as covariates, DBT still had superior outcomes
• However, treatment hours could still account for why DBT subjects did better, because the most severe patients received the most treatment
• Need to covary severity and hours in comparing DBT to TAU
177
DBT Replication Study
Controlled for:
1. therapist expertise
2. therapist allegiance to treatment provided
3. clinical supervision group
4. prestige of DBT
5. psychotherapy
6. treatment affordability and hours
7. therapist gender, training/degree, and clinical experience
178
DBT Component Analysis Study
Three-group RCT
1. DBT (individual + group skills training)
2. Individual DBT + activities group
3. DBT group skills training + case management
179
Cognitive Therapy Mediation Studies
1) CT is based on cognitive theory
– thinking causes emotions and behavior
2) Change in CT is associated with cognitive change as hypothesized
– concurrent change correlations
180
Component Analysis Study
Cognitive therapy for depression is comprised of cognitive restructuring (CR) and behavioral activation (BA)
• Removing CR does not reduce the effectiveness of cognitive therapy
• BA is as effective as BA+CR
181
Component Analysis Study
5 interpretations of change process
• BA works better because
– thinking is irrelevant
– BA is better at changing thinking
– it improves environment and thinking
• CR and BA are both effective and redundant
– both change thinking
– both change environment (reinf + punish)
182
Component Analysis Study
If honey does not improve a sugar-sweetened dessert, is honey less tasty?
Three groups are needed
• Could CR be as effective as CR+BA?
183
New Behavior Changes Cognition
“On the one hand, explanations of change processes are becoming more cognitive.
On the other hand, it is performance-based treatments that are proving most powerful in effecting psychological changes. Regardless of the method involved, the treatments implemented through actual performance achieve results consistently superior to those in which fears are eliminated to cognitive representations of threat” (Bandura, 1977, p. 78)
184
Activity Scheduling in Cognitive Therapy
Pleasurable activities
• “Nothing is meaningful or worthwhile”
Mastery activities (self-efficacy)
• “I am incapable of doing anything”
Behavioral experiments
• “I am incapable of that”
• “It won’t work out”
185
New thinking prompts new behaviors that lead to more reinforcers and fewer punishers, which changes depressive affect
186
New behaviors lead to more reinforcers and fewer punishers, which changes belief, which changes depressive affect
187
Amount of smoking causes cancer which causes lower quality of life
H1: Amount of smoking correlated with QOL
Entire sample has cancer
H2: Exercise moderates the correlation between cancer and QOL.
Analysis of constant variables
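A small sketch of why a constant cannot be analyzed: if the entire sample has cancer, the cancer variable has zero variance, so its correlation with QOL is undefined. The data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation; returns None when either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return None   # zero variance: the correlation is undefined
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return ss_xy / sqrt(ss_x * ss_y)

has_cancer = [1, 1, 1, 1, 1]          # everyone in the sample has cancer
qol = [40, 55, 60, 45, 70]
print(pearson_r(has_cancer, qol))     # None
```

The same logic applies to H2: exercise cannot moderate a cancer-QOL correlation that the sample gives no way to estimate.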
188
Do Not Covary your Main Effect
• For Williams Syndrome children, visual-motor skills will predict daily living skills above and beyond intelligence (IQ)
– hand-eye coordination is measured by the WAIS performance tests
• Paul Paris class proposal example
189
Multiple Baseline Design
190
External Validity
is affected by:
• Non-representative samples
– requirement of consent
– recruiting location/procedures
– narrow incentives
– exclusion criteria
– biased dropouts
• Non-representative procedures/context
191
Race vs. Ethnicity vs. Culture
• race = biology
• ethnicity is one aspect of culture
• culture is learned
• race does not always correspond with ethnicity
• one’s culture is a combination of one’s family culture and mainstream culture (acculturation)
• NIMH categories: Latino is only an ethnicity
192
Internal Validity is Usually a Higher Priority than External Validity
• Ethnic minorities are usually underrepresented and researchers do not make extra effort to recruit them
• Consequence: conclusions often cannot be made about the relevance for ethnic minorities (ethnicity can be a moderator)
193
Sue (1999)
Sue argues that we should make extra effort to recruit ethnic minorities
• to increase generalizability
• for social justice
However:
• we cannot assume generalizability from a main effect since ethnicity can be a moderator
• it is often not feasible to have enough power to test moderator effects in a heterogeneous sample
• therefore, findings from heterogeneous samples are often ambiguous
194
Ethical Issues
• informed consent (vs. thoughtless compliance or coercion)
– Milgram and Zimbardo studies
– obtained by the therapist
• deception vs. withholding full rationale
• debriefing
– mood improvement protocol
– verify subjects understand and are back to normal
• confidentiality vs. anonymity