
1

Research Methods

Fall 2011

2

Science Helps Avoid Bias

Biases confound our judgment

• Overconfidence

• Confirmation bias

• Self-fulfilling prophecies

• Belief perseverance

• Illusory correlations

• Availability heuristic

• Conjunction fallacy

• Seeing patterns when there are none

3

1. Our clinical judgment is biased
– biases in perception and interpretation are pervasive (e.g., blinding, grading, replication effect sizes lower)
– when figuring out causes of problems
– when figuring out what is effective for individuals
– change occurs for many reasons, so it is difficult to detect specific causes with the naked eye
– hard to see subtle, delayed, or slow change resulting from specific causes
– “common sense” is often wrong
– harmful treatments can seem helpful

Is a Research Class Necessary?

4

Early in therapy, my BPD client got worse.
• Caused by therapy?
• What could be other causes?

After 8 months, her behavior improved.
• …soon after her parents threatened to stop paying for therapy
• What could be other causes?

BPD Case Study

5

2. Lists of ESTs are overly simplistic
– research does tell us what is effective NOW for my particular client in this particular setting
– largely ignores moderators
– minimally addresses principles of change (mediators)
– too many treatments on the list

Is a Research Class Necessary?

6

Numerous therapists practice unvalidated and sometimes discredited methods

• Astrology, Tarot cards, palm reading
• Homeopathic remedies
• Primal Scream Therapy
• Sensory deprivation therapy
• Rebirthing Therapy
• Thought Field Therapy
• Facilitated Communication (autism)
• EMDR (SDPA article)

Pseudo-Science is Easy to Believe

7

Science provides a systematic and (relatively) reliable approach to figuring out causes of important problems and change:

Why Do I Care about Research?

8

Guidelines for physical health:
• Exercise, omega-3, anti-oxidants, vitamin D3
• Red wine and alcohol healthy
• Flossing could make you live longer
• Breast cancer - soy and estrogen replacement
• Amalgam fillings
• Low calorie diets
• Cholesterol

Would you get surgery from a doctor whose practice wasn’t based on reliable evidence?

Why Do I Care about Research?

9

1. Dissertation (quicker, easier)

2. Research is fun!
3. Better understand disorders (causes)
4. Prediction of clinical outcomes
5. Improve effectiveness with clients
– evidence-based practice
– what needs to change (causes)
– how to change it (causes)

6. Publish (to get internships & postdocs)

Why a Research Class?

10

1. Cannot please everyone!
  1. Testing is a drag!
  2. Try to balance pace

2. Some students feel overwhelmed whereas others have said they have learned nothing new
  1. need solid grasp of stats

3. Labeling/terminology is important

This Class is Demanding!

11

Plaque on teeth correlated with plaque in arteries, therefore flossing could make you live longer (on TV news show)

how to choose a topic - bring in an article

the ideal experiment - a time machine

RG's mom said she has been worse since the start of therapy

RG better because of coercion

"low calorie" foods can lead to weight gain (like a placebo effect)

Garlic causes insomnia

17

It is problematic to ignore research

Research findings can easily be misinterpreted or misused

(be careful relying on experts or over-valuing statistical significance)

Not all research is created equal

18

1. Many sources of bias (many subtle)

2. Confounds make results ambiguous

3. Results do not generalize

Research Findings Can Easily Be Misinterpreted

19

Ways to Determine What Works

• Clinical observation and intuition
• Treatment research can reduce bias and ambiguity

20

Three basic designs

• Observational/correlational studies

• Non-randomized manipulations

• Randomized experiments

Correlational studies never randomize!!

Not all Research Designs are Equally Persuasive

21

Too many causes to untangle

Hard to isolate a specific cause

(poor internal validity)

Non-randomized Studies Often Yield Ambiguous Answers

Regarding Cause and Effect

22

1. Have a clear causal theory (if possible)

2. Causes must precede effects

“cause” = independent variable (IV)

“effect” = dependent variable (DV)

OR

the IV precedes (and predicts) the DV

Studying Cause and Effect

23

Internal validity improves when you rule out confounds; for example, you improve internal validity when:

– you include gender as a covariate
– you exclude men
– you match non-randomized groups on gender
– the supposed cause precedes and correlates with the effect

Internal Validity of Non-Randomized Studies

24

1. Cross-sectional (one-time data collection)
– correlations among current events, experiences, behaviors, and constructs
– retrospective: some measures rely on memory for prior events, behaviors, etc.

2. Prospective (longitudinal)

Correlational Studies

25

PUT IN SLIDE ON PROSPECTIVE STUDY OF SHAME PREDICTING SUBSEQUENT SELF-INJURY, WHILE COVARYING BASELINE SELF-INJURY

Prospective Correlational Studies

26

Percent Eventual Suicide of Persons at High Risk for Suicide Who Obtain Treatment vs. Refuse Treatment

Motto (1976)

27

Percent Suicide for Contacted vs. Non-Contacted High Suicide Risk Persons

Who Refuse Further Treatment

Motto (1976)  * = p < .05

28

If you believe that non-randomized studies are sufficient to evaluate treatment efficacy, then you have to admit that treatment-as-usual increases suicide among high-risk individuals. If you don’t want to draw that conclusion, then you need experimental research.

29

• suicide treatment study

• estrogen therapy study

• critical incident stress debriefing

• cholesterol studies yield consistent findings

Non-randomized Studies Often Yield Different Findings than RCTs

30

What conclusions can be made with what degree of confidence?

1. Is the IV really the IV? DV?

2. Is the I.V. really a true cause of the D.V.? (internal validity)
– Alternative interpretations of findings?
– Does the intervention work?

3. Why did the “cause” lead to the effect?
– How does the I.V. cause the D.V.?
– Why does the intervention work? (mediators)

4. For whom are the causes truly causes (what populations and settings)?

Research Validity

31

Are the results “confused”?

Is the I.V. confounded with another variable, and could this third variable be the main cause of the I.V. and the D.V.?

Is the IV-DV relationship spurious?

Does the I.V. cause the D.V. for the specific hypothesized reasons or are supposedly non-essential parts the primary causes?

Confound = Confuse

32

Tell your Grandma!

1. State the IV-DV relationship and why you think the relationship exists

2. Identify a confound
• However, the IV-DV relationship “could be due to ___”

2a. State the IV-confound relationship

2b. State the confound-DV relationship

2c. Therefore, the IV-DV relationship may simply be due to the confound.

What are Confounds?

33

Example: Why is hot weather associated with ice cream sales? Causal link or spurious correlation?

Example: If darker-skinned people commit more crimes, what could be the reason? Causal link or spurious correlation?

Example: Why did CBT group end up with less depression than group who got supportive counseling? Causal link? Or is outcome difference due to some other difference between the groups?

Example: Why do hairier players score more goals? Causal link or spurious correlation?
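The hairier-players example can be sketched as a small simulation. This is only an illustration under stated assumptions: the data are made up, gender is assumed to drive both hairiness and goals, and hairiness is given no effect on goals at all. The function names `pearson_r` and `simulate_players` are hypothetical helpers, not part of any course material.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_players(n=4000, seed=0):
    """Made-up data: gender shifts both hairiness and goals; hairiness never affects goals."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        male = rng.random() < 0.5
        shift = 2.0 if male else 0.0          # the confound raises both variables
        hairiness = shift + rng.gauss(0, 1)
        goals = shift + rng.gauss(0, 1)
        rows.append((male, hairiness, goals))
    return rows

rows = simulate_players()
r_all = pearson_r([h for _, h, _ in rows], [g for _, _, g in rows])
r_men = pearson_r([h for m, h, _ in rows if m], [g for m, _, g in rows if m])
r_women = pearson_r([h for m, h, _ in rows if not m], [g for m, _, g in rows if not m])
print(f"all players: {r_all:.2f}, men only: {r_men:.2f}, women only: {r_women:.2f}")
```

In the pooled sample the correlation is clearly positive, while within each gender it is close to zero, which is the pattern a spurious IV-DV relationship would produce.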

Confounds

34

Mediation

7. Treatment leads to changes in outcome (direct effect)

8. The direct effect for treatment diminishes (in magnitude and significance) when the mediator is entered in the analysis. Treatment is effective because it changes ME.

ME = stressful events or negative cognitions

35

Time confounds

Maturation confound

Cognitive training to young people

Cognitive training to old people– natural cognitive decline could mask benefit

Internal Validity

36

Hairiness is associated with scoring more goals on the mixed-gender soccer team because having more hairs keeps muscles warmer. However, that may be due to gender, strength, and speed. Men are more often stronger and faster players. Stronger and faster players can score more. Therefore, hairiness per se may not cause the more effective performance.

Psych medications are likely to be an important internal validity confound in a study of bipolar disorder. If the control group does not have equal amounts and types of meds, then differences between bipolar participants and participants in the control group could simply be caused by the difference in medications. Bipolar participants may have worse memory simply because they have more toxic medications in their bodies that impair memory by causing brain atrophy or heavy sedation. We have to acknowledge this possibility, and hopefully we can rule it out by controlling for these variables.

Homework #1

37

TWO MOODLE SUBMISSIONS ARE REQUIRED for this assignment. http://www.dbtsandiego.com/form11

1) Identify one specific plausible mediator. Describe the mediator clearly and completely.
2) Identify one specific plausible threat to internal validity. Describe the confound clearly and completely, including the direction of the effect.
3) Identify one specific plausible moderator of the relationship between the independent and dependent variables, and explain the moderation effect clearly and completely, including the direction of the associations.

Study 1: Consider a single-group correlational study to assess the effects of level of childhood sexual abuse on adult interpersonal violent behavior, both measured as continuous variables. All participants had at least some history of abuse during childhood, and level of abuse was a combination of frequency and severity of prior abuse episodes, ranging from a single instance of an older child touching the participant's genitals over clothing, up to multiple rapes involving intercourse with an adult stranger and threats of violence. Level of violence was a combination of frequency and severity, ranging from a single instance of verbal cruelty or destroying the property of others, up to multiple physical assaults or murder.

Study 2: Consider a two-group study designed to test the hypothesis that having a history of childhood sexual abuse (binary independent variable: none versus any) increases risk of physically assaulting another person (binary dependent variable: never versus at least one). The percentage of adult participants who have physically assaulted another person at least once will be compared between the abused and non-abused groups.

Homework #1

http://www.dbtsandiego.com/form11

38

Confounds vs. Mediators

• The label depends on your theory of the causal process

– intrinsic or extrinsic part of your I.V.?

• If there is no main effect of the IV:
  – mediators cannot be examined (overall)
  – there can be moderator effects
    • crossing regression lines
    • mediators can be examined for subgroups

39

Internal Validity:• Is the I.V. really a true cause of the D.V.?

Construct Validity:• Why did the “cause” lead to the effect?

Depends on your theory• Is explanatory variable an intrinsic part of I.V.?

Internal vs. Construct Validity

40

Construct Validity

Finding out:

• Why does an I.V. lead to a D.V.?

• Why does a manipulation cause a change?

• Why is an intervention effective?

41

Two Types of Construct Validity

• Construct validity of IV
– some intrinsic part of the IV accounts for the IV-DV correlation or group difference

• Causal sequence (mediation)
– the IV => mediator => DV
– the mediator occurs after the IV
– the mediator occurs before the DV

42

Construct Validity

Examples:

• Ice Cream

43

Mediation

7. Treatment leads to changes in outcome (direct effect)

8. The direct effect for treatment diminishes (in magnitude and significance) when the mediator is entered in the analysis. Treatment is effective because it changes ME.


44

First state your specific theory of why the IV is related to the DV (intermediate cause)

IV => M => DV

Mediation requires:

1. IV is correlated with mediator

2. mediator is correlated with DV

3. The IV-DV correlation is reduced when the mediator is added into the prediction equation

4. A mediation test shows the IV-DV correlation reduction is statistically significant
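Requirements 1 to 3 can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the path strengths (0.7 and 0.6), the absence of any direct IV => DV path, and the helper names `cov`, `slope_simple`, and `slope_adjusted`. Requirement 4 (a formal significance test of the reduction) is not implemented.

```python
import random

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def slope_simple(x, y):
    """Slope of y on x alone (the total IV-DV effect)."""
    return cov(x, y) / cov(x, x)

def slope_adjusted(x, m, y):
    """Slope of y on x when m is also in the prediction equation (the direct effect)."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    return (smm * sxy - sxm * smy) / (sxx * smm - sxm ** 2)

rng = random.Random(1)
n = 5000
iv = [rng.gauss(0, 1) for _ in range(n)]
med = [0.7 * x + rng.gauss(0, 1) for x in iv]   # IV => mediator
dv = [0.6 * m + rng.gauss(0, 1) for m in med]   # mediator => DV, no direct IV => DV path

total = slope_simple(iv, dv)
direct = slope_adjusted(iv, med, dv)
print(f"IV-DV slope alone: {total:.2f}; with mediator entered: {direct:.2f}")
```

Because the simulated IV affects the DV only through the mediator, the IV slope shrinks toward zero once the mediator is entered. Real data rarely show such complete mediation.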

Mediation

45

Correlational mediation tests do not prove causal direction

All pathways could be true:

IV => M => DV

M => IV => DV

DV => M => IV

DV => IV => M

Mediation

46

Construct Validity Confounds

1. State your specific theory of why the IV is related to the DV (mediator)

2. Identify a competing theory of the IV-DV relationship (C.V. confound), e.g., how the relationship could be due to other intrinsic parts of the IV or DV (which may be more generic or broader characteristics of the IV or the DV).

3. Identify variables that share the confounds
– include the variable(s) as covariate(s)
– have those characteristics in control group

47

Validity vs. Confounds

48

Confound Examples

• Do BPD patients have more shame than normal controls?

• Does shame lead to self-injury?

• Does CBT reduce depression (compared to wait list control condition)?

• Does HRV biofeedback plus exposure lead to more reduction in fear than exposure alone?

What are the IV and DV construct validity confounds?

49

Confound Examples

Independent variables:

• Chinese (human, race, culture, Asian)

• shame (emotion, depression, anxiety, guilt)

• BPD (disorder, personality disorder, history of abuse, suicidality)

• CBT (support, hope, commitment, payment, problem-solving, new thinking, beh. activation)

50

Designing a Control Group

1. Figure out all the variables (characteristics or experiences) that are likely to influence your dependent variable other than your independent variable (confounds)

2. Include most important confounds in the control group

3. Figure out all the components of your independent variable (population, disorder, manipulation, treatment, etc.)

4. Decide what part of the independent variable you want to evaluate (broad or specific?) and include all other part(s) of the I.V. in the control group

51

Example: If people who get CBT truly get improved outcomes and placebo effects fully explain why, is that evidence that “CBT” works?

Example: If people who get CBT truly get improved outcomes and number of stressful events during therapy fully explains why, is that evidence that “CBT” works?


52

Example: If people with BPD have worse suicidal behavior than normals and poor coping fully explains why, is that evidence that “BPD” per se is the cause?

Example: If people with BPD have worse suicidal behavior than normals and number of stressful events during therapy fully explains why, is that evidence that “BPD” per se is the cause?

Example: If people with BPD have worse suicidal behavior than normals and psych meds fully explains why, is that evidence that “BPD” per se is the cause?


53

Merge this with other slides

• Use a control group that has the characteristics that you want to rule out

• Adjust for independent variable confounds with covariates

– Ex: Tangney "shame-free guilt" and "guilt-free shame"

• Include a DV that measures some non-essential aspect of the primary DV to show there is a stronger IV-DV association for the primary DV

54

Moderators

Moderators are third variables that moderate or change the magnitude

or direction of the relationship between two other variables

55

Moderation

5. Treatment is less effective for [men].

6. Coping skills reduce the impact of stressors

6. Mindfulness skills reduce the impact of negative thoughts


56

Confounds vs. Moderators

• Internal validity confounds and moderators are separate issues

• A variable can be both a confound and a moderator, but these are still separate issues

– cannot enter as a covariate

• If there is no main effect of the IV:
  – mediators cannot be examined (overall)
  – there can be moderator effects
    • crossing regression lines
    • mediators can be examined for subgroups

57

Moderator Examples

• Resiliency (reduce causality)
– Does abuse cause violence?

• Risk factors (increase causality)
– Does smoking cause cancer?

• Do opiates cause euphoria? (Naltrexone)
• Does alcohol cause euphoria? (Antabuse)
• MBCT to prevent depression relapse
• BA best for severe depression
• Psychotropic meds depend on race (Kazdin)

• Acculturation

Mindfulness-Based CT

Study 1
# epis.   MBCT   TAU
1-2       54%    31%
>2        37%    66%

Study 2
# epis.   MBCT   TAU
1-2       50%    20%
>2        36%    78%
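One way to read the table as a moderator effect is to compute the MBCT minus TAU difference within each episode stratum and then compare those differences. The sketch below does only that arithmetic. The percentages are copied from the slide; the dictionary layout and the `group_differences` helper are illustrative assumptions, and the slide does not state what the percentages measure.

```python
# Percentages copied from the slide; the outcome they measure is not stated there.
studies = {
    "Study 1": {"1-2": {"MBCT": 54, "TAU": 31}, ">2": {"MBCT": 37, "TAU": 66}},
    "Study 2": {"1-2": {"MBCT": 50, "TAU": 20}, ">2": {"MBCT": 36, "TAU": 78}},
}

def group_differences(study):
    """MBCT minus TAU (percentage points) within each stratum of prior episodes."""
    return {stratum: cells["MBCT"] - cells["TAU"] for stratum, cells in study.items()}

for name, study in studies.items():
    diffs = group_differences(study)
    interaction = diffs[">2"] - diffs["1-2"]   # difference of differences
    print(name, diffs, "difference of differences:", interaction)
```

In both studies the sign of the group difference flips between strata, so the number of prior episodes changes the direction of the MBCT-TAU relationship, which is what defines a moderator.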

59

Moderator Example

60

Depends on your question:

• Why? – construct validity

• Generalize? – external validity

Kazdin, pg 63

External vs. Construct Validity

61

Passage of time

Testing – desensitizing to shame questions

Testing – diary cards

Instrumentation

– July 4th arrests, suicides

– what is “good” or “bad” to raters is relative to what has already been seen

Confound Examples

62

Sources of Bias

• Hawthorne effects

• Demand effects

• Placebo effects
– active placebo
– placebo surgery

• Interpersonal expectancy effects (Rosenthal)

63

Participant Reactivity (Bias)

A = Hawthorne; B = Demand effects

C = Expectancy/Placebo; D = Rosenthal

64

Limiting Bias

• Forms of bias
  – Hawthorne effects
  – expectancy effects (e.g., placebo)
  – demand effects

• Bias protection
  – naive experimenters (therapists vs. assessors)
    • no knowledge of experimental condition
    • no knowledge of hypotheses
  – automated and standardized procedures

65

Reduce Bias

• Rosenthal Effect
  – naïve experimenters and interviewers
    • check knowledge of subjects and hypotheses
  – scripted procedures (standardized)
    • written or recorded instructions
  – balance bias: therapist allegiance in RCT
  – balance bias: plausible rationale for control group
  – measure expectancies

• Demand effects
  – withhold hypotheses
    • check subjects’ beliefs about purpose of study (debriefing)
  – have indirect measures (not self-report)

66

Example: Mediators and Moderators

Rosenthal Effect among Teachers

Mediator:
• students persist more
• teachers provide more help (persist)
• teachers more effectively reinforce students
• teachers’ biased evaluations of students

Moderator:
• student-teacher similarity
• student attractiveness and kindness
• teacher burnout

67

Example: Mediators and Moderators

Rosenthal Effect among Researchers

Mediator:
• subjects/patients persist more
• experimenter/therapist provides more prompting
• experimenter/therapist reinforces target behavior
• experimenter's biased evaluations of subjects

Moderator:
• subject-experimenter similarity (patient-therapist)
• patients’ fear of failure/success

68

1. ____ is a serious problem

2. Currently we do not adequately understand the problem (insufficient data)

3. Current treatments are insufficient because:

4. It is plausible that a missing piece is…

5. No study has yet tested…

6. This study is needed to address…

Study Justification

69

1. Do not state null hypotheses. Instead:
– show differences in correlations
– test moderator effects
– specify effect size and/or confidence intervals

2. State basic (non-jargon) idea/theory

3. Operationally define how you will test the theory (measures)

4. State direction of effects clearly

5. Not too many, not too few

Research Hypotheses

70

Dealing with I.V. Confounds

• Measure all plausible internal validity confounds and evaluate their role in the results

• Prevent confounds if possible
  – make subjects homogeneous on confound variable
    • narrow inclusion criteria
  – use a control group that is equivalent on participant baseline characteristics
    • randomize (stratified) to groups
    • match subjects (case-control, quasi-experimental study)
    • within-subjects design
  – standardize/yoke procedures/scripts
  – naive experimenters/interviewers to reduce bias
    • check the blind

• Adjust for confounds with covariates

71

Within-Subjects Designs

Within the same person:
• comparing multiple things
• comparing the effect of two manipulations

Advantages:
• virtually eliminates person variable confounds
• increases statistical power
• can yield truer correlations

Disadvantages:
• time confounds
• order and carry-over effects (manipulations)

72

Within-Subjects Designs

Multiple-treatment designs

• multiple manipulations

Single-subject designs

• ABABAB

• multiple-baseline

Repeated measures designs

• change over time

• covariation over time

Small sample diary studies (N < 30) (Caspi, 1987)

73

Reasons for Parasuicide

Method of Analysis

                     Between-Ss       Within-Ss
Reason               NS    SA         NS    SA
Feeling Generation   54    21**       59    15***
Self-punishment      63    38*        59    51
Anger expression     63    24***      54    28**

74

Correlational Methods

Research Question: What is the association between shame and suicide ideation?

1. Between-subjects: each subject has one shame score and one ideation score

2. Within-subjects: each subject has multiple pairs of shame and ideation scores

– correlations per person (HLM)
– small sample diary study (Caspi method)

75

Correlation: shame is correlated with SI
• BS: people with higher shame have more SI
• WS: when the shame (of individuals) increases, their SI also increases

Experiment: increasing shame increases SI
• BS: people who get a shame induction increase their SI more than people who do not
• WS: SI increases more when people get a shame induction than when they (same individuals) do something else

Shame and Suicide Ideation

76

Within-Subjects Correlations

77

Within-Subjects Correlations: HLM

1. Regression lines for each subject

2. Compute the average regression line
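The two steps above can be sketched with a simple unweighted two-stage calculation. This is only an approximation for intuition: real HLM software weights subjects by precision and estimates variance components, which this sketch does not do. The diary data and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares (intercept, slope) for one subject's data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Hypothetical diary data: subject -> (shame scores, ideation scores)
diaries = {
    "s1": ([1, 2, 3, 4], [2, 4, 6, 8]),       # steep positive line
    "s2": ([1, 2, 3, 4], [11, 12, 13, 14]),   # shallower line, higher overall level
    "s3": ([2, 4, 6, 8], [5, 5, 5, 5]),       # flat line
}

# Step 1: one regression line per subject
lines = {sid: fit_line(x, y) for sid, (x, y) in diaries.items()}

# Step 2: average the per-subject intercepts and slopes
avg_intercept = sum(b0 for b0, _ in lines.values()) / len(lines)
avg_slope = sum(b1 for _, b1 in lines.values()) / len(lines)
print(lines)
print("average line: intercept", avg_intercept, "slope", avg_slope)
```

Note that subject s2's higher overall level affects only that subject's intercept, not the average slope, which is why per-subject lines separate within-person covariation from between-person differences.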

78

Within-Subjects Correlations: HLM

1. One regression line for all subjects

2. Many IV-DV pairs per subject

79

Caspi (1987) Correlation Method

Stress ratings perfectly predict concurrent pain: r = 1.0

80


Which correlation is larger?

81


1. Collect many frequent measures of IVs and DVs
– e.g., daily scores for at least several weeks

2. Data for all subjects in one regression equation

3. Remove between-subjects effect:
– dummy code each participant and enter all dummy coded variables into regression

4. Test if today’s IV score predicts tomorrow’s DV score better than it predicts yesterday’s DV

5. Test if today’s IV score predicts tomorrow’s DV score when covarying today’s DV score
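Steps 2 to 4 can be sketched on a toy data set. Two assumptions are made here: the ratings are invented, and the between-subjects effect is removed by centering each subject's scores on that subject's own mean, which gives the same slope as entering one dummy variable per participant. The helpers `slope`, `center_within`, and `lagged_pairs` are hypothetical names, and step 5 (covarying today's DV) is left out to keep the sketch short.

```python
def slope(xs, ys):
    """Ordinary least squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

def center_within(series_by_subject):
    """Subtract each subject's own mean, removing between-subjects differences."""
    out = []
    for values in series_by_subject.values():
        m = sum(values) / len(values)
        out.extend(v - m for v in values)
    return out

def lagged_pairs(iv_by_subject, dv_by_subject):
    """Pair today's IV with tomorrow's DV, never crossing from one subject to the next."""
    pairs = []
    for sid, iv in iv_by_subject.items():
        dv = dv_by_subject[sid]
        pairs.extend(zip(iv[:-1], dv[1:]))
    return pairs

# Invented daily ratings for two subjects
stress = {"A": [1, 2, 3], "B": [11, 12, 13]}
pain = {"A": [10, 11, 12], "B": [0, 1, 2]}

pooled = slope([v for s in stress.values() for v in s], [v for p in pain.values() for v in p])
within = slope(center_within(stress), center_within(pain))
print("pooled slope:", round(pooled, 2), "within-subject slope:", within)
print("today's stress paired with tomorrow's pain:", lagged_pairs(stress, pain))
```

In this toy data the pooled slope is negative only because subject B is high on stress and low on pain overall. Once each subject's mean is removed, the within-subject slope is positive.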

82

Randomized Experiments

Step 1: Select an intervention or manipulation that simulates a cause in the natural world (independent variable)

Step 2: Select a randomization method and verify that groups are comparable on confounds

Step 3: Verify that the intended independent variable actually occurred sufficiently (manipulation check or treatment adherence)
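Step 2 can be sketched as stratified randomization followed by a balance check. This is one simple scheme among many, offered as an assumption rather than a recommended procedure: participants are shuffled within each stratum and then alternated between arms. The sample, the stratum labels, and the `stratified_randomize` helper are all hypothetical.

```python
import random
from collections import Counter

def stratified_randomize(participants, seed=0):
    """Shuffle within each stratum, then alternate arms so each stratum is split as evenly as possible."""
    rng = random.Random(seed)
    by_stratum = {}
    for pid, stratum in participants:
        by_stratum.setdefault(stratum, []).append(pid)
    assignment = {}
    for pids in by_stratum.values():
        rng.shuffle(pids)
        for i, pid in enumerate(pids):
            assignment[pid] = "treatment" if i % 2 == 0 else "control"
    return assignment

# Hypothetical sample: (participant id, baseline severity stratum)
participants = [(i, "high" if i % 3 == 0 else "medium") for i in range(30)]
assignment = stratified_randomize(participants)

# Balance check: count participants per arm within each stratum
balance = Counter((stratum, assignment[pid]) for pid, stratum in participants)
print(dict(balance))
```

Equal counts per stratum only show balance on the stratifying variable as coded. If a stratum is wide, the arms can still differ on severity inside it, so baseline scores should be compared directly as well.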

83

Randomized Experiments
Analog studies = simulations

Independent Variables
Emotions: mood induction
Social exclusion: computer simulation
Attribution bias: ambiguous aggression scenarios
Jury decisions: vignettes of criminal trials
Malingering: instructions to fake malingering
Suppression: suppression instructions
Worry: worry instructions
Self-Injury: enduring cold-pressor pain

84

Randomized Experiments
Analog studies = simulations

Dependent Variables (samples of behavior)
Aggression: electric shock (self-harm studies too)
Aggression: point subtraction penalties
Self-harm: cold pressor task, electric shock
Stigma attitude: electric shock (to patients)
Persistence: time with unsolvable anagrams
Impulsivity: gambling games
Binge eating: snack food left in room

85

Dealing with I.V. Confounds

• Measure all plausible internal validity confounds and evaluate their role in the results

• Prevent confounds if possible
  – make subjects homogeneous on confound variable
    • narrow inclusion criteria
  – use a control group that is equivalent on participant baseline characteristics
    • randomize (stratified) to groups
    • match subjects (case-control, quasi-experimental study)
    • within-subjects design
  – standardize/yoke procedures/scripts
  – naive experimenters/interviewers to reduce bias
    • check the blind

• Adjust for confounds with covariates

86

1. The subjects differ (selection bias) at different levels of the I.V.

– baseline levels of the D.V. (severity)
– demographics (gender, age, ethnicity, SES)
– differential drop outs

2. Subjects’ experiences in study differ
  – demand/expectancy effects
    • experimental group more hopeful
    • control group demoralized or competitive
  – amount of treatment received (or practice)

Internal Validity Confounds

87

Randomization Failure

Probability that at least one confound will occur due to chance:

• 22.6% if 5 confounds are tested



(assumes p<.05 per confound and that all confounds are independent of each other)

88

Must Check if Randomization Worked!

• Not rare that baseline differences on at least one variable emerge due to chance!

• A stratum (level) may be too big

• Subjects do not fill all strata (levels)
– In a DBT study, stratified randomization failed because only “medium” severity subjects entered the study. More severe “medium” severity subjects were in the control group.

89

Effect Sizes

Pearson r indicates the magnitude of association between two continuous variables

Cohen’s d indicates the magnitude of association between a binary variable and a continuous variable

90

Effect Sizes

          correlation        t-test
          Pearson r (r²)     Cohen’s d
Small     .10 (.01)          .20
Medium    .25 (.08)          .50
Large     >.38               >.80
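The two effect sizes can be computed and related to each other with a short sketch. The conversion used here, r = d / sqrt(d² + 4), assumes two groups of equal size; whether the slide's r column was derived this way is an assumption, though the converted values come out close to the listed ones. The helper names `cohens_d` and `d_to_r` are illustrative.

```python
import math

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

def d_to_r(d):
    """Convert d to a point-biserial r, assuming two groups of equal size."""
    return d / math.sqrt(d ** 2 + 4)

# Convert the slide's d benchmarks to the r metric
for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}  ->  r = {d_to_r(d):.3f}")
```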

91

Effect Sizes

92

Problems of Multiple Tests: Inflated Type I error rate

If alpha level is set at .05, the chance of finding at least one Type I error is:

• 22.6% if five statistical tests are done

• 40.1% if ten statistical tests are done

• 64.2% if 20 statistical tests are done
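These percentages follow from the formula 1 − (1 − alpha)^k for k tests. The sketch below assumes the tests are independent and that every null hypothesis is true. The function name `familywise_error` is illustrative.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests when every null is true."""
    return 1 - (1 - alpha) ** n_tests

for k in (5, 10, 20):
    print(f"{k} tests: {familywise_error(0.05, k):.1%}")
```

The same calculation applies to baseline comparisons after randomization: checking five independent baseline variables at p < .05 gives the same chance of at least one chance difference as running five tests.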

93

Test which cars can get you from San Diego to Los Angeles the fastest: red or blue?

Why (mediator)?
• (knowledge of) faster route
• push gas harder (no fear of police or crash)
• car accelerates faster
• car has more horsepower

When (moderator)?
• number of stops (because of acceleration)
• number of hills (because of horsepower)

Example

94

Nonspecific Predictors of Outcome

95

Passive recruiting (e.g., posting flyers) will result in a very biased sample. People who respond to ads are different than the general public.

Instead go to places in person and approach people.

Go to a mall and give people 5 dollars in advance.

External Validity

96

Problems with Self-Report

measure inequivalence contaminates static group comparison studies

• questions mean different things to different people (concept inequivalence)

– intimacy example NEED BETTER EXAMPLE

• people use number scales differently (metric inequivalence)

– Italians vs. Irish pain ratings
– BPD more emotional than APD?

Solutions:
• randomization balances out the differences
• within-group correlations

97

Best Self-Report Methods

Current observable states
– to avoid memory bias
– to avoid unnecessary inferences

…measured in relevant contexts to ensure activation of schema
– mood induction
– priming procedures
– experience sampling in natural contexts

Interviews

98

Are Women More Emotional?

YES, when comparing global retrospective self-reports of women vs. men

NO, when comparing average emotional states measured by experience sampling

Barrett et al. (1998). Are women the "more emotional" sex?

99

[Figure: survival curves for low-shame vs. high-shame groups; x-axis: number of days to first parasuicide (0 to 400); y-axis: 0.00 to 1.00]

Survival Plot for Shame

100

Why does trait shame not predict self-injury?

Shame Variability

101

[Figure: SSGS shame scores by week for Client #3 and Client #5, each panel with a marked baseline phase; axis ticks run from 0 to 25]

Shame Variability

102

Avoid the Problems of Shared Method Variance and Socially Desirable Responding

103

Alternatives to Self-Report

Your research proposal should include at least one:
• informants (e.g., spouse or significant others)
• behavioral samples
  – Behavioral Approach/Avoidance Test
  – observational coding (e.g., FACS)
  – unobtrusive behaviors (e.g., Bargh studies)
  – Davison ATSS
• psychophysiology
• performance tests (reaction time measures)
  – semantic priming
  – Stroop
  – IAT

104

Alternatives to Self-Report
• PASAT
• Exclusion computer program
• PSAP
• Electric shock
• concentration

105

Implicit Association Test

106


107


108


109


110

Semantic Priming

The driver stepped on the… GAS

The driver stepped on the… BRIDGE

They said it was the… BRIDGE

111

Semantic Priming

He was less stressed when he had the…

BEER

They said it was the… BEER

112

Semantic Priming

1. I deserve…PUNISHMENT

2. I deserve…WATER

3. I deserve…PRAISE

4. A criminal deserves…PUNISHMENT

5. I injure myself for…PUNISHMENT

6. They said it was for…PUNISHMENT

RT: 1<2<3, 1=4<6, 4=5<6

113

Semantic Priming

I injure myself for…RELIEF

I injure myself for…EXPERIENCE

Aspirin can provide…RELIEF

114

Schema Activation

Schema: “Black people are dangerous”

Situation: police officer sees person standing up from behind an object in an alley

Motor response: ??

115

Schema Activation

Schema: “Old people are slow and sickly”

Priming: see “old” words

Motor response: slower walking down hall

Schema: “Interrupting is rude, helping is nice”

Situation: describing a nice friend

Motor response: offer help to someone else

116

Valid Coding/Interviewing

Training
• select extra material not to be analyzed
• talk through examples
• code separately and confirm inter-rater reliability with Kappa, ICC, or Pearson

Have primary rater be naïve to hypotheses/subjects

Verify reliability of coded analysis variables
• inter-rater reliability (>20% of data)
• intra-rater reliability

Re-train if necessary
• use material that will not be analyzed
• do not reveal discrepancies for real data that must be re-coded
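For categorical codes, the Kappa check mentioned above can be sketched as Cohen's kappa for two raters. This is a minimal version under stated assumptions: both raters coded the same segments, the codes are unordered categories, and the example codes are invented. It does not handle the degenerate case where expected agreement equals 1.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters assigning categorical codes."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal code frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented codes from two raters on the same four segments
codes_a = [1, 1, 0, 0]
codes_b = [1, 0, 0, 0]
print(cohens_kappa(codes_a, codes_b))
```

Here the raters agree on three of four segments (75%), but half of that agreement would be expected by chance, so kappa is lower than raw agreement.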

117

Manipulation Checks

Verify that the experimental manipulation worked (as intended) to know mediation

• subjects paid attention and retained important information

• subjects actually had the targeted emotional experience

• subjects complied with instructions

• the intervention was delivered correctly (integrity / adherence / fidelity)

118

Statistical Issues

• Missing data

• Type I versus Type II errors– data snooping (fishing)

• Power analyses

• Maximizing power

119

Type II Errors are Ubiquitous

• Most studies are underpowered to detect anything but large effect sizes

• A statistically non-significant result does not mean no correlation or no difference

– medium-sized effects are often not statistically significant

– most studies cannot detect small effects
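A rough sense of the sample sizes involved comes from the standard normal-approximation formula for comparing two group means, n per group ≈ 2((z₁₋α/₂ + z_power) / d)². The sketch below assumes a two-tailed test, equal group sizes, and 80% power; exact t-based calculations give slightly larger numbers. The function name `n_per_group` is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed, two-group comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-tailed test
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}: about {n_per_group(d)} subjects per group")
```

Under these assumptions a small effect needs several hundred subjects per group, which is why most studies cannot detect small effects.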

120

Ways to Increase Power

• increase sample size
• increase group differences (effect size)
• within-subjects (use both pre- and post-tests, control stimuli)
• increase alpha (e.g., .10, a 10% Type I error rate)
• one-tailed tests (for a priori hypotheses)
• be parsimonious
  – in primary hypotheses and analyses
  – only have two groups
• decrease variability
  – standardize procedures and use scripts
  – do not counterbalance unless necessary
  – homogeneous sample (narrow inclusion)
  – use reliable measures
  – clean your data

121

Controversy

Position 1: Dodo Bird (e.g., Wampold)
• all therapy benefit due to common factors
• specific techniques make no difference

Position 2: RCTs are irrelevant to real world
• Poor external validity (too many exclusions)

Position 3: RCT methods do not identify best treatments (because they compare to TAU or waitlist). Other research strategies are better for figuring out active treatment ingredients (Sprenkle, Davis, & Lebow)

122

Controversy: The Dodo Bird

In a diverse literature it is vital to consider these specific issues for specific studies

1. Credibility of the results (or opinion)

2. What specific problem?

3. What specific treatments?

123

Controversy: The Dodo Bird

All therapy benefit due to common factors and specific techniques make no difference.

• Based on meta-analyses that inappropriately average across studies with diverse disorders, treatment comparisons, and methodologies.

• Analogy: Ask all San Diegans “Do pills effectively treat sore throat, cough, & nasal congestion?"

124

Goals of Treatment Research

To find out most effective treatments:

• The treatment DID cause change (efficacy)

• The treatment causes change (specificity)

• The treatment causes meaningful and lasting change

• The treatment works in the real world

– for whom?

• The reasons why treatment works

125

Treatment Research

efficacious – when a treatment is better than no treatment or is comparable to another treatment with established efficacy

specific – when a treatment is better than a placebo control condition or another credible treatment

effective – when a treatment is shown to improve outcomes in real-world settings

126

NIMH Stage Model

1. treatment development, single-subject and single-group designs, predictors of treatment success, small pilot RCTs

2. RCTs with sufficient power showing

– 2a) efficacy (high internal validity)

– 2b) specificity (more than generic therapy)

3. RCTs with high external validity (may lose some internal validity)

– may be quasi-experimental studies

4. Mechanisms and mediators

127

Kazdin Stage Model

1. treatment development and small pilot RCTs

2. RCTs with sufficient power and mediation correlational analyses

– 2a) efficacy (high internal validity)

– 2b) specificity (more than generic therapy)

– 2c) component analysis studies

3. RCTs with high external validity

128

Levels of Empirical Support

Level 5: Efficacious and Specific

Level 4: Efficacious

Level 3: Probably Efficacious

Level 2: Possibly Efficacious

Level 1: Not empirically supported

129

Effectiveness

When treatments are shown to be:

1. effective for common patient populations

– high compliance

2. effective when delivered by common therapists in common settings

– high acceptability and compliance

– easy to disseminate and train

3. cost-effective

130

What Does Work and Why? External Validity

EXTERNAL VALIDITY

• True experiments with high variability and common people (few exclusions)

• True experiments in common settings with common therapists

• True experiments with various subgroups of patients

• Quasi-experimental designs

131

What Does Work and Why? Construct Validity

CONSTRUCT VALIDITY

• Rigorous control groups

– rule out that therapy was received

– rule out amount of therapy received

– rule out placebo/expectancy/demand effects

– rule out therapist characteristics/differences

– do manipulation checks

• Dismantling studies

– identify active treatment components

– (experimental) psychopathology research

132

Treatment Research: Gold Standard

Randomized Controlled Trial

133

Two Uses of Control Groups

• Internal validity

– have equivalent groups treated equally EXCEPT the intervention

• Construct validity

– have equivalent groups treated equally EXCEPT the most important active ingredients of an intervention

134

Control Group Hierarchy

Control group is/includes:

• components of active treatment (e.g., behavioral activation but not cognitive restructuring)

• comparable morale/confidence/allegiance

• comparable "quality" of treatment (e.g., experience/expertise)

• attention placebo (comparable amount of treatment, modes, and relationship)

• treatment as usual

• only measures (no treatment or wait list)

135

Why Did the Treatment Work?

Whatever parts of the primary treatment are not in the control group must be acknowledged as possible reasons why the treatment improved outcomes.

136

Did the Treatment Work?

You cannot conclude that a treatment is effective unless the change in the treatment group is better than the change in a no treatment group

You cannot conclude that a treatment is effective if it is only compared to another treatment

Sometimes plausible treatments interfere with natural recovery or are iatrogenic

– Ex: CISD, BPD process groups

137

Control Groups

138

Control Groups

139

Control Groups

If the favored treatment showed significant change while the control group showed nonsignificant change, DO NOT conclude that the treatment is superior to the control!
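The warning can be illustrated with a minimal sketch using made-up summary statistics. All numbers below are hypothetical, and the z-tests are a simplification of the analysis a real trial would use. One group's change crosses p < .05 and the other's does not, yet the direct between-group test of the difference in change is nowhere near significant.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(estimate, se):
    """Two-tailed p-value for a z-test of estimate / se."""
    z = abs(estimate / se)
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical mean pre-to-post improvements and their standard errors
treat_change, treat_se = 5.0, 2.4
ctrl_change, ctrl_se = 4.0, 2.4

p_treat = two_tailed_p(treat_change, treat_se)  # within-group test, treatment
p_ctrl = two_tailed_p(ctrl_change, ctrl_se)     # within-group test, control

# The test that matters: is the change different BETWEEN groups?
diff = treat_change - ctrl_change
diff_se = sqrt(treat_se**2 + ctrl_se**2)  # independent groups
p_diff = two_tailed_p(diff, diff_se)

print(f"treatment change: p = {p_treat:.3f}")
print(f"control change:   p = {p_ctrl:.3f}")
print(f"difference:       p = {p_diff:.3f}")
```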

140

RCT Analyses

Analyses must show between-group differences in change

H1: CBT will improve depression more than will TAU.

H1a: The treatment-by-time interaction effect in HLM will show that the CBT group will have larger reductions in BDI scores than will the TAU group.

141

RCT Analyses

This is WRONG

H1: CBT will have lower depression scores at post treatment than TAU.

142

These Results don’t Test H1

143

RCT Longitudinal Analyses

Repeated Measures ANOVA cannot be used with subjects with missing data (dropouts)

– missing data for dropouts can be imputed (e.g., LOCF method), but results can be misleading.

HLM is a better imputation method, although differential dropouts can still bias results.

– test dropout-by-treatment interaction effect

144

Influence of Dropouts

True and complete data (if there were no dropouts)

145

Influence of Dropouts

146


147

Influence of Dropouts

Observed data (missing data due to dropouts)

148


149

FIX THIS GRAPH

Observed data (missing data due to dropouts)

150

Analyze attrition: number of inquiries, appointments, inclusion criteria met, started study, completed study

Report reasons for exclusion and dropout.

Compare dropouts to completers for each treatment group

Analyze treatment-by-time-by-completer three-way interaction effect

Prevent study attrition despite tx drop-outs

– do Intent-To-Treat analyses

External Validity

151

Clarkin et al. 2007

IV: DBT vs. Transference-Focused PsychoTx vs. dynamic supportive tx

Sample: 30 BPD patients in each group

“no differences between groups in demographics or psychopathology”

No statistically significant outcome differences

Large difference would not be statistically significant: d=0.52, r=.25, 10% vs. 40%

ITT analyses “did not show different results”

Therapists were monitored and rated for adherence.
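The point about statistical power on this slide can be checked with a rough sketch. It estimates the smallest observed effect size that would reach p < .05 when two groups of 30 are compared, and converts d to r. The normal approximation and the equal-group-size conversion formula are simplifying assumptions; an exact t-based calculation gives a slightly larger threshold.

```python
from math import sqrt
from statistics import NormalDist

def min_significant_d(n_per_group, alpha=0.05):
    """Smallest observed Cohen's d reaching two-tailed significance (normal approx.)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit * sqrt(2 / n_per_group)

def d_to_r(d):
    """Convert Cohen's d to a correlation r, assuming equal group sizes."""
    return d / sqrt(d**2 + 4)

print(f"minimum significant d with n=30 per group: {min_significant_d(30):.2f}")
print(f"d = 0.52 expressed as r: {d_to_r(0.52):.2f}")
```

With 30 patients per group, an observed difference of roughly half a standard deviation sits right at the significance threshold, which is consistent with the d and r values quoted on the slide.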

152

Clarkin et al. 2007

OASM             Group   pre    post   decrease

Irritability     TFP     1.92   1.83   0.09
                 DBT     1.61   1.58   0.03

Anger            TFP     1.74   1.56   0.18
                 DBT     1.52   1.43   0.09

Verbal Assault   TFP     1.80   1.68   0.12
                 DBT     1.55   1.49   0.06

Direct Assault   TFP     0.82   0.76   0.06
                 DBT     0.73   0.72   0.01
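The table shows why post-treatment scores alone cannot test a hypothesis about differential change. The sketch below uses the pre and post means from the table and compares two summaries: the post-only difference and the difference in pre-to-post decrease. The variable names are illustrative, and no significance test is implied.

```python
# (pre, post) OASM means copied from the table above
oasm = {
    "Irritability":   {"TFP": (1.92, 1.83), "DBT": (1.61, 1.58)},
    "Anger":          {"TFP": (1.74, 1.56), "DBT": (1.52, 1.43)},
    "Verbal Assault": {"TFP": (1.80, 1.68), "DBT": (1.55, 1.49)},
    "Direct Assault": {"TFP": (0.82, 0.76), "DBT": (0.73, 0.72)},
}

results = []
for scale, groups in oasm.items():
    tfp_pre, tfp_post = groups["TFP"]
    dbt_pre, dbt_post = groups["DBT"]
    # Positive value: DBT has the lower (better) score at post-treatment
    post_only = tfp_post - dbt_post
    # Positive value: TFP decreased more than DBT from pre to post
    change_diff = (tfp_pre - tfp_post) - (dbt_pre - dbt_post)
    results.append((scale, post_only, change_diff))
    print(f"{scale:15s} post-only favors DBT by {post_only:.2f}; "
          f"change favors TFP by {change_diff:.2f}")
```

On every subscale DBT looks better at post-treatment simply because it started lower, while TFP shows the larger decrease.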

153

Clarkin et al. 2007

154

Clinical Significance

Does the IV really make a meaningful difference in people’s lives?

Studies need to show if findings are large or clinically meaningful (vs. small or trivial)

– functioning rather than just symptoms

– statistical effect size

– IV and DV scores need to indicate whether “severe” or “good” or “large” vs. “small” (i.e., binary)

155

Clinical Significance

• Correlational studies need to show:

– at least some clinically severe scores on the IV and the DV (overall, a wide range of scores is best)

– the correlation represented as percentages of severe outcomes (DV) for severe and non-severe IV groups

• Group comparison studies need to ensure that clinical groups are severe (e.g., DSM diagnosis)

• Treatment studies need to show that participants start off as severe and end in good shape

156


Most studies fail to show that findings are large or meaningful or the problems are severe.

Examples:

• mood induction manipulation check

• Rorschach correlations

• restricted emotionality (E. Rogers)

• Williams Syndrome

157


To examine the C.S. of a correlation between continuous variables, make the variables binary in a meaningful way.

Example: Rorschach aggression scale (RAS) and overt aggressive behavior (OAB).

            never assault   prior assault
High-RAS         10%             90%
Low-RAS          90%             10%
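A minimal sketch of this dichotomizing step is shown below. The scores, the cutoff of 6, and the function name are all hypothetical; in practice the cutoff should come from a clinically meaningful criterion rather than from the sample.

```python
def percent_with_outcome(records, cutoff):
    """records: (score, had_outcome) pairs.
    Returns the % with the outcome among high scorers and among low scorers."""
    high = [outcome for score, outcome in records if score >= cutoff]
    low = [outcome for score, outcome in records if score < cutoff]

    def pct(group):
        return 100 * sum(group) / len(group)

    return pct(high), pct(low)

# Hypothetical data: (RAS score, prior assault?)
records = [(9, True), (8, True), (8, True), (7, False), (7, True),
           (3, False), (2, False), (2, True), (1, False), (1, False)]
RAS_CUTOFF = 6  # hypothetical threshold for "High-RAS"

high_pct, low_pct = percent_with_outcome(records, RAS_CUTOFF)
print(f"High-RAS with prior assault: {high_pct:.0f}%")
print(f"Low-RAS with prior assault:  {low_pct:.0f}%")
```

Reporting the two percentages side by side conveys the size of the association more directly than a correlation coefficient does.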

158


IV: insult (anger) vs. neutral

DV: pre-frontal brain activity

Manipulation check: “…indicate to what extent they felt each feeling during the experiment (1 = not at all; 5 = extremely)”

“subjects in the insult cond. Reported more anger (M = 2.0) than did subjects in the no-insult condition (M = 1.4), (p < .01)”

159

Clinical Significance

IV: Williams Syndrome vs. normal controls

DV: sociability (hyper-sociability?)

Manipulation check: Results of WS subjects from the approachability test will be compared to those of the two normal control groups by computing a clinically significant cutoff score for the WS group. The cutoff score will be obtained by the Jacobson et al. formulas. Also, two independent raters, who are blind to the study objectives and identities of subjects, will code responses of the Sociability Questionnaire into four categories: shy, social (highest social and high social), and in-between (tested with Chi square). If more than 50% of WS subjects and less than 25% of the NC group are classified in the most social category, based on a previous study (Doyle, et al., 2004), then sociability of WS subjects is considered to be clinically significant for this study.
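As a sketch of how such a cutoff score could be computed, the function below implements the commonly cited Jacobson and Truax "criterion c", which places the cutoff between the clinical and normal means, weighted by the standard deviations. Whether this is the exact formula intended in the proposal is an assumption, and the means and SDs are hypothetical.

```python
def jt_cutoff(mean_clinical, sd_clinical, mean_normal, sd_normal):
    """Jacobson-Truax criterion c: weighted midpoint between two distributions.
    Each mean is weighted by the OTHER group's SD, so the cutoff sits
    closer to the group with the smaller spread."""
    return (sd_normal * mean_clinical + sd_clinical * mean_normal) / (
        sd_normal + sd_clinical
    )

# Hypothetical approachability scores (higher = more sociable)
equal_sd = jt_cutoff(mean_clinical=80, sd_clinical=10, mean_normal=60, sd_normal=10)
unequal_sd = jt_cutoff(mean_clinical=80, sd_clinical=5, mean_normal=60, sd_normal=15)

print("cutoff with equal SDs:", equal_sd)
print("cutoff with unequal SDs:", unequal_sd)
```

With equal SDs the cutoff is the simple midpoint of the two means.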

160

Limits to Methods for Confounds

• Randomization is sometimes artificial (external validity)

– not choosing a treatment is artificial

– some I.V.s (e.g., emotions) must be simulated (artificial)

• Standardized/yoked procedures/scripts are artificial

• Stratified randomization

– may not end up with even distribution among levels, which can prevent stratification from working

• Non-randomized matching is limited:

– can only match on a couple variables

– regression to different means: matching sometimes creates this new confound

• Use of covariates is limited:

– homogeneity of regression: the regression lines must be parallel for different levels of the confound variable

– reduces statistical power

161

Confounds vs. Moderators

• Internal validity confounds and moderators are separate issues

• A variable can be both a confound and a moderator, but these are still separate issues

– cannot enter as a covariate

• If there is no main effect of the IV:

– there can be confounds to covary

– there can be moderator effects

• crossing regression lines

162

Confounds vs. Mediators

• The label depends on your theory of the causal process

– intrinsic or extrinsic part of your I.V.?

• If there is no main effect of the IV:

– there can be confounds to covary

– overall mediators cannot be examined

– mediators can be examined for subgroups

163

Internal vs. External Validity

Increase internal validity:

• homogeneous sample

• rigid interventions (e.g., session duration)

• scripted interactions (to reduce bias)

• no choice of intervention (random)

Increase external validity:

• heterogeneous sample (few exclusions)

• flexible interventions

• natural interactions in natural settings

• choice of intervention

164

NIMH Stage Model


2. RCTs with sufficient power showing

– efficacy (high internal validity)

– specificity (more than generic therapy)

3. RCTs with high external validity (may lose some internal validity)

– may be quasi-experimental studies

4. Mechanisms and mediators

165

Kazdin Stage Model


2a. RCTs with sufficient power and high internal validity and test of mediation

2b. RCT component analysis studies

3. RCTs with high external validity

166

Why Emphasize Mechanisms of Change?

• to distill interventions to their most potent components (maximally efficient treatments)

• to facilitate treatment matching (i.e., moderators that are baseline characteristics of the mediator variable)

• to facilitate implementation in normal clinical contexts (generalization) by highlighting ways that the form can be adapted while maintaining the key change processes (function).

– should not rigidly apply manuals!! (CBT vs. IPT)

167

Evidence for Mechanisms of Change

• Strong association

• Gradient (more change ingredient, more change occurs)

– dose-response relation

• Specificity (other plausible variables do not show mediation)

• Experiment (try to manipulate change mechanisms)

– component analysis studies

• Temporal relation (rarely established)

• Replication

• Plausibility and coherence (credible change process)

168

Evidence for Mechanisms of Change

Component analyses have more internal validity than mediation correlations

Examples of misleading severity confounds

• In DBT study, amount of therapy in DBT condition was not correlated with outcomes

• It is possible that in CT study amount of BA vs. CR would not be related to outcome

169

Component Analysis Studies

Also called dismantling studies

• Test if a component is necessary by comparing the full treatment to the treatment when that component is removed

• Add common factors to the reduced condition to rule them out as explanation why component is necessary

• Confirm precise mediational causal process in correlational analyses

170


• Mediational (shared variance) analyses such as the Sobel test typically test causal processes that may account for treatment group differences

• Shared causal processes may be shown by:

– within-group mediational analyses– simple correlational analyses
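For reference, the Sobel test statistic can be sketched as below. The path coefficients and standard errors are hypothetical; in practice a is the effect of treatment on the mediator and b is the effect of the mediator on the outcome (controlling for treatment), each taken from a regression model.

```python
from math import sqrt
from statistics import NormalDist

def sobel_test(a, se_a, b, se_b):
    """Sobel z and two-tailed p for the indirect (mediated) effect a * b."""
    z = (a * b) / sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical regression estimates
a, se_a = 0.5, 0.1  # treatment -> mediator
b, se_b = 0.4, 0.1  # mediator -> outcome, controlling for treatment

z, p = sobel_test(a, se_a, b, se_b)
print(f"indirect effect = {a * b:.2f}, z = {z:.2f}, p = {p:.4f}")
```

A significant z indicates shared variance consistent with mediation; it does not by itself establish the temporal order or causal direction.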

171


172


173


Also called dismantling studies

• 2 groups needed to answer the question if one component is necessary

• 3 groups needed to answer the question if two components are necessary

• 4 groups are needed to answer the question if two components are necessary and how much they matter
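The value of the four-group design can be sketched with hypothetical cell means for two components, A and B. The numbers are invented to show a redundancy pattern: each component works alone, but adding one to the other gives no further benefit.

```python
# Hypothetical mean improvement per condition in a 2x2 component design
means = {
    ("no A", "no B"): 2.0,
    ("A", "no B"): 10.0,
    ("no A", "B"): 10.0,
    ("A", "B"): 10.0,
}

# Effect of each component when the other is absent
effect_a_alone = means[("A", "no B")] - means[("no A", "no B")]
effect_b_alone = means[("no A", "B")] - means[("no A", "no B")]

# Effect of each component when the other is already present
effect_a_given_b = means[("A", "B")] - means[("no A", "B")]
effect_b_given_a = means[("A", "B")] - means[("A", "no B")]

# Interaction: how much A's effect depends on the presence of B
interaction = effect_a_given_b - effect_a_alone

print("A alone:", effect_a_alone, "| A added to B:", effect_a_given_b)
print("B alone:", effect_b_alone, "| B added to A:", effect_b_given_a)
print("interaction:", interaction)
```

A two-group study comparing A+B with B alone would observe only the zero effect of adding A, and could not distinguish an inert component from a redundant one.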

174

HRV Biofeedback Component Analysis Study

Two-group RCT

1. Slow breathing + (fake feedback)

2. Slow breathing + HRV visual feedback

Can test if the HRV feedback is useless

175

EMDR Component Analysis Study

Two-group RCT

• EMDR vs. EMDR - EM (11 out of 13 studies)

– exposure + eye = exposure

• PE vs. EMDR

Power and effect size?

176

First DBT Study

• Validity in the first DBT study was criticized because DBT patients received many more hours of therapy than TAU.

• Follow-up data analyses indicated that there was no correlation between number of hours and outcome

• When treatment hours were entered as covariates, DBT still had superior outcomes

• However, treatment hours could still account for why DBT subjects did better simply because the worse people got the most treatment

• Need to covary severity and hours in comparing DBT to TAU

177

DBT Replication Study

Controlled for:

1. therapist expertise

2. therapist allegiance to treatment provided

3. clinical supervision group

4. prestige of DBT

5. psychotherapy

6. treatment affordability and hours

7. therapist gender, training/degree, and clinical experience

178

DBT Component Analysis Study

Three-group RCT

1. DBT (individual + group skills training)

2. Individual DBT + activities group

3. DBT group skills training + case management

179

Cognitive Therapy Mediation Studies

1) CT is based on cognitive theory– thinking causes emotions and behavior

2) Change in CT is associated with cognitive change as hypothesized– concurrent change correlations

180

Component Analysis Study

Cognitive therapy for depression is comprised of cognitive restructuring (CR) and behavioral activation (BA)

• Removing CR does not reduce its effectiveness

• BA is as effective as BA+CR

181


5 interpretations of change process

• BA works better because

– thinking is irrelevant

– BA is better at changing thinking

– it improves environment and thinking

• CR and BA are both effective and redundant

– both change thinking

– both change environment (reinf + punish)

182


If honey does not improve a sugar-sweetened dessert, is honey less tasty?

Three groups are needed

• Could CR be as effective as CR+BA?

183

New Behavior Changes Cognition

“On the one hand, explanations of change processes are becoming more cognitive.

On the other hand, it is performance-based treatments that are proving most powerful in effecting psychological changes. Regardless of the method involved, the treatments implemented through actual performance achieve results consistently superior to those in which fears are eliminated to cognitive representations of threat” (Bandura, 1977, p. 78)

184

Activity Scheduling in Cognitive Therapy

Pleasurable activities

• “Nothing is meaningful or worthwhile”

Mastery activities (self-efficacy)

• “I am incapable of doing anything”

Behavioral experiments

• “I am incapable of that”

• “It won’t work out”

185

New thinking prompts new behaviors that lead to more reinforcers and fewer punishers, which changes depressive affect

186

New behaviors lead to more reinforcers and fewer punishers, which changes belief, which changes depressive affect

187

Amount of smoking causes cancer which causes lower quality of life

H1: Amount of smoking correlated with QOL

Entire sample has cancer

H2: Exercise moderates the correlation between cancer and QOL.

Analysis of constant variables

188

Do Not Covary your Main Effect

• For Williams Syndrome children, visual-motor skills will predict daily living skills above and beyond intelligence (IQ)

– hand-eye coordination is measured by the WAIS performance tests

• Paul Paris class proposal example

189

Multiple Baseline Design

190

External Validity

is affected by:

• Non-representative samples

– requirement of consent

– recruiting location/procedures

– narrow incentives

– exclusion criteria

– biased dropouts

• Non-representative procedures/context

191

Race vs. Ethnicity vs. Culture

• race = biology

• ethnicity is one aspect of culture

• culture is learned

• race does not always correspond with ethnicity

• one’s culture is a combination of one’s family culture and mainstream culture (acculturation)

• NIMH categories: Latino is only an ethnicity

192

Internal Validity is Usually a Higher Priority than External Validity

• Ethnic minorities are usually underrepresented and researchers do not make extra effort to recruit them

• Consequence: conclusions often cannot be made about the relevance for ethnic minorities (ethnicity can be a moderator)

193

Sue (1999)

Sue argues that we should make extra effort to recruit ethnic minorities

• to increase generalizability

• for social justice

However:

• we cannot assume generalizability from a main effect since ethnicity can be a moderator

• it is often not feasible to have enough power to test moderator effects in a heterogeneous sample

• therefore, findings from heterogeneous samples are often ambiguous

194

Ethical Issues

• informed consent (vs. thoughtless compliance or coercion)

– Milgram and Zimbardo studies

– obtained by the therapist

• deception vs. withholding full rationale

• debriefing

– mood improvement protocol

– verify subjects understand and are back to normal

• confidentiality vs. anonymity
