TRANSCRIPT
Applying Signal Detection Theory to Multi-Level Modeling: When "Accuracy" Isn't Always Accurate
December 5th, 2011
Scott Fraundorf & Jason Finley
Outline
• Why do we need SDT?
• Sensitivity vs. Response Bias
• Example SDT Analyses
• Terminology & Theory
• Logit & probit
• Extensions
Categorical Decisions
• Lots of paradigms involve asking participants to make a categorical decision
– Could be explicit or implicit
– Here, we're focusing on cases where there are just two categories … but this can generalize to >2
Some Categorical Decisions
Did Anna dress the baby? (D) Yes (K) No
The cop saw the spy with the binoculars.
"The coach knew (that) you missed practice."
Comprehension questions
Assigning a meaning to a novel word like “bouba”
Choosing to include optional words like "that"
Baby looking at 1 screen or another
Interpreting an ambiguous sentence
Choosing referent in perspective-taking task
Some Categorical Decisions
VIKING (1) Seen (4) New
VIKING (1) Male talker (4) Female talker
Recognition memory – did you see this word or face in the study list?
Source memory – was this word said by a male or female talker?
Detecting whether or not a faint signal is present
Did something change between the two displays?
What is Signal Detection Theory?
For experiments with categorical judgments
– Part method for analyzing judgments
– Part theory about how people make judgments
Originally developed for psychophysics
– Operators trying to detect radar signals amidst noise
Purpose:
– Better metric properties than ANOVA on proportions (logistic regression has already taken care of this for us)
– Distinguish sensitivity from response bias
Outline
• Why do we need SDT?
• Sensitivity vs. Response Bias
• Example SDT Analyses
• Terminology & Theory
• Logit & probit
• Extensions
Study: POTATO, SLEEP, RACCOON, WITCH, NAPKIN, BINDER
Test: SLEEP, POTATO, BINDER, WITCH, RACCOON, NAPKIN
Early recognition memory experiments: Study a list of words, then see the same list again. Circle the ones you remember.
Problem: People might realize that all of these are words they studied! Could circle them all even if they don't really remember them.
Study: POTATO, SLEEP, RACCOON, WITCH, NAPKIN, BINDER
Test: POTATO, HEDGE, WITCH, BINDER, SHELL, SLEEP, MONKEY, OATH
Later experiments: Add foils or lures that aren't words you studied.
Here we see someone circled half of the studied words … but they circled half of the lures, too. No real ability to tell apart new & old items. They're just circling 50% of everything.
Study: POTATO, SLEEP, RACCOON, WITCH, NAPKIN, BINDER
Test: POTATO, HEDGE, WITCH, BINDER, SHELL, SLEEP, MONKEY, OATH
What we want is a measure that examines correct endorsements relative to incorrect endorsements … and is not influenced by an overall bias to circle things
Sensitivity vs. Response Bias
"C is the most common answer in multiple choice exams" – response bias
Knowing which answers are C and which aren't – sensitivity (or discrimination)
Sensitivity vs. Response Bias
Imagine asking L2 learners of English to judge grammaticality...
It appears that our participants are more accurate at accepting grammatical items than rejecting ungrammatical ones...

Group A                  ACCURACY    SAID "GRAMMATICAL"
Grammatical condition      70%             70%
Ungrammatical cond.        30%             70%
Sensitivity vs. Response Bias
But, really, they are just judging all sentences as “grammatical” 70% of the time – a response bias
No evidence they're showing sensitivity to grammaticality here
“Accuracy” can confound these 2 influences
Group A                  ACCURACY    SAID "GRAMMATICAL"
Grammatical condition      70%             70%
Ungrammatical cond.        30%             70%
Sensitivity vs. Response Bias
Now imagine we have speakers of two different first languages...

Group A                  ACCURACY    SAID "GRAMMATICAL"
Grammatical condition      70%             70%
Ungrammatical cond.        30%             70%

Group B                  ACCURACY
Grammatical condition      60%
Ungrammatical cond.        40%
Sensitivity vs. Response Bias
It looks like Group B is better at rejecting ungrammatical sentences... But the groups just have different biases

Group A                  ACCURACY    SAID "GRAMMATICAL"
Grammatical condition      70%             70%
Ungrammatical cond.        30%             70%

Group B                  ACCURACY    SAID "GRAMMATICAL"
Grammatical condition      60%             60%
Ungrammatical cond.        40%             60%
Sensitivity vs. Response Bias
This would be particularly misleading if we only looked at ungrammatical items. No way to distinguish response bias vs. sensitivity in that case!

                                 ACCURACY    SAID "GRAMMATICAL"
Group A – Ungrammatical cond.      30%             70%
Group B – Ungrammatical cond.      40%             60%
Sensitivity vs. Response Bias
We see participants can give the "right" answer without really knowing it.
Comparisons to "chance" attempt to deal with this.
But "chance" = 50% assumes both responses are equally likely
– Probably not true for, e.g., attachment ambiguities
– People have an overall bias to answer questions with "yes"
Sensitivity vs. Response Bias
Common to balance the frequency of intended responses
– e.g. 50% true statements, 50% false
But bias may still exist for other reasons
– Prior frequency
• e.g. low attachments are more common in English than high attachments … might create a bias even if they're equally common in your experiment
– Motivational factors (e.g., one error is "less bad" than another)
• Better to suggest a healthy patient undergo additional screening for a disease than to miss someone with the disease
Outline
• Why do we need SDT?
• Sensitivity vs. Response Bias
• Example SDT Analyses
• Terminology & Theory
• Logit & probit
• Extensions
Fraundorf, Watson, & Benjamin (2010)
Hear recorded discourse:
"Both the British and the French biologists had been searching Malaysia and Indonesia for the endangered monkeys. Finally, the British spotted one of the monkeys in Malaysia and planted a radio tag on it."
Presentational or contrastive pitch accent?
Then, later, get a true/false memory test:
"The British scientists spotted the endangered monkey and tagged it." (D) TRUE (K) FALSE
"The French scientists spotted the endangered monkey and tagged it." (D) TRUE (K) FALSE
N.B. Actual experiment had multiple types of false probes … an important part of the actual experiment, but not needed for this demonstration
SDT & Multi-Level Models
Traditional logistic regression model:
Accuracy of Response = Probe Type x Pitch Accent
(DV: CORRECT MEMORY or INCORRECT MEMORY)
Accuracy confounds sensitivity and response bias
– Manipulation might just make you say "true" to everything more
SDT & Multi-Level Models
Traditional logistic regression model:
Accuracy of Response = Probe Type x Pitch Accent
(DV: CORRECT MEMORY or INCORRECT MEMORY)
SDT model:
Response Made = Probe Type x Pitch Accent
(DV: JUDGED TRUE vs. JUDGED FALSE, JUDGED GRAMMATICAL vs. JUDGED UNGRAMMATICAL, etc.)
SDT model involves changing the way your DV is parameterized:
"Respond correctly or respond incorrectly?" becomes "True statement or false statement?"
This better reflects the actual judgment we are asking participants to make.
They are deciding whether to say "this is true" or "this is false" … not whether to respond accurately or respond inaccurately.
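A minimal sketch in R of what this reparameterization looks like at the data level, using simulated data and hypothetical variable names (dat, probe_true, said_true are illustrative, not the original analysis):

    # Hypothetical trial-level data: one row per trial
    set.seed(1)
    dat <- expand.grid(subject = factor(1:20), item = factor(1:16))
    dat$probe_true <- rep(c(0, 1), length.out = nrow(dat))    # is the probe actually true?
    dat$accent     <- rep(c("presentational", "contrastive"), each = nrow(dat) / 2)
    dat$said_true  <- rbinom(nrow(dat), 1, ifelse(dat$probe_true == 1, 0.7, 0.4))

    # Traditional DV: accuracy (did the response match the truth?)
    dat$accuracy <- as.numeric(dat$said_true == dat$probe_true)

    # SDT DV: the response itself; probe type becomes a predictor instead
    head(dat[, c("probe_true", "said_true", "accuracy")])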
SDT & Multi-Level Models
SDT model (w/ centered predictors):
Said "TRUE" = Intercept + Actually is TRUE
– Intercept: baseline rate of responding TRUE (response bias)
– Actually is TRUE: does the item being true make you more likely to say TRUE? (sensitivity)
At this point, we haven't looked at any differences between conditions (e.g. contrastive vs. presentational accent, or L1 vs. L2). We are just analyzing overall performance.
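A sketch of this baseline model in lme4, continuing the hypothetical data above (glmer() here; older lme4 versions used lmer() with a family argument):

    library(lme4)

    # Center the probe-type predictor so the intercept is the overall bias
    dat$is_true_c <- ifelse(dat$probe_true == 1, 0.5, -0.5)

    # Intercept  = response bias (overall tendency to say TRUE)
    # is_true_c  = sensitivity (more TRUE responses for true probes?)
    m_baseline <- glmer(said_true ~ is_true_c + (1 | subject) + (1 | item),
                        data = dat, family = binomial)
    summary(m_baseline)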
SDT & Multi-Level Models
SDT model (w/ centered predictors):
Said "TRUE" = Intercept + Actually is TRUE + Contrastive Accent + Accent x TRUE
– Intercept: baseline rate of responding TRUE (response bias)
– Actually is TRUE: does the item being true make you more likely to say TRUE? (sensitivity)
– Contrastive Accent: does contrastive accent change the overall rate of saying TRUE? (effect on bias)
– Accent x TRUE: does accent especially increase TRUE responses to true items? (effect on sensitivity)
SDT & Multi-Level Models
SDT model (* marks a reliable effect):
Said "TRUE" = Intercept + Actually is TRUE* + Contrastive Accent + Accent x TRUE*
– Intercept: baseline rate of responding TRUE (response bias)
– Actually is TRUE*: does the item being true make you more likely to say TRUE? (sensitivity)
– Contrastive Accent: does contrastive accent change the overall rate of saying TRUE? (effect on bias)
– Accent x TRUE*: does accent especially increase TRUE responses to true items? (effect on sensitivity)
Contrastive accent improves actual sensitivity. No effect on response bias.
SDT & Multi-Level Models
SDT model:
Said "TRUE" = Intercept + Actually is TRUE* + Contrastive Accent + Accent x TRUE*
(response bias + sensitivity + effect on bias + effect on sensitivity)
General heuristic:
– Effects that don't interact with item type = effects on bias
– Effects that do involve item type = effects on sensitivity
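The full condition model, sketched with the same hypothetical variables (an illustration of the heuristic, not the authors' actual code):

    # Center the accent predictor as well
    dat$accent_c <- ifelse(dat$accent == "contrastive", 0.5, -0.5)

    # is_true_c           -> sensitivity
    # accent_c            -> effect of accent on bias (no interaction with item type)
    # is_true_c:accent_c  -> effect of accent on sensitivity (interacts with item type)
    m_full <- glmer(said_true ~ is_true_c * accent_c + (1 | subject) + (1 | item),
                    data = dat, family = binomial)
    summary(m_full)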
Ferreira & Dell (2000)
• Are people sensitive to ambiguity in language production?
• Ambiguous: "The coach knew you...."
– "The coach knew you for a long time."
• Here, you is the direct object of knew
– "The coach knew you missed practice."
• Here, you is actually the subject of an embedded sentence (you missed practice). Confusing if you were expecting a direct object!
• "The coach knew that you missed practice."
– Including that avoids this ambiguity
Ferreira & Dell (2000)
• Task: Read & recall sentences
• Ambiguous: "The coach knew (that) you...."
• Unambiguous: "The coach knew (that) I..."
– This has to be a sentential complement (it would be "The coach knew me" if it were a direct object)
• Will people produce "that" in the ambiguous conditions?
– Especially if task instructions emphasize being clear?
SDT & Multi-Level Models
SDT model:
Said "that" = Intercept + Ambiguity + Instructions + Instructions x Ambiguity
– Intercept: baseline rate of producing "that" (response bias)
– Ambiguity: do people produce "that" more for you (ambig.) than I (unambig.)? (sensitivity)
– Instructions: are people told to avoid ambiguity? (effect on bias)
– Instructions x Ambiguity: do instructions especially increase use of "that" for ambiguous items? (effect on sensitivity)
SDT & Multi-Level Models
SDT model:
Said "that" = Intercept + Ambiguity
– Intercept: baseline rate of producing "that" (response bias)
– Ambiguity: do people produce "that" more for you (ambig.) than I (unambig.)? (sensitivity)
People show little sensitivity to ambiguity!
SDT & Multi-Level Models
Said "that" = Intercept + Ambiguity + Instructions + Instructions x Ambiguity
– Intercept: baseline rate of producing "that" (response bias)
– Ambiguity: do people produce "that" more for you (ambig.) than I (unambig.)? (sensitivity)
– Instructions: are people told to avoid ambiguity? (effect on bias)
– Instructions x Ambiguity: do instructions especially increase use of "that" for ambiguous items? (effect on sensitivity)
Instructions to be clear don't increase sensitivity to ambiguity. They just increase the overall rate of "that," in all conditions. An interesting response bias effect that tells us about participants' strategy! People try to increase clarity by inserting the complementizer everywhere.
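The same structure sketched for this design, with hypothetical simulated data and variable names (lme4 loaded as in the earlier sketches):

    # Hypothetical data: did the speaker produce "that", by ambiguity & instructions
    prod <- expand.grid(subject = factor(1:24), item = factor(1:12))
    prod$ambiguity_c    <- rep(c(-0.5, 0.5), length.out = nrow(prod))   # I vs. you
    prod$instructions_c <- ifelse(as.numeric(prod$subject) <= 12, -0.5, 0.5)
    prod$said_that      <- rbinom(nrow(prod), 1, 0.5)

    # Same SDT logic: ambiguity = sensitivity, instructions = bias,
    # ambiguity x instructions = effect of instructions on sensitivity
    m_that <- glmer(said_that ~ ambiguity_c * instructions_c +
                      (1 | subject) + (1 | item),
                    data = prod, family = binomial)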
Other Designs
Imagine if critical comprehension questions should all be answered "yes"
Not possible to tease apart sensitivity & response bias
– Could say "yes" 85% of the time because you knew the correct answer 85% of the time (sensitivity)
– But maybe you would just say "yes" to 85% of everything (response bias)
– Like the memory test that only has studied words
Would need "no" probes to do an SDT analysis
A common design in psycholinguistics … but limited in the conclusions we can draw from it
Other Designs
This concern holds even if we manipulate some other variable...
The priming manipulation...
– Might improve sensitivity at realizing these questions should be answered "yes"
– Might just increase bias to say "yes"
Again, we would need some "no" questions to distinguish these hypotheses

                                              RATE OF "YES" RESPONSES
"Yes" comprehension questions – NOT PRIMED              81%
"Yes" comprehension questions – PRIMED                  93%
Outline
• Why do we need SDT?
• Sensitivity vs. Response Bias
• Example SDT Analyses
• Terminology & Theory
• Logit & probit
• Extensions
(These slides courtesy of Jason Finley)
Signal Detection Performance
For each trial:
                    Respond "yes"                 Respond "no"
Signal trial        Hit                           Miss (Type II error in stats)
Noise trial         False Alarm (Type I error)    Correct Rejection

Summary statistics:
Hit Rate (HR) = # Hits / # Signal Trials
False Alarm Rate (FAR) = # False Alarms / # Noise Trials
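A sketch of these summary statistics computed from trial-level data, reusing the hypothetical memory data from the earlier sketches (true probes play the role of signal trials; responding TRUE is a "yes"):

    # Hit rate: proportion of "yes" responses on signal trials
    # False alarm rate: proportion of "yes" responses on noise trials
    hit_rate <- mean(dat$said_true[dat$probe_true == 1])
    fa_rate  <- mean(dat$said_true[dat$probe_true == 0])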
Theory
1. Trials = events
2. Strength of evidence: continuous dimension
3. Conditional probability distributions for noise, signal
4. Decision/response criterion
5. Evidence has an arbitrary scale. By convention, the noise distribution has mean 0, variance 1
Respond "yes" if evidence is above the criterion; respond "no" if evidence is below the criterion.
(Figure: NOISE-trial and SIGNAL-trial evidence distributions with the decision criterion)
Correct Rejection: noise trial, and evidence is below the criterion
False Alarm: noise trial, but evidence is above the criterion
Miss: signal trial, but evidence is below the criterion
Hit: signal trial, and evidence is above the criterion
Response Bias
The optimal criterion depends on:
– signal probability
– payoff structure
Response Bias
A high criterion will increase correct rejections, but also increase misses
(The kind of criterion intended in null hypothesis significance testing)
Response Bias
A low criterion will increase hits, but also false alarms
Good if misses are especially bad (medical screening example)
Response Bias
The c parameter describes the location of the criterion:
c = -0.5 [z(HR) + z(FAR)]
Sensitivity
The traditional SDT measure of sensitivity is d' … measuring the distance between the peaks of the noise and signal distributions:
d' = z(HR) – z(FAR)
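Continuing the earlier sketch, both statistics follow directly from the hit and false alarm rates, with qnorm() as the z-transform:

    # qnorm() is the z-transform (inverse of the normal CDF)
    d_prime   <- qnorm(hit_rate) - qnorm(fa_rate)            # sensitivity
    criterion <- -0.5 * (qnorm(hit_rate) + qnorm(fa_rate))   # response bias (c)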
Lower Sensitivity vs. Higher Sensitivity
Sensitivity is a result of:
– external factors
– internal factors
Outline
• Why do we need SDT?
• Sensitivity vs. Response Bias
• Example SDT Analyses
• Terminology & Theory
• Logit & probit
• Extensions
Probit vs. Logit
How to make a binomial response continuous?
Logit = log odds
– ln OR = log([p(hit)/p(miss)] / [p(FA)/p(CR)])
Probit = cumulative distribution function (CDF) of the normal distribution
– d' = z[p(hit)] – z[p(FA)]
– The CDF at x is the area under the curve from -Inf to x
Very similar!
– Probit changes more quickly in the middle of the distribution, more slowly at the tails
– Logit has a somewhat easier interpretation (can convert to odds / odds ratios)
– Probably, you will get qualitatively similar results with both
– Could try both & see which fits your dataset better
– Literatures differ in which is used more commonly
(Figures: probit and logit curves, from http://www.indiana.edu/~statmath/stat/all/cdvm/cdvm1.html)
Can pick one or the other (the link is specified inside the family argument):
– lmer(Y ~ X, family = binomial(link = "logit"))
  • The logit link is the default, used if you don't specify one
– lmer(Y ~ X, family = binomial(link = "probit"))
(In current versions of lme4, binomial models are fit with glmer() rather than lmer().)
Both the logit and probit are undefined if you have a probability of 0 or 1 in a cell
– Can apply some type of adjustment (e.g. the empirical logit)
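One common form of that adjustment, sketched here (the 0.5 constant is a conventional choice, not something specified in these slides):

    # Empirical logit: add 0.5 to each cell count so the log odds stay defined
    # even when a participant has 0 hits or 0 misses in a condition
    emp_logit <- function(successes, failures) {
      log((successes + 0.5) / (failures + 0.5))
    }

    emp_logit(successes = 10, failures = 0)  # finite, unlike log(10/0)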
Outline
• Why do we need SDT?
• Sensitivity vs. Response Bias
• Example SDT Analyses
• Terminology & Theory
• Logit & probit
• Extensions
Extensions
Generalizes to > 2 ordered categories:
– Traditionally, collapse over participants or items & use d_a
– MLM needs a multinomial model, currently available in SAS but not R
Extensions
Variance in parameters over participants, items
– e.g. different sensitivity, different response bias
– Captured by random slopes (see the sketch below)
– Likely that such variance exists!
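A sketch of such a model, again with the hypothetical variables from the earlier sketches (random slopes for the item-type predictor let sensitivity, not just bias, vary by participant and item):

    # Random intercepts: participants and items differ in overall bias
    # Random slopes for is_true_c: participants and items differ in sensitivity
    m_slopes <- glmer(said_true ~ is_true_c * accent_c +
                        (1 + is_true_c | subject) +
                        (1 + is_true_c | item),
                      data = dat, family = binomial)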
Extensions
Unequal variance
– So far, the variability of the response is a constant: d' = B0 + B1X1 + (1|Subject) + ε
– Definitely not true for recognition memory (although there is lots of debate about why this is)
Noisy criterion (Benjamin, Diaz, & Wee, 2009)
– Typically, all error is in the response (evidence)
– But the criterion could vary from trial to trial
(Figure from Wixted, 2002)
N.B. This is definitely not the only account of this difference! (see Yonelinas, 2002)