Experiment design in Psycholinguistics
November 13th, 2015
Why do Experiments?
• But why experiments in particular?
• Psychology is a broad field, with many methods
• These range from introspection, to observation, to controlled experimentation.
Why do Experiments?
• Psycholinguists, unlike traditional linguists, favour controlled experimentation.
• Introspective and observational methods can be biased by the researcher’s views.
• With controlled experiments, the scientist can isolate the factors under investigation.
How do we start?
• Initially, think of a question that may interest you
• How do we process words?
• Is word processing affected by how common a word is?
• Does word frequency affect word processing time?
What do we think?
• Next, we need a hypothesis
• What do we expect to find?
• Low frequency words will take longer to process (H1)
• Alternatively, word frequency may not affect processing time (we call this the null-hypothesis, H0)
What do we think?
• Carrying out an experiment is an act of hypothesis-testing.
How do we test this?
• We have to figure out:
• What to compare
• What to measure
• What we may conclude
Practical considerations
• Will we be able to control our variables?
• Will it be feasible given the cost and effort involved?
• Can we make the design ecologically valid?
Control
• What are we going to manipulate?
• Word frequency. In this example, we’ll talk about “high” and “low” frequency words
Control
• The variable that we, the experimenters, manipulate is called the Independent variable
• The levels of an Independent variable can be referred to as conditions.
• The independent variable in our experiment is word frequency and the conditions are high frequency and low frequency
Independent variables
• These can be within-subjects or between-subjects
Example - Alcohol and cognitive function
• If we wanted to see the effect of alcohol on cognitive function we could…
• Give participants some numeracy questions before and after drinking two pints of beer - then compare the scores
• This would be a within-subjects design
• Or….
Example - Alcohol and cognitive function
• We may be worried that exposure to our test influences future performance
• In this case, we would need to use different participants. We would give half two pints, and the other half would abstain.
• This would be a between-subjects design
• Or….
Example - Alcohol and cognitive function
• We may be interested in long-term influence of alcohol
• In this case, we cannot assign people to groups, but must use pre-established groups (Heavy drinkers vs. Teetotallers)
• This would be a between-subjects design and a quasi-experiment.
Example - Alcohol and cognitive function
• Where possible, it is good to keep the design within-subjects
• So, all participants experience all conditions of an independent variable
• Then we don’t have to worry about variation between groups
• In our frequency example we will go within-subjects
Dependent Variable
• The dependent variable is the thing that is measured
• And crucially the thing that is compared across conditions
• So, in the case of our alcohol study, in all versions our DV was test-score
Other Variables
• Control variables - things we keep constant
• Random variables - things that will inevitably vary, but not systematically
• Confounding variables - things that may also affect our DV systematically - we want to avoid these
Frequency example
• Back to our example
• Hypothesis: Low frequency words will take longer to process
• Design: Within-subjects
• Independent Variable: Word Frequency (High or Low)
• Dependent Variable: … Depends on task
Lexical Decision task
• Display a word (or nonword) on the screen
• Participants decide if it is a word or not ASAP
• Vary whether the real words are low or high frequency
• Ignore the nonwords, and compare response times (RTs) between conditions
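The last step of the task can be sketched in a few lines. This is only an illustration: the trial records, the condition labels and every RT value below are invented, and a real analysis would work on many trials per participant.

```python
from statistics import mean

# Hypothetical trial records: (stimulus, condition, RT in ms). All values are made up.
trials = [
    ("chair", "high", 520),
    ("table", "high", 540),
    ("knave", "low", 610),
    ("zaisu", "low", 650),
    ("blint", "nonword", 700),
]

def mean_rt_by_condition(trials):
    """Drop nonword trials and return the mean RT for each remaining condition."""
    by_condition = {}
    for _stimulus, condition, rt in trials:
        if condition == "nonword":
            continue  # nonwords are needed for the task, but not for the comparison
        by_condition.setdefault(condition, []).append(rt)
    return {condition: mean(rts) for condition, rts in by_condition.items()}

means = mean_rt_by_condition(trials)
print(means)
```

The two means that come out of this are what the hypothesis is about: H1 predicts a larger mean RT for the low frequency condition.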
Lexical Decision task
Word Nonword
Chair
Lexical Decision task
Word Nonword
Haberdashery
Lexical Decision task
Word Nonword
Blint
Lexical Decision Task
• Back to our example
• Hypothesis: Low frequency words will take longer to process
• Design: Within-subjects
• Independent Variable: Word Frequency (High or Low)
• Dependent Variable: Response Time (RT)
Lexical Decision Task
Chair Haberdashery Blint
Any confounds?
Word length could conceivably affect processing time, so we must control for this
Lexical Decision Task
Chair Knave Blint
Any other confounds?
There could be order effects - people could get better or worse as time goes on.
So the order of items can be counterbalanced and/or randomised
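Randomising the order is simple to do in software. The sketch below is one possible way, assuming each participant has a numeric ID (a hypothetical detail); seeding the shuffle with that ID means the same participant always gets the same order, which helps when re-checking data later.

```python
import random

def randomised_order(items, participant_id):
    """Return a shuffled copy of items; the participant ID seeds the shuffle."""
    rng = random.Random(participant_id)  # separate generator, reproducible per participant
    order = list(items)                  # copy, so the master list is left untouched
    rng.shuffle(order)
    return order

items = ["chair", "knave", "blint"]
order_p1 = randomised_order(items, participant_id=1)
order_p2 = randomised_order(items, participant_id=2)
print(order_p1, order_p2)
```

With only three items, two participants can of course end up with the same order by chance; with a realistic number of items this becomes very unlikely.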
Lexical Decision Task
Floor or ceiling effects? Probably not with RT, but if we use accuracy we may get a ceiling effect
If, in our alcohol example, we had used advanced mathematics questions, we might have got a floor effect
Lexical Decision Task
Ecologically valid?
Not a lot like natural language, so perhaps we could embed this in a sentence?
Self-paced Reading
• Use full sentences, but display one word (or a few) at a time
• Participants click to see the next part of the sentence at their own pace
• Vary whether low or high frequency words are used in each sentence
• Compare inspection time on key word (or “spillover” region)
Self-paced Reading
• Back to our example
• Hypothesis: Low frequency words will take longer to process
• Design: Within-subjects
• Independent Variable: Word Frequency (High or Low)
• Dependent Variable: Inspection time on key word (or “spillover” region)
Self-paced Reading
[Demonstration: the sentence appears one word at a time, and the participant clicks “Next” to reveal each new word]
The… …woman… …stood… …up… …off… …the… …chair… …and… …walked… …away.
Self-paced Reading
A full sentence is more natural, and thus more ecologically valid, than the Lexical Decision Task
“The woman stood up off the chair and walked away”
Self-paced Reading
“The woman stood up off the chair and walked away”
But more to control: each item must make sense with both a low and high frequency word
“The woman stood up off the knave and walked away”
Self-paced Reading
“The woman stood up off the chair and walked away”
“The woman stood up off the zaisu and walked away”
Also, counterbalancing is now very important - different materials for different participants.
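A common way to counterbalance materials is a Latin square: each participant sees every item once, but in only one of its versions, and the versions rotate across presentation lists. The sketch below assumes two conditions; the second item is invented purely for illustration.

```python
def latin_square_lists(items, conditions):
    """Build one presentation list per rotation of the conditions.

    items: list of dicts mapping condition name -> sentence version.
    Each list contains every item exactly once, and across lists every
    item appears once in every condition.
    """
    n = len(conditions)
    lists = []
    for list_index in range(n):
        presentation = []
        for item_index, item in enumerate(items):
            condition = conditions[(item_index + list_index) % n]
            presentation.append((item_index, condition, item[condition]))
        lists.append(presentation)
    return lists

# The first item is from the slides; the second is a hypothetical extra item.
items = [
    {"high": "The woman stood up off the chair and walked away",
     "low": "The woman stood up off the zaisu and walked away"},
    {"high": "The man poured the water and sat down",
     "low": "The man poured the mead and sat down"},
]
lists = latin_square_lists(items, ["high", "low"])
for number, presentation in enumerate(lists, start=1):
    print("List", number, [(i, c) for i, c, _ in presentation])
```

Half of the participants would get list 1 and the other half list 2, so the design stays within-subjects (everyone sees both conditions) while nobody reads the same sentence frame twice.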
Ecological validity and experimental control
Ecologically valid, but uncontrolled
Highly controlled, but ecologically invalid
Find the sweet spot
Use a mixture
Eye-tracking During Reading
Even more ecologically valid than Self-paced Reading (no button pressing)
But also much more precise, as you can get information about very rapid eye movements, which participants are not necessarily aware of
Although, obviously, there are practical considerations, namely cost and effort
Eye-tracking During Reading
“The woman stood up off the chair and walked away”
“The woman stood up off the zaisu and walked away”
Eye-tracking During Reading
• Back to our example
• Hypothesis: Low frequency words will take longer to process
• Design: Within-subjects
• Independent Variable: Word Frequency (High or Low)
• Dependent Variable: There are many to choose from…
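Eye-tracking During Reading
• Dependent Variables:
• First fixation duration = 4
• First-pass duration = 4 + 5
• Regression path duration = 4 + 5 + 6 + 7
• Second-pass duration = 7
• Total fixation duration = 4 + 5 + 7
[Figure: fixations numbered 1 to 8 in temporal order over the text “…stood up off the zaisu and…”. The numbers in the list above refer to these fixations: 4, 5 and 7 fall on the target word “zaisu”, and 6 is a regression to earlier text. Adapted from Koornneef]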
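The measures listed above can be computed from an ordered list of fixations. The sketch below is a simplified illustration: words are indexed left to right, the target is "zaisu", the landing positions of fixations 1, 2, 3, 6 and 8 are assumptions (the transcript does not preserve them), and all durations are invented. It also ignores the case where the target is skipped on first pass.

```python
def reading_measures(fixations, target):
    """fixations: list of (word_index, duration_ms) in temporal order."""
    # Position in the sequence of the first fixation that lands on the target
    first = next((i for i, (word, _) in enumerate(fixations) if word == target), None)
    if first is None:
        return None  # target never fixated

    # First pass: consecutive fixations on the target, starting at first entry
    first_pass = 0
    i = first
    while i < len(fixations) and fixations[i][0] == target:
        first_pass += fixations[i][1]
        i += 1

    # Regression path: everything from first entry until the eyes move past the target
    regression_path = 0
    j = first
    while j < len(fixations) and fixations[j][0] <= target:
        regression_path += fixations[j][1]
        j += 1

    total = sum(duration for word, duration in fixations if word == target)
    return {
        "first_fixation": fixations[first][1],
        "first_pass": first_pass,
        "regression_path": regression_path,
        "second_pass": total - first_pass,
        "total": total,
    }

# Word indices: stood=0, up=1, off=2, the=3, zaisu=4, and=5 (positions partly assumed)
fixations = [(0, 200), (1, 180), (3, 190),   # fixations 1-3, before the target
             (4, 250), (4, 120),             # fixations 4 and 5, on the target
             (1, 160),                       # fixation 6, regression to earlier text
             (4, 210),                       # fixation 7, back on the target
             (5, 200)]                       # fixation 8, past the target
measures = reading_measures(fixations, target=4)
print(measures)
```

With these made-up durations, first-pass duration is 250 + 120 and regression path duration additionally includes fixations 6 and 7, matching the "4 + 5" and "4 + 5 + 6 + 7" pattern in the list.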
Comparing conditions
• Any of the methods discussed would allow us to test our hypothesis
• From here we can compare our results between conditions.
• But before that, we should consider different designs.
• We had one IV, with two levels, so we can compare our two means to find a main effect of our IV (word frequency)
Comparing conditions
• Main effect: The effect of a factor independent of anything else
• Some experiments will have more than one IV, so experimenters can investigate the main effect of both, but also the…
• Interaction: This is when the effect of one IV on the DV depends on another IV
Interactions
• If we wanted to see the effect of alcohol on cognitive function as well as the effect of habitual drinking behaviour we could…
• Split our participants into two groups (Heavy drinkers & Teetotallers), then…
• Give participants some numeracy questions before and after drinking two pints of beer - then compare the scores
• This would be a 2x2 mixed design: drink condition (before vs. after) is within-subjects, and group is between-subjects
Interactions
Here we see a clear effect of the beer as well as group
But the beer affects both groups in the same way.
Therefore, no interaction
Interactions
Here we see a clear effect of the beer for just the Teetotallers
No effect on Heavy Drinkers
So, we can say there is an interaction between group and drink condition
Interactions
Here we see a clear effect of the beer for the Teetotallers
But, an opposite effect for Heavy Drinkers!
So, we can say there is an interaction between group and drink condition
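One way to put a number on these three pictures is a "difference of differences": work out the beer effect in each group, then subtract one from the other. The sketch below does this on cell means that are invented to mimic the three patterns; the group and condition labels are hypothetical names.

```python
def beer_effect(cell_means, group):
    """Change in mean test score from before to after the beer, for one group."""
    return cell_means[(group, "after")] - cell_means[(group, "before")]

def interaction_contrast(cell_means):
    """Difference between the two groups' beer effects; 0 means parallel lines."""
    return beer_effect(cell_means, "teetotal") - beer_effect(cell_means, "heavy")

# Invented cell means (test scores), one dict per pattern
same_effect = {("teetotal", "before"): 80, ("teetotal", "after"): 60,
               ("heavy", "before"): 70, ("heavy", "after"): 50}
one_group_only = {("teetotal", "before"): 80, ("teetotal", "after"): 60,
                  ("heavy", "before"): 70, ("heavy", "after"): 70}
opposite_effects = {("teetotal", "before"): 80, ("teetotal", "after"): 60,
                    ("heavy", "before"): 60, ("heavy", "after"): 80}

for name, cells in [("same effect", same_effect),
                    ("one group only", one_group_only),
                    ("opposite effects", opposite_effects)]:
    print(name, interaction_contrast(cells))
```

A contrast of zero corresponds to the "no interaction" picture; a non-zero contrast is only a description of the sample means, not yet evidence that the interaction is reliable.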
But we can’t just see a main effect or interaction…
…We need statistics!
Statistics
That’s us!
Descriptive statistics
• Means, medians, modes and standard deviations
• Displayed in figures and tables
Descriptive statistics
• Mean - sum of values/number of values
• Mode - most commonly occurring value
• Median - the middle value
• Only the mean works for us.
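All three measures are available in Python's standard `statistics` module. The RT values below are invented for illustration.

```python
from statistics import mean, median, mode

rts = [480, 500, 500, 520, 550]  # hypothetical response times in ms

summary = {
    "mean": mean(rts),      # sum of values / number of values
    "median": median(rts),  # the middle value once sorted
    "mode": mode(rts),      # the most commonly occurring value
}
print(summary)
```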
Means and distribution
The Mean Deviation
To measure the spread of a dataset it seems sensible to use the ‘deviation’ of each data point from the mean of the distribution. The deviation of each data point from the mean is simply the data point minus the mean.
small spread = small deviations; large spread = large deviations
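A small sketch of this idea, using two invented datasets with the same mean but different spread. Note that the raw deviations always sum to zero, so they have to be made positive before averaging: either by taking absolute values (the mean absolute deviation) or by squaring them (which leads to the standard deviation).

```python
from statistics import mean, pstdev

def deviations(data):
    """Each data point minus the mean of the dataset."""
    m = mean(data)
    return [x - m for x in data]

def mean_absolute_deviation(data):
    """Average size of the deviations, ignoring their sign."""
    return mean(abs(d) for d in deviations(data))

narrow = [9, 10, 11]   # hypothetical data, small spread
wide = [0, 10, 20]     # hypothetical data, large spread, same mean

print(deviations(narrow), deviations(wide))
print(mean_absolute_deviation(narrow), mean_absolute_deviation(wide))
print(pstdev(narrow), pstdev(wide))  # population standard deviation
```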
Inferential statistics
• Is our difference real, or by chance?
• We must carry out statistical tests to test for significance
• From here, we can infer if an effect is “real”
Inferential statistics - comparing means
• If we are comparing two means we can use a t-test
• So we could compare RT for high frequency words to low frequency words in the LDT
• If we are comparing multiple means, we can use an ANOVA to find multiple main effects and interactions
• So, an ANOVA can tell us if the main effects in our alcohol study are significant, and if there are any interactions between our IVs
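For the two-mean comparison in the LDT, the t statistic can be computed by hand. Because the frequency example is within-subjects, the sketch below uses the paired version: the mean of the per-participant differences divided by its standard error. The per-participant RTs are invented, and turning t into a p-value needs the t distribution (for example via `scipy.stats.ttest_rel`), which is left out to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(condition_a, condition_b):
    """Paired-samples t statistic and its degrees of freedom."""
    differences = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)  # stdev = sample SD (n - 1)
    return mean(differences) / standard_error, n - 1

# Hypothetical mean RTs (ms) for five participants, one value per condition
low_freq = [640, 610, 655, 600, 630]
high_freq = [600, 590, 610, 585, 600]

t, df = paired_t(low_freq, high_freq)
print(round(t, 2), df)
```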
Significance testing
• Significance is usually reported as a p-value
• A p-value is the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the null hypothesis is true.
• By convention, a p-value < .05 is considered significant.
• So if there is less than a 5% chance of finding a difference at least as large as the one we observed when the null hypothesis is true, we say the difference is significant.
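The definition can be made concrete with a sign-flip permutation test. This is a different technique from the t-test and ANOVA mentioned earlier, used here only because it follows the definition directly: if H0 is true, each participant's high/low difference is as likely to be negative as positive, so we can enumerate every sign pattern and count how often the result is at least as extreme as the observed one. The differences below are invented.

```python
from itertools import product

def sign_flip_p_value(differences):
    """Exact two-sided p-value for the mean paired difference under H0."""
    n = len(differences)
    observed = abs(sum(differences))  # comparing sums is equivalent to comparing means
    extreme = 0
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, differences)))
        if flipped >= observed:
            extreme += 1
    return extreme / 2 ** n

# Hypothetical low-minus-high RT differences (ms) for six participants
differences = [40, 20, 45, 15, 30, 25]
p = sign_flip_p_value(differences)
print(p, p < 0.05)
```

Here every participant was slower on low frequency words, so only 2 of the 64 sign patterns are as extreme as the data, giving p = 2/64.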
Hypothesis
• If we have a significant result, we can say that we have rejected the null hypothesis (H0)
• And thus found support for our hypothesis (H1)
Examples
Discourse Mediated Updating
The woman [will / is too lazy to] put the glass onto the table. Then she will pick up the bottle and pour the wine carefully into the glass.
Example visual display taken from Altmann & Kamide (2009).
Interaction of language and gaze
• Macdonald & Tatler (2015, JEP:HPP)
No gaze, Congruent gaze or Incongruent gaze
Interaction of language and gaze
• Macdonald & Tatler (2015, JEP:HPP)
• Hypothesis: Spatial gaze cues will be used more alongside featural referring expressions
• Design: Within-/Between-subjects
• Independent Variables: Gaze condition (present, absent or opposite) and referring expression (featural or spatial)
• Dependent Variable: Fixation time on face
Hopefully, this will come in handy for your reviews and presentations of the empirical papers in later seminars