experiment design in psycholinguistics - uni-saarland.demasta/ws15/experiment_design.pdf ·...

Experiment design in Psycholinguistics

November 13th, 2015

Why do Experiments?

• But why experiments in particular?

• Psychology is a broad field, with many methods

• These range from introspection, to observation, to controlled experimentation.

Why do Experiments?

• Psycholinguists, unlike traditional linguists, favour controlled experimentation.

• Introspective and observational methods are prone to bias from the researcher’s own views.

• With controlled experiments the scientist can isolate the factors under investigation

How do we start?

• Initially, think of a question that may interest you

• How do we process words?

• Is word processing affected by how common a word is?

• Does word frequency affect word processing time?

What do we think?

• Next, we need a hypothesis

• What do we expect to find?

• Low frequency words will take longer to process (H1)

• Alternatively, word frequency may not affect processing time (we call this the null-hypothesis, H0)

What do we think?

• Carrying out an experiment is an act of hypothesis-testing.

How do we test this?

• We have to figure out…

• What to compare

• What to measure

• What we may conclude

Practical considerations

• Will we be able to control our variables?

• Will it be feasible given the cost and effort involved?

• Can we make the design ecologically valid?

Control

• What are we going to manipulate?

• Word frequency. In this example, we’ll talk about “high” and “low” frequency words

Control

• The variable that we, the experimenters, manipulate is called the Independent variable

• The levels of an Independent variable can be referred to as conditions.

• The independent variable in our experiment is word frequency and the conditions are high frequency and low frequency

Independent variables

• These can be within-subjects or between-subjects

Example - Alcohol and cognitive function

• If we wanted to see the effect of alcohol on cognitive function we could…

• Give participants some numeracy questions before and after drinking two pints of beer - then compare the scores

• This would be a within-subjects design

• Or….


• We may be worried that exposure to our test influences future performance

• In this case, we would need to use different participants. We would give half two pints, and the other half would abstain.

• This would be a between-subjects design

• Or….


• We may be interested in long-term influence of alcohol

• In this case, we cannot assign people to groups, but must use pre-established groups (Heavy drinkers vs. Teetotallers)

• This would be a between-subjects design and a quasi-experiment.


• Where possible, it is good to use a within-subjects design

• So, all participants experience all conditions of an independent variable

• We then don’t have to worry about variation between groups

• In our frequency example we will go within-subjects

Dependent Variable

• The dependent variable is the thing that is measured

• And crucially the thing that is compared across conditions

• So, in the case of our alcohol study, in all versions our DV was test-score

Other Variables

• Control variables - things we keep constant

• Random variables - things that will inevitably vary, but not systematically

• Confounding variables - things that may also affect our DV systematically - we want to avoid these

Frequency example

• Back to our example

• Hypothesis: Low frequency words will take longer to process

• Design: Within-subjects

• Independent Variable: Word Frequency (High or Low)

• Dependent Variable: … Depends on task

Lexical Decision task

• Display a word (or nonword) on the screen

• Participants decide if it is a word or not ASAP

• Vary whether the real words are low or high frequency

• Ignore the nonword, and compare RTs between conditions
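The comparison described above can be sketched in a few lines of Python. This is only an illustration: the trial records, the extra stimuli ("table", "wharf", "plarf") and all response times are invented, and the function name `mean_rt_by_condition` is hypothetical.

```python
from statistics import mean

# Hypothetical trial records: (stimulus, condition, response time in ms).
# All values are invented for illustration.
trials = [
    ("chair", "high", 512),
    ("knave", "low", 634),
    ("blint", "nonword", 701),
    ("table", "high", 498),
    ("wharf", "low", 610),
    ("plarf", "nonword", 688),
]


def mean_rt_by_condition(trials):
    """Average RT per frequency condition, ignoring nonword trials."""
    rts = {}
    for _stimulus, condition, rt in trials:
        if condition == "nonword":
            continue  # nonwords only make the task possible; they are not compared
        rts.setdefault(condition, []).append(rt)
    return {condition: mean(values) for condition, values in rts.items()}


condition_means = mean_rt_by_condition(trials)
print(condition_means)
```

With these made-up numbers the low frequency mean is higher than the high frequency mean, which is the pattern H1 predicts. Whether such a difference is reliable is a question for inferential statistics.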


[Example displays: each trial shows one letter string (“Chair”, “Haberdashery”, “Blint”) with the response options “Word” and “Nonword”]

Lexical Decision Task





• Dependent Variable: Response Time (RT)

Lexical Decision Task

Chair Haberdashery Blint

Any confounds?

Word length could conceivably affect processing time, so we must control for this

Lexical Decision Task

Chair Knave Blint

Replacing “Haberdashery” with “Knave” gives a low frequency word of the same length as “Chair”, so word length is now controlled


Any other confounds?

There could be order effects - people could get better or worse as time goes on.

So items can be counterbalanced and/or randomised
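Randomising the presentation order could look something like the sketch below. It assumes each participant has a numeric ID that is used as the random seed, so their order can be reproduced later. The item list and the function name `randomised_order` are hypothetical.

```python
import random

# Hypothetical item list; only "chair", "knave" and "blint" come from the slides.
items = ["chair", "knave", "blint", "table", "wharf", "plarf"]


def randomised_order(items, participant_id):
    """Return a shuffled copy of the items.

    Seeding with the participant ID (an assumption of this sketch) gives each
    participant a different but reproducible order.
    """
    rng = random.Random(participant_id)
    order = list(items)  # copy, so the master list is left untouched
    rng.shuffle(order)
    return order


for participant_id in (1, 2, 3):
    print(participant_id, randomised_order(items, participant_id))
```

Because any order effect is now spread unsystematically over the conditions, it becomes a random variable rather than a confound.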


Floor or ceiling effects? Probably not with RT, but if we use accuracy we may get a ceiling effect


If, in our alcohol example, we had used advanced mathematics questions, we might have got a floor effect


Ecologically valid?

Not a lot like natural language, so perhaps we could embed this in a sentence?


Self-paced Reading

• Use full sentences, but display one word (or a few) at a time

• Participants click to see the next part of the sentence at their own pace

• Vary whether low or high frequency words are used in each sentence

• Compare inspection time on key word (or “spillover” region)

Self-paced Reading





• Dependent Variable: Inspection time on key word (or “spillover” region)

[Demonstration: the sentence is revealed one word at a time, and the participant clicks “Next” to see the next word]

The… / …woman… / …stood… / …up… / …off… / …the… / …chair… / …and… / …walked… / …away.

Self-paced Reading

A full sentence that is more natural, and thus more ecologically valid than the Lexical Decision Task

“The woman stood up off the chair and walked away”

Self-paced Reading


But there is more to control: each item must make sense with both a low and a high frequency word

“The woman stood up off the knave and walked away”

Self-paced Reading


“The woman stood up off the zaisu and walked away”


Also, counterbalancing is now very important - different materials for different participants.
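One common way to counterbalance materials is a Latin square: every sentence frame exists in every condition, but each participant list contains a frame in only one of them. The sketch below assumes items are identified by number; the function name `latin_square_lists` is hypothetical.

```python
def latin_square_lists(n_items, conditions):
    """Build one presentation list per rotation of the conditions.

    In list k, item i appears in condition (i + k) mod len(conditions).
    Across lists every item is seen in every condition, but within a single
    list each item appears exactly once.
    """
    n_conditions = len(conditions)
    return [
        [(item, conditions[(item + k) % n_conditions]) for item in range(n_items)]
        for k in range(n_conditions)
    ]


lists = latin_square_lists(4, ["high", "low"])
for k, presentation_list in enumerate(lists, start=1):
    print(f"List {k}: {presentation_list}")
```

Participants would then be assigned to the lists in equal numbers, so that no participant reads the same sentence frame twice and each condition still contributes the same number of items.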

Self-paced Reading

Ecological validity and experimental control

[Figure: a continuum running from “Ecologically valid, but uncontrolled” to “Highly controlled, but ecologically invalid”]

Find the sweet spot

Use a mixture

Eye-tracking During Reading

Even more ecologically valid than Self-paced Reading (no button pressing)

But also much more precise, as you can get information about very rapid eye movements, which participants are not necessarily aware of

Although, obviously, there are practical considerations, namely cost and effort






• Dependent Variable: Loads…

Eye-tracking During Reading

• Dependent Variables:

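• First fixation duration = fixation 4

• First-pass duration = fixations 4 + 5

• Regression path duration = fixations 4 + 5 + 6 + 7

• Second-pass duration = fixation 7

• Total fixation duration = fixations 4 + 5 + 7

[Figure: fixations numbered 1 to 8 in the order they occurred, plotted over the text “…stood up off the zaisu and…” (label order in the figure: 1 2 6 3 4 7 5 8). Per the definitions above, fixations 4, 5 and 7 are on the target word and fixation 6 is a regression to earlier text.]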

Adapted from Koornneef
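The five measures can be computed from a time-ordered list of fixations. The sketch below is a simplified illustration, not the procedure of any particular eye-tracking package. It assumes words are numbered left to right (stood = 0 … and = 5), that fixation 6 landed on “off”, and it uses invented durations. The function name `reading_measures` is hypothetical.

```python
def reading_measures(fixations, target):
    """Compute reading-time measures for one target region.

    fixations: list of (region_index, duration_ms) in temporal order.
    Simplification: assumes the target was not skipped before it was first fixated.
    """
    # Position of the first fixation that lands on the target
    first = next(i for i, (region, _) in enumerate(fixations) if region == target)

    # First pass: consecutive fixations on the target from first entry until the eyes leave it
    end = first
    while end < len(fixations) and fixations[end][0] == target:
        end += 1
    first_pass = sum(duration for _, duration in fixations[first:end])

    # Regression path: everything from first entry until the eyes move past the target
    end = first
    while end < len(fixations) and fixations[end][0] <= target:
        end += 1
    regression_path = sum(duration for _, duration in fixations[first:end])

    total = sum(duration for region, duration in fixations if region == target)
    return {
        "first_fixation": fixations[first][1],
        "first_pass": first_pass,
        "regression_path": regression_path,
        "second_pass": total - first_pass,  # here: all later fixations on the target
        "total": total,
    }


# Fixations 1 to 8 from the example; regions and durations are assumptions.
fixations = [(0, 200), (1, 180), (3, 150), (4, 220), (4, 190), (2, 160), (4, 210), (5, 230)]
measures = reading_measures(fixations, target=4)
print(measures)
```

With these invented durations, first-pass duration is the sum of fixations 4 and 5, and regression path duration also includes the regression (fixation 6) and the re-fixation (fixation 7), matching the definitions in the list above.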

Comparing conditions

• Any of the methods discussed would allow us to test our hypothesis

• From here we can compare our results between conditions.

• But first we should consider different designs.

• We had one IV, with two levels, so we can compare our two means to find a main effect of our IV (word frequency)

Comparing conditions

• Main effect: The effect of a factor independent of anything else

• Some experiments will have more than one IV, so experimenters can investigate the main effect of both, but also the…

• Interaction: This is when the effect of one IV on the DV depends on another IV

Interactions

• If we wanted to see the effect of alcohol on cognitive function as well as the effect of habitual drinking behaviour we could…

• Split our participants into two groups (Heavy drinkers & Teetotallers), then…

• Give participants some numeracy questions before and after drinking two pints of beer - then compare the scores

• This would be a 2x2 within/between-subjects design

Interactions

Here we see a clear effect of the beer as well as group

But the beer affects both groups in the same way.

Therefore, no interaction

Interactions

Here we see a clear effect of the beer for just the Teetotallers

No effect on Heavy Drinkers

So, we can say there is an interaction between group and drink condition

Interactions

Here we see a clear effect of the beer for the Teetotallers

But, an opposite effect for Heavy Drinkers!

So, we can say there is an interaction between group and drink condition
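The three patterns can be expressed numerically as a "difference of differences". The sketch below uses invented cell means (test scores) for the 2x2 alcohol design. The function names and all numbers are hypothetical.

```python
def beer_effect(cell_means, group):
    """Change in mean test score from before to after drinking, for one group."""
    return cell_means[(group, "after")] - cell_means[(group, "before")]


def interaction_contrast(cell_means):
    """Difference between the beer effect for Teetotallers and for Heavy Drinkers.

    Zero means the beer affects both groups in the same way (no interaction).
    """
    return beer_effect(cell_means, "teetotal") - beer_effect(cell_means, "heavy")


# Invented cell means for the three patterns described in the slides
same_effect = {("teetotal", "before"): 80, ("teetotal", "after"): 60,
               ("heavy", "before"): 70, ("heavy", "after"): 50}
one_group_only = {("teetotal", "before"): 80, ("teetotal", "after"): 60,
                  ("heavy", "before"): 70, ("heavy", "after"): 70}
opposite_effects = {("teetotal", "before"): 80, ("teetotal", "after"): 60,
                    ("heavy", "before"): 60, ("heavy", "after"): 80}

for name, cells in [("same effect", same_effect),
                    ("one group only", one_group_only),
                    ("opposite effects", opposite_effects)]:
    print(name, interaction_contrast(cells))
```

A contrast of zero corresponds to the first pattern (parallel lines in a plot), while the other two give non-zero contrasts. In real data the contrast will rarely be exactly zero, so it still needs a statistical test.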

But we can’t just see a main effect or interaction…

…We need statistics!

Statistics

Statistics

That’s us!

Descriptive statistics

• Means, medians, modes and standard deviations

• Displayed in figures and tables


• Mean - sum of values/number of values

• Mode - most commonly occurring value

• Median - the middle value

• Only mean works for us.
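As a quick illustration, the three measures can be computed with Python's standard `statistics` module. The response times below are invented.

```python
from statistics import mean, median, mode

# Invented response times in ms, including one slow response
rts = [520, 540, 540, 610, 900]

rt_mean = mean(rts)      # sum of values / number of values
rt_median = median(rts)  # the middle value of the sorted data
rt_mode = mode(rts)      # the most commonly occurring value

print(rt_mean, rt_median, rt_mode)
```

In this made-up sample the single slow response pulls the mean (622) above the median and mode (both 540).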

Means and distribution

The Mean Deviation

To measure the spread of a dataset it seems sensible to use the ‘deviation’ of each data point from the mean of the distribution. The deviation of each data point from the mean is simply the data point minus the mean.

small spread = small deviations

large spread = large deviations
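A small sketch of the idea, with two invented datasets that share the same mean but differ in spread. One caveat: raw deviations always sum to zero, so to summarise spread in a single number they must first be made positive. Taking absolute values, as below, is one option (squaring them leads to the variance and standard deviation). The function names are hypothetical.

```python
from statistics import mean


def deviations(data):
    """Deviation of each data point from the mean: data point minus mean."""
    m = mean(data)
    return [x - m for x in data]


def mean_absolute_deviation(data):
    """Average size of the deviations, ignoring their sign."""
    return mean([abs(d) for d in deviations(data)])


small_spread = [98, 100, 102]   # invented values
large_spread = [60, 100, 140]   # invented values, same mean

print(deviations(small_spread), mean_absolute_deviation(small_spread))
print(deviations(large_spread), mean_absolute_deviation(large_spread))
```

Both datasets have a mean of 100, but the second has much larger deviations and therefore a larger spread.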

[Figure: The Mean Deviation, illustrated with two datasets labelled X1 and X2]

Inferential statistics

• Is our difference real, or by chance?

• We must carry out statistical tests to test for significance

• From here, we can infer if an effect is “real”

Inferential statistics - comparing means

• If we are comparing two means we can use a t-test

• So we could compare RT for high frequency words to low frequency words in the LDT

• If we are comparing multiple means, we can use an ANOVA to find multiple main effects and interactions

• So, an ANOVA can tell us if the main effects in our alcohol study are significant, and if there are any interactions between our IVs
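For a within-subjects design like our frequency example, the two means come from the same participants, so a paired t-test is one natural choice (an assumption here, since the slides do not say which kind of t-test). The sketch below computes the t statistic by hand from invented per-participant mean RTs. The function name `paired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Invented mean RTs (ms) for five participants in each condition
high_frequency = [510, 530, 495, 560, 520]
low_frequency = [580, 610, 540, 640, 590]


def paired_t(a, b):
    """Paired t statistic for b - a, with its degrees of freedom."""
    differences = [y - x for x, y in zip(a, b)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(differences) / standard_error, n - 1


t, df = paired_t(high_frequency, low_frequency)
print(f"t({df}) = {t:.2f}")
```

The t statistic is the mean difference divided by its standard error. To obtain a p-value it would be compared against a t distribution with the given degrees of freedom, which a statistics package normally does for you.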

Significance testing

• Significance is usually reported as a p-value

• A p-value is the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the null hypothesis is true.

• By convention, a p-value < .05 is considered significant.

• That is, if there would be less than a 5% chance of finding a difference at least as large as ours when the null hypothesis is true, we say the difference is significant.
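The definition of a p-value can be made concrete with an exact sign-flip permutation test. Note that this is a different technique from the t-test and ANOVA named in these slides, and it is used here only because it shows the logic directly. If the null hypothesis is true, each participant's low-minus-high difference is equally likely to be positive or negative, so we can enumerate every possible pattern of signs and count how often the result is at least as extreme as the one observed. The data and the function name `sign_flip_p_value` are invented.

```python
from itertools import product

# Invented per-participant RT differences (low minus high frequency, in ms)
differences = [70, 80, 45, 80, 70, 55]


def sign_flip_p_value(differences):
    """Two-sided exact p-value under the null that each sign is a coin flip."""
    observed = abs(sum(differences))
    outcomes = 0
    at_least_as_extreme = 0
    for signs in product((1, -1), repeat=len(differences)):
        outcomes += 1
        flipped_total = sum(sign * d for sign, d in zip(signs, differences))
        if abs(flipped_total) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / outcomes


p = sign_flip_p_value(differences)
print(p)  # 2 of the 64 sign patterns are this extreme: 0.03125
```

Only the all-positive and all-negative patterns are as extreme as the observed data, giving p = 2/64, which is below the conventional .05 threshold.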

Hypothesis

• If we have a significant value we can say that we have rejected the null-hypothesis

• And thus support our hypothesis

Examples

Discourse Mediated Updating

The woman [will / is too lazy to] put the glass onto the table. Then she will pick up the bottle and pour the wine carefully into the glass.

Example visual display taken from Altmann & Kamide (2009).

Interaction of language and gaze

• Macdonald & Tatler (2015, JEP:HPP)



No gaze, Congruent gaze or Incongruent gaze


• Hypothesis: Spatial gaze cues will be used more alongside featural referring expressions

• Design: Within-/Between-subjects

• Independent Variables: Gaze condition (present, absent or opposite) and referring expression (featural or spatial)

• Dependent Variable: fixation time on face.

Hopefully, this will come in handy for your reviews and presentations of the empirical papers in later seminars
