Statistical Frequency in Word Segmentation

Upload: kelly-richardson

Post on 17-Dec-2015


Words don’t come with nice clean boundaries between them

• Where are the word boundaries?


Question: How do children work out where the word boundaries are?

There are several potential clues:

- Pauses (although this is dubious)

- Intonation (this too is dubious)

- Statistical regularities


Statistical Regularities

• Words very rarely begin with [dw],

• Words never begin with [bn],

• Words never begin with [lb],

• Etc.

• So if the child hears these sequences, the child hypothesizes the sequence occurred in the middle or at the end of the word.
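
The reasoning in the bullets above can be sketched in a few lines of Python. This is only an illustration, not a model of what children actually do: the function name `possible_boundaries` is made up here, letters stand in for sounds, and the example input is hypothetical. Only [bn] and [lb] are treated as impossible word beginnings, since [dw] is merely rare.

```python
# Onsets the slide says never begin a word; [dw] is left out because it is
# only "very rarely" word-initial, not impossible.
ILLEGAL_ONSETS = {"bn", "lb"}


def possible_boundaries(segments):
    """Return positions where a word could start, given the illegal onsets.

    Position i means "a word begins at segments[i]". A position is ruled out
    when the two segments starting there form an illegal onset.
    """
    allowed = []
    for i in range(len(segments)):
        onset = "".join(segments[i:i + 2])
        if onset not in ILLEGAL_ONSETS:
            allowed.append(i)
    return allowed


# Hypothetical input: letters stand in for the sounds of "club night".
stream = list("clubnight")
print(possible_boundaries(stream))  # [0, 1, 2, 4, 5, 6, 7, 8]
```

Position 3 is excluded because a word starting there would begin with [bn], so the [bn] sequence must sit in the middle of a word or span a boundary between b and n.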


Statistical Regularities

• Voiceless stops that begin words are almost always aspirated,

• Voiced segments that end words are often de-voiced,

• Various other phonological processes may occur, e.g., word-final frication, etc.

• So these are phonological clues that may help segment the speech stream.


Problem

• In order for children to make use of these cues, they must be able to track how often such sequences occur in speech; otherwise the cues are useless.

• So if the child is not able to track the frequency of [bn] at the beginning of words, what use is this strategy?


Statistical Tracking

• Very recent work suggests that children do in fact have the capacity to track statistical frequencies of certain elements in their environment.

• Major researchers: Jenny Saffran (Wisconsin), Rebecca Gomez (Arizona), Elissa Newport (Rochester), Richard Aslin (Rochester), LouAnn Gerken (Arizona), Gary Marcus (NYU), etc.


The Experiment - Overview

• Create a synthesized string of syllables that occur with particular frequencies (can’t use English…).

• Expose the children to this string of syllables for ~20 minutes.

• Test children to see if they have a preference for the highly frequent syllable sets or the rare syllable sets.

• If children show a preference (no matter what direction that preference is in), then children are sensitive to frequencies of syllables in the input.


Sample Stimulus

Their language consisted of:

• Four consonants (p,t,b,d)

• Three vowels (a, i, u)

• Which when combined created 12 syllables (pa, ti, bu, da, etc.).

• These then created six words:

• babupu, bupada, dutaba, patubi, pidabu, and tutibu


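The six words, syllable by syllable:

ba bu pu
bu pa da
du ta ba
pa tu bi
pi da bu
tu ti bu

Frequency of each syllable across the six words:

Syllable:    ba  bi  bu  pa  pi  pu  ta  ti  tu  da  di  du
Frequency:    2   1   4   2   1   1   1   1   2   2   0   1

Frequency of each within-word syllable pair:

Pair:        babu  bupu  bupa  pada  duta  taba  patu  tubi  pida  dabu  tuti  tibu
Frequency:      1     1     1     1     1     1     1     1     1     1     1     1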


Transitional Probabilities

• The chances of a word containing bu are much greater than the chances of a word containing di.

• Transitional probabilities quantify this.

• The Transitional Probability of xy is:

Frequency of xy / Frequency of x


Transitional Probabilities

• So for the word babupu, the transitional probabilities are calculated as follows:

Transitional probability of babu = Frequency of babu / Frequency of ba = 1/2 = 0.5

Transitional probability of bupu = Frequency of bupu / Frequency of bu = 1/4 = 0.25

Overall transitional probability of the word babupu = (0.5 + 0.25) / 2 = 0.375
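
The calculation above can be reproduced with a short Python sketch. It counts syllables and within-word syllable pairs over the six words of the sample stimulus, then applies the definition (frequency of xy divided by frequency of x). The function names are chosen here for illustration, and the sketch assumes every syllable is exactly two letters long.

```python
from collections import Counter

# The six words from the sample stimulus.
WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]


def syllables(word):
    """Split a word into consonant-vowel syllables of two letters each."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


# Count how often each syllable and each within-word syllable pair occurs.
syllable_freq = Counter()
pair_freq = Counter()
for word in WORDS:
    syls = syllables(word)
    syllable_freq.update(syls)
    pair_freq.update(zip(syls, syls[1:]))


def transitional_probability(x, y):
    """Frequency of the pair xy divided by the frequency of x."""
    return pair_freq[(x, y)] / syllable_freq[x]


def word_transitional_probability(word):
    """Mean of the transitional probabilities of a word's adjacent pairs."""
    syls = syllables(word)
    tps = [transitional_probability(x, y) for x, y in zip(syls, syls[1:])]
    return sum(tps) / len(tps)


print(transitional_probability("ba", "bu"))     # 0.5
print(transitional_probability("bu", "pu"))     # 0.25
print(word_transitional_probability("babupu"))  # 0.375
```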


What’s the point?

• Transitional probability was manipulated so that:

• The transitional probability was high within a word, but low across a word boundary. This is what a word IS in real life.


ba bu pu bu pa da du ta ba

ba-bu: High Transitional Probability

bu-pu: High Transitional Probability

pu-bu: Low Transitional Probability

bu-pa: High Transitional Probability

pa-da: High Transitional Probability

da-du: Low Transitional Probability

du-ta: High Transitional Probability

ta-ba: High Transitional Probability


• 300 tokens of each of the six words were randomly concatenated.

• All word boundaries were removed.

• This left 4536 continuous syllables, which were read by a speech synthesizer.

• The synthesizer produced a monotone stream of syllables at a rate of 216 syllables per minute.
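
A rough simulation of this kind of stream shows why it carries a usable signal. The sketch below is not the original stimulus: it assumes "randomly concatenated" means a plain shuffle (the real study may have imposed further constraints), the seed is arbitrary, and the variable names are made up for illustration. It measures transitional probabilities on the boundary-free stream itself and compares transitions that fall inside a word with those that straddle a boundary.

```python
import random
from collections import Counter

WORDS = ["babupu", "bupada", "dutaba", "patubi", "pidabu", "tutibu"]
TOKENS_PER_WORD = 300  # tokens of each word, as stated on the slide


def syllables(word):
    """Split a word into consonant-vowel syllables of two letters each."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


rng = random.Random(0)  # fixed seed so the run is repeatable (an assumption)
tokens = WORDS * TOKENS_PER_WORD
rng.shuffle(tokens)     # plain shuffle: an assumption about "randomly"

# Flatten into one continuous syllable stream with no boundary markers.
stream = [syl for word in tokens for syl in syllables(word)]

# Count each syllable that has a successor, and each adjacent pair.
first_freq = Counter(stream[:-1])
pair_freq = Counter(zip(stream, stream[1:]))

within, across = [], []
for i in range(len(stream) - 1):
    x, y = stream[i], stream[i + 1]
    tp = pair_freq[(x, y)] / first_freq[x]
    # Every word has three syllables, so the transition starting at
    # index 2, 5, 8, ... straddles a word boundary.
    (across if i % 3 == 2 else within).append(tp)

mean_within = sum(within) / len(within)
mean_across = sum(across) / len(across)
print(f"mean TP within words:      {mean_within:.2f}")
print(f"mean TP across boundaries: {mean_across:.2f}")
```

On a stream built this way the within-word mean comes out well above the across-boundary mean, which is the contrast a listener would need to pick up on.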


Procedures

• Subjects consisted of 24 undergraduate students.

• Subjects were told to listen to a ‘nonsense’ language.

• Their task was to figure out where words begin and end.

• After 3 blocks of 7 minutes of exposure to the language, subjects were tested.


Test Procedure

• Subjects heard two tri-syllabic strings, e.g., bu-pa-da (real word) and pi-da-bu (not a real word).

• Question: Which sounds more like a word from this nonsense language?

• There were 36 trials in the test.


Results

• The mean number of correct responses across subjects was 27.2 out of 36, where chance is 18. A t-test shows this to be significantly different from chance.

• Conclusion: adults are able to recognize what is a word and what is not a word based purely on statistical frequency.


Additional finding:

• The three words with the most common syllables in them were easiest to recognize.

• The three words with the least common syllables in them were hardest to recognize.


But can kids do this too?

• Answer appears to be Yes.

• Saffran et al. (1996) used essentially the same stimuli on 8-month-old children.

• They used four words instead of six.

• Children were exposed for only 2 minutes (not 21 minutes).


Methodology

• Head-turn procedure

[Figure: diagram of the setup, labelled with the child, the speakers, and the light]


Results

• Children looked significantly longer at the speaker from which novel words were being produced.

• Why is this? Why wouldn’t they look longer at the speaker from which familiar words are being produced?


Bottom Line

• Children have the ability to track transitional probabilities of sounds on the basis of very little exposure.

• This is therefore how words are parsed.


Tool against Nativism…?

• This has recently been the most frequently used weapon against the idea that children use innate knowledge to acquire language.

• If children are using such sophisticated skills to segment words, why can’t they use similar (non-linguistic) skills to learn syntax?


But it isn’t so simple

• Marcus et al. (1999) trained children on sentences of the following sort:

• la – ta – la

• ga – na – ga

• da – ba – da

• x – y – x


And tested them on:

• wo – fe – wo

• gi – tu – gi

• po – zi – po

Namely, words with:

- new syllables, but

- the same structure (x-y-x)

And…

• wo – fe – fe

• gi – tu – tu

• po – zi – zi

Namely, words with:

- new syllables, and

- new structure (x-y-y)
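
The logic of this design can be made concrete with a small Python sketch. It is only an illustration of the contrast, not the procedure used in the study, and the helper name `structure` is made up here. Because none of the test syllables occur in training, counts over specific syllables cannot separate the two test sets; only the pattern of repetition does.

```python
def structure(triple):
    """Describe a three-syllable string by which positions repeat.

    The first new syllable is labelled x, the next new one y, and so on,
    so the label depends only on the pattern of repetition.
    """
    labels = {}
    for syl in triple:
        if syl not in labels:
            labels[syl] = "xyz"[len(labels)]
    return "-".join(labels[syl] for syl in triple)


training = [("la", "ta", "la"), ("ga", "na", "ga"), ("da", "ba", "da")]
test_same = [("wo", "fe", "wo"), ("gi", "tu", "gi"), ("po", "zi", "po")]
test_new = [("wo", "fe", "fe"), ("gi", "tu", "tu"), ("po", "zi", "zi")]

# Syllable-level statistics from training say nothing about the test items,
# because no test syllable ever occurred in training.
training_syllables = {syl for triple in training for syl in triple}
test_syllables = {syl for triple in test_same + test_new for syl in triple}
print(training_syllables & test_syllables)  # set()

print({structure(t) for t in training})   # {'x-y-x'}
print({structure(t) for t in test_same})  # {'x-y-x'}
print({structure(t) for t in test_new})   # {'x-y-y'}
```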


Results

• Children appear to recognize the difference between these sets of stimuli.

• Children are therefore tracking structure and not just simple statistics.


Questions to ask yourself:

• Why would statistical tracking be useful to linguists?

As a tool to explain language acquisition.

• Does statistical tracking explain how children acquire language?

No, only certain aspects of it.

• What aspects of language can we track?

So far, it appears only phonologically related things can be tracked like this (not meaning-related things).


Most Important Questions

• Is this useful for ALL languages on Earth?

It appears that statistical tracking is only useful for auditory stimuli, not visual…ASL?

• Are humans the only creatures that can do this? (I hope so, otherwise other animals should have language too…)

No. Vervet and tamarin monkeys have been shown to have essentially the same abilities that humans do.


So what do we really know?

• Kids have spectacular abilities to track statistics.

• But so do adults (so why can’t adults learn languages as well as kids?)

• But so do monkeys (so why can’t monkeys learn language as well as humans?)

• This ability appears to be limited to statistics in auditory perception.