Statistics and Rules in Language Acquisition: Constraints and the Brain
Richard N. Aslin
Department of Brain and Cognitive Sciences
University of Rochester
CALACEI Conference, Trieste, Italy
Tools to Study Language Acquisition in Early Infancy
May 6, 2006
Outline
1. What is Statistical Learning (SL)?
2. How is SL constrained?
3. Neural correlates of visual SL
4. Implications of SL for rule learning (RL)
1. What is SL?
• Acquisition of structured information by listening or observing
• No reinforcement or feedback
• Sensitivity to frequency or probability distributions
Why is SL interesting?
• Something like SL must be how language is acquired: there is no instructor
• SL appears to be implausible
– Computations involved (infinite # statistics)
– Limits of information processing (real-time flow of input and demands on working memory)
Why word segmentation?
• Tractable problem
• Must be solved early by all language learners (words are defined similarly across languages)
• Illustrative of distributional learning mechanism that may apply more broadly
Sequence of elements: A-B-C-D-E-F-G-H-I-J-K-L . . .
Test triplets: D-E-F vs. I-J-K
Saffran, Aslin & Newport (1996)
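The logic of this test can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from Saffran, Aslin & Newport (1996): the four three-element "words" (ABC, DEF, GHI, JKL), the stream length, the random seed, and the function names are all assumptions made here. The sketch estimates the transitional probability TP(y | x) = count(xy) / count(x) from the stream, then compares a word (D-E-F) with a triplet that spans a word boundary (I-J-K).

```python
import random
from collections import Counter


def transitional_probabilities(stream):
    """Estimate TP(y | x) = count(x followed by y) / count(x) for adjacent elements."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])  # every element that has a successor
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def mean_tp(item, tps):
    """Average TP over the adjacent transitions inside a test item."""
    pairs = list(zip(item, item[1:]))
    return sum(tps.get(pair, 0.0) for pair in pairs) / len(pairs)


# Hypothetical lexicon and exposure stream (assumptions for illustration only).
rng = random.Random(0)
words = ["ABC", "DEF", "GHI", "JKL"]
stream, previous = [], None
for _ in range(300):
    # Pick the next word at random, never repeating the previous word.
    word = rng.choice([w for w in words if w != previous])
    stream.extend(word)
    previous = word

tps = transitional_probabilities(stream)
word_score = mean_tp("DEF", tps)      # both transitions are word-internal
partword_score = mean_tp("IJK", tps)  # I-to-J crosses a word boundary
print(word_score, partword_score)
```

Every transition inside a word has a TP of 1.0, whereas the I-to-J transition crosses a boundary and is lower (about 1/3 in expectation under these assumptions), so D-E-F scores higher than I-J-K.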
Domains and species
• SL operates on human speech and tones (Saffran et al., 1996a,b; 1999), as well as on visual shapes in temporal (Fiser & Aslin, 2002; Kirkham et al., 2002) and spatial domains (Fiser & Aslin, 2001, 2002).
• SL operates in human adults, infants, tamarin monkeys (Hauser et al., 2001, 2004), and rats (Toro & Trobalon, 2005); rats fail higher-order SL.
2. How is SL constrained?
• Gestalt principles
– Proximity (Newport & Aslin, 2004; Pena et al., 2002)
– Similarity (Creel, Newport & Aslin, 2004)
– Good continuation (Fiser, Scholl & Aslin, in press)
• Social/attentional cues (Yu, Ballard & Aslin, 2005)
• Preferred units over which statistics are computed (Newport, Weiss, Wonnacott & Aslin, 2004)
• Redundancy reduction (Fiser & Aslin, 2005)
• Primacy (Gebhart, Aslin & Newport, in preparation)
Element similarity
[Figure: two melodies, “Happy birthday to you” and “Twinkle twinkle little star”.]
Creel, Newport & Aslin (2004)
[Figure: two interleaved tone streams, A g B h C i ..., presented in the same octave or in different octaves. TPs marked along each stream: ...0.5 1.0 1.0 0.5... TPs between adjacent tones = 0.5 and 0.25.]
Results
[Figure: bar chart, y-axis 30% to 75%. Axis labels: low-low, high-low; Same Octave, Diff Octaves. Bars: adjacent, 1-away.]
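To make the adjacent vs. 1-away distinction concrete, here is a small Python sketch that computes transitional probabilities at a chosen lag over two interleaved streams. It is a toy design, not the stimulus set of Creel, Newport & Aslin (2004): the triplets (ABC, DEF, ghi, jkl), the fixed ordering, and the function name `lagged_tps` are assumptions, and the toy stream does not reproduce the 0.25 adjacent TPs of the original materials.

```python
from collections import Counter


def lagged_tps(stream, lag=1):
    """TP(y | x) for elements `lag` positions apart (lag=1 adjacent, lag=2 one-away)."""
    pair_counts = Counter(zip(stream, stream[lag:]))
    first_counts = Counter(stream[:len(stream) - lag])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


# Hypothetical triplets: upper case for one stream, lower case for the other.
# The fixed ordering pairs every upper triplet with every lower triplet equally often.
upper_words = ["ABC", "ABC", "DEF", "DEF"] * 25
lower_words = ["ghi", "jkl", "ghi", "jkl"] * 25

stream = []
for up, low in zip(upper_words, lower_words):
    for u, l in zip(up, low):
        stream += [u, l]  # interleave: A g B h C i ...

adjacent = lagged_tps(stream, lag=1)
one_away = lagged_tps(stream, lag=2)
print(adjacent[("A", "g")], one_away[("A", "B")])
```

In this toy stream the structure is carried entirely by the 1-away transitions (A to B is 1.0), while adjacent transitions such as A to g are only 0.5, so a learner restricted to adjacent statistics would miss the triplets.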
Are syllables (CV) or segments (C and V) the preferred unit for SL?
• Saffran, Newport & Aslin (1996: adults) and Saffran, Aslin & Newport (1996: infants) assumed that syllable transitional probabilities were the relevant computational unit
• However, BOTH syllable and segment transitional probabilities in our artificial languages would parse the speech streams in the same way
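The two candidate units can be compared directly by computing transitional probabilities over the same stream at both grains. The Python sketch below is only an illustration: the four CV-CV-CV nonsense words, the seed, and the helper name `tps` are assumptions made here, not the artificial languages used in the experiments.

```python
import random
from collections import Counter


def tps(units):
    """TP(y | x) = count(x followed by y) / count(x), over any sequence of units."""
    pair_counts = Counter(zip(units, units[1:]))
    first_counts = Counter(units[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


# Illustrative nonsense words, each a list of CV syllables (an assumption of this sketch).
rng = random.Random(1)
words = [["tu", "pi", "ro"], ["go", "la", "bu"], ["bi", "da", "ku"], ["pa", "do", "ti"]]
syllables, previous = [], None
for _ in range(200):
    word = rng.choice([w for w in words if w is not previous])  # no immediate repeats
    syllables.extend(word)
    previous = word

# Re-describe the same stream at the segment level: each CV syllable -> consonant, vowel.
segments = [segment for syllable in syllables for segment in syllable]

syllable_tps = tps(syllables)
segment_tps = tps(segments)
print(syllable_tps[("go", "la")], segment_tps[("g", "o")], segment_tps.get(("o", "l")))
```

Because the same segments recur in several syllables, segment-level TPs are generally lower and less uniform than syllable-level TPs, which is why a language has to be designed deliberately if the two units are to make different predictions.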
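Syllables AND Segments
[Figure: a three-syllable word, CV1 CV2 CV3. TPs marked over syllables: 1.0 1.0 .5. TPs marked over segments: .5 .5 .5 .5 .5 .25.]
Syllables AND Segments
[Figure: bar chart of # Items Correct (y-axis 0 to 32) for Language A and Language B.]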
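Syllables NOT Segments
[Figure: a three-syllable word, CV1 CV2 CV3. TPs marked over syllables: 1.0 1.0 .5. TPs marked over segments: .5 .5 .5 .5 .5 .5.]
Syllables NOT Segments
[Figure: bar chart of # Items Correct (y-axis 0 to 32) for Language A and Language B.]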
What about infants?
• No previous work has examined this question for statistical computations
• But there is a literature on infant perception of segments and syllables
– Jusczyk & Derrah: 2 mos old - syllables
– Mehler et al.; Jusczyk: development from syllables to segments?
– Kuhl, Hillenbrand: 12 mos old - segments (or acoustic similarity)
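Syllables AND Segments
[Figure: a three-syllable word, CV1 CV2 CV3. TPs marked over syllables: 1.0 1.0 .5. TPs marked over segments: .5 .5 .5 .5 .5 .25.]
Infants: Segments AND Syllables
[Figure: bar chart of Mean Listening Time (sec), y-axis 5 to 11, for Words vs. Partwords in Language A and Language B.]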
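Syllables NOT Segments
[Figure: a three-syllable word, CV1 CV2 CV3. TPs marked over syllables: 1.0 1.0 .5. TPs marked over segments: .5 .5 .5 .5 .5 .5.]
Infants: Segments NOT Syllables
[Figure: bar chart of Mean Listening Time (sec), y-axis 5 to 11, for Words vs. Partwords in Language A and Language B.]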
Infants: Syllables NOT segments
The Statistical ‘Garden Path’
• Two languages with different words and partial overlap of syllables
• Expose to Lang A + Lang B (5 min each)
• No pause between languages
• Post-test:
– words vs. partwords in A
– words vs. partwords in B
• 5 min of exposure to Lang A or B alone
• 5 min of exposure each to Lang A+B
[Figure: bar chart, y-axis 40 to 100, chance level marked. Conditions: A or B, A and B. Bars: 1st and 2nd language.]
• Add 30 sec pause between languages
• Change pitch of synthetic voice
• Triple duration of 2nd language (15 min)
[Figure: bar chart, y-axis 40 to 100, chance level marked. Conditions: A or B, A + B, Pause, Pitch, 2nd 15 min. Bars: 1st and 2nd language.]
• Eliminate syllable differences (all identical)
– 5 min exposure and test Lang A or B alone
– Test for word vs. partword in each language
[Figure: bar chart, y-axis 40 to 100, chance level marked. Conditions: A or B, A + B. Bars: 1st and 2nd language.]
Primacy: learning first structure ‘blocks’ new structure
3. Neural correlates of SL
• Statistical learning in the visual modality: spatial structure, not temporal structure
• How are higher-order visual features represented in the brain?
– Hemisphere bias in SL and interhemispheric transfer
– fMRI activations of brain regions during SL
Background
Can mere exposure to a series of scenes enable adult learners to extract features defined by shape-conjunctions?
(Fiser & Aslin, 2001)
Six base-pairs
Fit three base-pairs into 3 X 3 grid
Testing phase
• 2AFC task
• Base-pair vs. Non-base pair
[Figure: example test displays built from shapes labeled A, B, E, F, I, J: a base-pair and a non-base pair.]
70% correct
Split the base-pairs
Fiser, Roser, Aslin & Gazzaniga (in prep)
[Figure: display of shapes labeled A to L; scale marker: 2 deg.]
Modified test phase
Ipsilateral:
• Practice: RH Test: RH
• Practice: LH Test: LH
Contralateral:
• Practice: RH Test: LH
• Practice: LH Test: RH
Four lateralized test types
Subjects
• Normal subjects: Sixteen college students
• Callosotomy patient: V.P.
(Corballis et al. Neurology 2001)
Results with normal subjects
Equal learning in all conditions: interhemispheric transfer
[Figure: bar chart for normal subjects, y-axis 40 to 100, chance level marked. Ipsilateral: Practice RH / Test RH, Practice LH / Test LH. Contralateral: Practice LH / Test RH, Practice RH / Test LH.]
Results with the split brain patient
[Figure: bar charts for normal subjects and the split brain patient, y-axis 40 to 100, chance level marked, one comparison marked *. Ipsilateral: Practice RH / Test RH, Practice LH / Test LH. Contralateral: Practice LH / Test RH, Practice RH / Test LH.]
• Contralateral: No interhemispheric information transfer
• Ipsilateral: Strong right hemisphere advantage
Event-Related fMRI Design – LEARNING PHASE
[Figure: trial timeline with fixation crosses: Instructions, Baseline fix (4 TRs), then Trials, each a Stimulus followed by Jitter. Timing labels: 2500, 1000, and 2500/5000/7500 for the jitter.]
144 Stimuli each presented once – Divided into 3 Runs of 6 min each
TEST PHASE
[Figure: trial timeline with fixation crosses: Baseline fix (4 TRs), then Trials, each a Stimulus + Response followed by Jitter. Timing labels: 2500, 1000, and 2500/5000/7500 for the jitter. Stimuli: Base Pair, Non Base Pair.]
48 test trials: 24 base-pairs, 24 non base-pairs; yes/no familiarity task
Learning Phase: final 1/3 vs. initial 1/3
Right Parietal Activation
Consistent with split-brain findings
4. Implications of SL for RL
• Generalization to new tokens: Rule-learning
– Gomez & Gerken (1999)
– Marcus et al. (1999)
– Pena et al. (2002)
– Saffran & Wilson (2003)
• Not based on perceptual similarity
• Could be based on surrounding context (Mintz, 2003) and on category variability (Gomez, 2002; Gomez & Maye, 2005)
What enables RL?
• Obtained with strings, not streams
• Pauses enable encoding of position info
• High variability in a sea of stability may induce categories by down-weighting the category exemplars and then enabling their differences to be learned after “frequent frames” (Mintz, 2003; Santelmann & Jusczyk, 1998) are established
RL vs. SL: Different mechanism?
• RL operates over categories rather than over surface forms.
• Computation of statistics over categories may involve the same SL mechanism as computation over surface forms: only a difference in input?
• RL in tamarins (Hauser, Weiss & Marcus, 2002) suggests that RL is not unique to language learning.
Conclusions
• Statistical learning is ubiquitous and powerful.
• SL must be constrained to operate efficiently and to extract the “right” structure.
• The search for neural correlates of SL is ongoing.
• Whether SL can also operate at the level of categories or whether RL involves a separate mechanism remains unclear.
Thanks to my collaborators and funding sources
Elissa Newport
Jenny Saffran
Jozsef Fiser
Andrea Gebhart
Sarah Creel
Matt Roser
Mike Gazzaniga
NIH, Packard Foundation, McDonnell Foundation
Why conditionalized statistics?
• Element frequency (N-gram) is a poor predictor of underlying structure.
– Many high frequency sounds appear in multiple contexts
– Conditional probabilities are computable by adults and infants (and in classical conditioning by rats, but not in speech)
• But element frequency can serve as an “anchor” or a “filter” on how SL operates.
With what fidelity?
• How much input is needed to compute the relevant statistic(s)?
– Brent & Siskind (2001)
– Mintz, Newport & Bever (2002)
• What decision mechanism operates on those stored statistical values?
– Local minimum vs. hard threshold
– How many bits of resolution? Is a transitional probability difference of 0.43 > 0.39 relevant?
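The contrast between the two decision mechanisms can be made concrete with a short Python sketch. Everything here is hypothetical: the TP values, the 0.40 cutoff, and the function names are assumptions chosen so that the 0.43 vs. 0.39 case from the slide shows up; the sketch says nothing about which mechanism learners actually use.

```python
def boundaries_by_threshold(tps, threshold):
    """Hard threshold: posit a boundary wherever the TP falls below a fixed cutoff."""
    return [i for i, tp in enumerate(tps) if tp < threshold]


def boundaries_by_local_minimum(tps):
    """Local minimum: posit a boundary wherever the TP is lower than both neighbours."""
    return [
        i
        for i in range(1, len(tps) - 1)
        if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]
    ]


# Hypothetical TPs for successive transitions in a stream (illustrative values only).
stream_tps = [0.90, 0.80, 0.39, 0.85, 0.90, 0.43, 0.80]

threshold_cuts = boundaries_by_threshold(stream_tps, threshold=0.40)
minimum_cuts = boundaries_by_local_minimum(stream_tps)
print(threshold_cuts, minimum_cuts)
```

With a cutoff of 0.40 the hard threshold treats 0.39 and 0.43 as categorically different and finds only one boundary, while the local-minimum rule finds both dips. The first mechanism therefore needs fine resolution around the cutoff; the second needs only relative comparisons with neighbouring values.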
Are SL studies just “toy” demos?
• Saffran et al. used simple structures
[Figure: Transitional Probabilities (y-axis 0.0 to 1.0) plotted across the syllable stream DU TA BA PA TU BI TU TI BU PI DA BU BA BU PU BU PA DA DU TA BA.]
• Swingley (2005) showed that similar structures are present in IDS.
Which unit?
• Saffran, Aslin & Newport (1996) presumed the unit was the syllable.
• Newport et al. (BU: 2004) showed that SL in speech streams is computed over segments (Cs & Vs), not syllables.
• Other cues are clearly important.
Saffran et al. (1996): Although experience with speech in the real world is unlikely to be as concentrated as it was in these studies, infants in more natural settings presumably benefit from other types of cues correlated with statistical information.
Fiser, Scholl & Aslin (in press)
Bouncing vs. streaming
Perception of bouncing or streaming biases statistical learning
[Figure: display labeled “streaming”.]
3. What are the limits of SL?
• Some minimal “attention” is required.
– Saffran et al. (1997)
– Turk-Browne, Junge & Scholl (2005)
– Toro, Sinnett & Soto-Faraco (in press)
• In streams of syllables, non-adjacent learning is difficult.
– Newport & Aslin (2004)
– Pena et al. (2002)
• Unfamiliar elements (noises) are hard to learn.
– Gebhart, Newport & Aslin (2004)
Test phase: correct – incorrect