TRANSCRIPT
Gradience and Similarity in Sound, Word, Phrase and Meaning
Jay McClelland
Stanford University
Collaborators
Dave Rumelhart Mark Seidenberg Dave Plaut Karalyn Patterson Matt Lambon Ralph Cathy Harris Gary Lupyan Lori Holt Brent Vander Wyk Joan Bybee
The Compositional View of Language (Fodor and Pylyshyn, 1988)
Linguistic objects may be atoms or more complex structures like molecules.
Molecules consist of combinations of atoms that are consistent with structural rules.
Mappings between form and meaning depend on structure-sensitive rules.
This allows languages to be combinatorial, productive, and systematic.
[ John [ hit [the ball] ] ]    [ [w [ei [t] ] ] [^d] ]
S → NP VP; VP → V NP; NP → …
word → stem + affix; stem → {syl} + syl′ + {syl}; syl → {onset} + rhyme
rhyme → nuc + {coda}
Subj → Agent; Verb → Action; Obj → Patient
Vi + past → stemi + [^d]
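The structural rules above can be made concrete with a toy sketch (entirely illustrative; the grammar, vocabulary, and function name are invented for this example, not taken from the talk): molecules (phrases) are built from atoms (words) by recursively applying rules such as S → NP VP.

```python
# Toy phrase-structure grammar: a sketch of the compositional view.
# The rule set and vocabulary here are invented for illustration.
import random

RULES = {
    "S":  [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["John"], ["the ball"], ["the dog"]],
    "V":  [["hit"], ["saw"]],
}

def expand(symbol, rng):
    """Recursively rewrite a symbol until only words (atoms) remain."""
    if symbol not in RULES:
        return [symbol]
    out = []
    for child in rng.choice(RULES[symbol]):
        out.extend(expand(child, rng))
    return out

rng = random.Random(0)
print(" ".join(expand("S", rng)))  # e.g. "John hit the ball"
```

Because the rules apply regardless of which atoms fill each slot, the grammar is combinatorial and productive in exactly the sense Fodor and Pylyshyn describe.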
Critique
The number of units present in an expression is not always clear
The number of different categories of units is not at all clear
Real native ‘idiomatic’ language ability involves many subtle patterns not easily captured by rules
There is no generally accepted framework for characterizing how rules work
How many mountains?
There is less discreteness in some cases than others
And more in some domains than in others
Some cases in language where it is hard to decide on the number of units
How many words?
Cut out, cut up, cut over; cut it out? Barstool, shipmate; another, a whole nother
How many morphemes? Pretend, prefer, predict, prefabricate Chocoholic, chicketarian Strength, length; health, wealth; dearth, filth
How many syllables? Every, memory, livery; leveling, shoveling; evening…
How many phonemes? Teach, boy, hint, swiftly, softly Memory, different What happened to you?
Cases in which it is unclear how many types of units are needed
Object types: species (California redwoods; butterflies along a mountain range); types of tomatoes; restaurants (Japanese, Italian, seafood)
Linguistic types: word meanings (ball, run); segment types (fuse, fusion; dirt, dirty; cf. sturdy)
Characterizations of how rules work
Rule or exception (Pinker et al.): V + past → stem + /^d/; go → went; dig → dug; keep → kept; say → said
General and specific rules (Halle, Marantz): V + past → stem + /^d/; if stem ends in 'eep': 'ee' → 'eh'; if stem = say: 'ay' → 'eh'
Output-oriented approaches: OT, e.g. 'No Coda'; Bybee's output-oriented past tense schemas:
A lax vowel followed by a dental, as in hit, cut, bid, waited
'ah' or 'uh' followed by a (preferably nasalized) velar, as in sang, flung, dug …
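The 'general and specific rules' idea can be made concrete with a small sketch in which candidate rules are tried from most to least specific, and the first match wins (the rule set and function name are invented for illustration; real proposals are far richer):

```python
# Specificity-ordered past tense rules: a toy sketch of the
# Halle/Marantz-style idea that specific rules preempt general ones.
def past_tense(stem):
    # Word-specific rule: say -> said
    if stem == "say":
        return "said"
    # Subregularity: stems ending in 'eep' -> 'ept' (keep -> kept)
    if stem.endswith("eep"):
        return stem[:-3] + "ept"
    # General (default) rule: add the regular suffix, spelled '-ed'
    if stem.endswith("e"):
        return stem + "d"
    return stem + "ed"

for v in ("like", "keep", "sleep", "say", "mint"):
    print(v, "->", past_tense(v))
# like -> liked, keep -> kept, sleep -> slept, say -> said, mint -> minted
```

Note that the discreteness of the rule ordering is exactly what the rest of the talk questions: in the network models below, the 'specific' and the 'general' blend as a matter of degree instead.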
How do the general and the specific work together?
Past tenses: like → liked but keep → kept; pay → paid but say → said
English spelling→sound mapping: mint, hint, … but pint; save, wave, … but have
Meanings of sentences: John saw a dog / John saw a doctor
Can the contexts of application of the more specific patterns be well defined?
For the past tense:
Generally, words with more complex rhymes will be more susceptible to reduction: *VV[S]t, where [S] stands for a stop consonant
Item frequency and number of other similar items both appear to contribute
For spelling to sound:
Sources of spelling are lost in history
But item frequency and similar neighbors play important roles
For constructions:
Characterization of constraints is generally relatively vague and seems to be a matter of degree
Subj: human; V: saw; Obj: professional → 'paid a visit to'
John saw an accountant / John saw an architect / The baby saw a doctor / The boy saw a doctor
Perhaps similarity to neighbors plays an important role here as well
Summary
Linguistic objects vary continuously in their degree of compositionality and in their degree of systematicity
While some forms seem highly compositional and some forms seem highly regular/systematic, there is generally a detectable degree of specificity in every familiar form (Goldberg)
Even nonce forms reflect specific effects of specific ‘neighbors’
It may be useful to adopt the notion that language consists of tokens selected from a specified taxonomy of units and that linguistic mappings are determined by systems of rules…
BUT, an exact characterization is not possible in this framework
Units and rules are meta-linguistic constructs which do not play a role in language processing, language use or language acquisition.
These constructs impede understanding of language change
What will the alternative look like?
It will be a system that allows continuous patterns over time (articulatory gestures and auditory waveforms) to generate graded and distributed internal representations that capture linguistic structure and mappings in ways that respect both the continuous and discrete aspects of linguistic structure, without enumeration of units or explicit representation of rules.
Using neural network models to capture these ideas
Units in Neural Network Models
Many neural network models rely on distributed internal representations in which there is no discrete representation of linguistic units.
To date most of these models have adopted some sort of concession to units in their inputs and outputs.
We do this because we have not yet achieved the ability to avoid doing so, not because we believe these units exist.
A Connectionist Model of Word Reading (Plaut, McC, Seidenberg & Patterson, 1996)
Task is to learn to map spelling to sound, given spelling-sound pairs from 3000 word corpus.
Network learns gradually from frequency weighted exposure to pairs in the corpus.
For each presentation of each item: Input units corresponding to spelling
are activated. Processing occurs through
propagation of activation from input units through hidden units to output units, via weighted connections.
Output is compared to the item’s pronunciation.
Small adjustments to connections are made to reduce difference.
[Figure: network with input letter units M I N T mapped through hidden units to output phoneme units /m/ /I/ /n/ /t/]
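The training loop just described can be sketched with a toy network (a drastically simplified stand-in for the Plaut et al. model; the four-word corpus, slot encodings, layer sizes, and learning rate are all invented for this illustration):

```python
# Toy spelling-to-sound network: input letter units, hidden units,
# output phoneme units, trained by small weight adjustments.
# Everything here (corpus, encoding, sizes) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
words = ["mint", "mine", "hint", "pint"]
prons = ["mInt", "maIn", "hInt", "paInt"]   # crude placeholder pronunciations

letters = sorted(set("".join(words)))
phones = sorted(set("".join(prons)))

def one_hot(s, alphabet, n_slots):
    """Concatenated one-hot vectors, one slot per serial position."""
    v = np.zeros(n_slots * len(alphabet))
    for i, ch in enumerate(s):
        v[i * len(alphabet) + alphabet.index(ch)] = 1.0
    return v

n_slots = max(len(p) for p in prons)
X = np.stack([one_hot(w, letters, n_slots) for w in words])
Y = np.stack([one_hot(p, phones, n_slots) for p in prons])

W1 = rng.normal(0.0, 0.5, (X.shape[1], 20))   # input -> hidden weights
W2 = rng.normal(0.0, 0.5, (20, Y.shape[1]))   # hidden -> output weights
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    h = sig(X @ W1)                  # activation spreads to hidden units
    out = sig(h @ W2)                # ... and on to the phoneme units
    d2 = (out - Y) * out * (1 - out) # compare output to pronunciation
    d1 = (d2 @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d2             # small connection adjustments
    W1 -= 0.5 * X.T @ d1
```

Because MINT, MINE, HINT, and PINT share input units and connections, whatever is learned about one word's letters transfers to the others, which is the point the next slides develop.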
Aspects of the Connectionist Model
Mapping through hidden units forces network to use overlapping internal representations.
-Allows sensitivity to combinations if necessary
-Yet tends to preserve overlap based on similarity
Connections used by different words with shared letters overlap, so what is learned tends to transfer across items.
Processing Regular Items: MINT and MINE
Across the vocabulary, consistent co-occurrence of M with /m/, regardless of other letters, leads to weights linking M to /m/ by way of the hidden units.
The same thing happens with the other consonants, and most consonants in other words.
For the Vowel I: If there’s a final E produce /ai/ Otherwise produce /I/
Processing an Exception: PINT
Because PINT overlaps with MINT, there's transfer:
Positive for N -> /n/ and T -> /t/
Negative for I -> /ai/
Of course P benefits from learning with PINK, PINE, POST, etc.
Knowledge of regular patterns is hard at work in processing this and all other exceptions.
The only special thing the network needs to learn is what to do with the vowel.
Even this will benefit from weights acquired from cases such as MIND, FIND, PINE, etc.
[Figure: network with input letter units P I N T mapped through hidden units to output phoneme units /p/ /ai/ /n/ /t/]
Model captures patterns associated with ‘units’ of different scopes without explicitly representing them.
The model learns basic regular correspondences, generalizes appropriately to non-words. mint, rint; seat, reat; rave, mave…
It learns to produce the correct output for all exceptions in the corpus. pint, bread, have, etc…
It is sensitive to sub-regularities such as special vowels with certain word-final clusters, c-conditioning, final-e conditioning…
sold, nold; book, grook; plead, tread, ?klead; bake, dake; rage, dage / rice, bice
Shows graded sensitivity modulated by frequency to item-specific, rhyme-specific, and context-sensitive correspondences.
[Figure: error / settling time by frequency (high vs. low) for pint, bread, hint, and dent]
How does it work?
Correspondences of different scopes are represented in the connections between the input and the output that depend on them.
Some correspondences, e.g. in the word-initial consonant cluster, are highly compositional, and the model treats them this way.
Others, such as those involving the pronunciation of the vowel, are highly dependent on context, but to a degree that varies with the type of item.
Elman’s Simple Recurrent Network
Finds larger units with coherent internal structure from time series of inputs.
Series are usually discretized at conventional linguistic unit boundaries, but this is just for simplicity.
Uses hidden unit state from processing of previous input as context for next input.
Elman networks learn syntactic categories from word sequences
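The SRN idea can be sketched as follows (a toy illustration, not Elman's simulation: the repeating corpus and layer sizes are invented, and backprop is truncated to one step, with the copied-back context treated as a fixed input):

```python
# Minimal Elman-style simple recurrent network: the previous hidden
# state is fed back as context for the next input, and the network
# learns to predict the next symbol. All details here are assumptions.
import numpy as np

rng = np.random.default_rng(1)
seq = "abcabcabc" * 30              # toy corpus with a predictable pattern
symbols = sorted(set(seq))
eye = np.eye(len(symbols))

H = 8                               # hidden/context units
Wxh = rng.normal(0, 0.5, (len(symbols), H))
Whh = rng.normal(0, 0.5, (H, H))    # context (previous hidden state) weights
Why = rng.normal(0, 0.5, (H, len(symbols)))
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20):
    h = np.zeros(H)                 # context starts empty each pass
    for t in range(len(seq) - 1):
        x = eye[symbols.index(seq[t])]
        y = eye[symbols.index(seq[t + 1])]    # target: the next symbol
        h_new = sig(x @ Wxh + h @ Whh)        # context feeds back in
        out = sig(h_new @ Why)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ Why.T) * h_new * (1 - h_new)
        Why -= 0.5 * np.outer(h_new, d_out)
        Wxh -= 0.5 * np.outer(x, d_h)         # one-step backprop only:
        Whh -= 0.5 * np.outer(h, d_h)         # context treated as fixed
        h = h_new

h, correct = np.zeros(H), 0
for t in range(len(seq) - 1):
    h = sig(eye[symbols.index(seq[t])] @ Wxh + h @ Whh)
    correct += symbols[int(np.argmax(h @ Why))] == seq[t + 1]
print(correct / (len(seq) - 1))     # fraction of next symbols predicted
```

The graded hidden states, not any built-in category labels, are what come to encode the distributional structure of the sequence.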
Elman (1991) Explored Long-Distance Dependencies
N-V agreement and verb successor prediction
[Figure: predicted activations for S, who, Vp, Vs, N at successive word positions, showing number agreement in the main clause]
Prediction with an embedded clause
[Figure: predicted activations for S, who, Vp, Vs, N at successive word positions in a sentence with an embedded clause]
Attractor Neural Networks
Advantages:
Discreteness as well as continuity
Captures general and specific in a single network, for semantic as well as spelling-sound regularity
General information is learned faster and is more robust to damage, capturing development and learning
Adding context would allow context to shade or select meaning
Can we do without units on the input and the output?
I think it will be crucial to do so because speech gestures are continuous. They have attractor-like characteristics but also vary continuously in many ways and as a function of a wide range of factors.
It will then be entirely up to the characteristics of the processing system to exhibit the relevant partitioning into units.
Keidel’s model that learns to translate from continuous spoken input to articulatory parameters.
The input to the model is a time series of auditory parameters from actual spoken CV syllables.
Output is the identity of the C and the V, but…
It should be possible to translate from auditory input to the continuous articulatory movements that would ‘imitate’ the input. An important future direction
Units and Rules as Emergents
In all three example models, units and rules are emergent properties that are matters of degree.
We can choose to talk about such things as though they have an independent existence for descriptive convenience but they may have no separate mechanistic role in language processing, language learning, language structure, or language change.
Although many models use ‘units’ in their inputs and outputs, the claim is that this is a simplification that actually limits what the model can explain.
Beyond the Phone and the Phoneme
Some additional problems with the notions of phonetic segment.
Model of gradual language change exhibiting pressure to be regular and to be brief.
Just a Few of the Problems with Segments in Phonology
Enumeration of segment types is fraught with problems.
No universal inventory; there are cross-language similarities of segments but every segment is different in every language (Pierrehumbert, 2001).
When we speak, the articulation of the same "segment" depends on:
Phonetic context
Word frequency and familiarity
Degree of compositionality, which in turn depends on frequency
Number of competitors
Many other aspects of context…
Presence/absence of aspects of articulation is a matter of degree: nasal 'segment', release burst, duration / degree of approximation to closure in l's, d's and t's…
Language change involves a gradual process of reduction/adjustment.
Segments disappear gradually, not discretely. What is it half way through the change?
The approach misses out on some of the global structure of spoken language that needs to be taken into account in any theory of phonology.
A model of language change that produces irregular past tenses (with Gary Lupyan)
Our initial interest focused on quasi-regular exceptions:
Items that add /d/ or /t/ and reduce the vowel: did, made, had, said, kept, heard, fled…
Items already ending in /d/ or /t/ that change (usually reduce) the vowel: hid, slid, sat, read, bled, fought…
We suggest these items reflect historical change sensitive to:
Pressure to be brief, contingent on comprehension
Consistency in mapping between sound and meaning
Two constraints on communication
The spoken form I produce is constrained:
To allow you to understand
To be as short as possible given that it is understood.
[Diagram: My Intended Meaning → Speech → Your understanding of what I said; Your Intended Meaning → Speech → My understanding of what you said]
Simplified version of this actually explored by Lupyan and McClelland (2003)
The network has a phonological word pattern and a corresponding semantic pattern for the present and past tense forms of 739 verbs.
It is trained with the phonological word form as input, and this is used to produce a semantic pattern.
The error at the output layer is back-propagated allowing a change in the connection weights.
The error is also back-propagated to the input units, and is used to adjust the phonological word pattern.
There is also a pressure on the phonological word form representation to be simpler, depending on how well the utterance was understood (summed error at the output units).
The improved phonological word form is then stored in the list.
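The loop just described can be sketched in miniature (a hypothetical toy, not the actual Lupyan & McClelland simulation: the sizes, learning rates, and the 'understood' measure are all invented assumptions): error adjusts both the weights and the phonological input itself, and a brevity pressure shrinks the stored form in proportion to how well it was understood.

```python
# Toy sketch of dual pressure on a word form: be understood, be brief.
# All dimensions, rates, and the comprehension measure are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_phon, n_sem = 12, 8
W = rng.normal(0.0, 0.3, (n_phon, n_sem))
phon = rng.uniform(0.2, 1.0, n_phon)     # stored phonological word form
sem = (rng.uniform(0, 1, n_sem) > 0.5).astype(float)   # target meaning
sig = lambda z: 1.0 / (1.0 + np.exp(-z))
initial_length = phon.sum()

for _ in range(2000):
    out = sig(phon @ W)                  # comprehension attempt
    err = out - sem
    d = err * out * (1 - out)
    W -= 0.2 * np.outer(phon, d)         # ordinary weight learning
    phon -= 0.05 * (d @ W.T)             # error also reshapes the input form
    understood = 1.0 / (1.0 + (err ** 2).sum())
    phon -= 0.01 * understood * phon     # brevity pressure when understood
    phon = np.clip(phon, 0.0, 1.0)

print(initial_length, phon.sum())        # the stored form has shrunk
```

The equilibrium is the interesting part: units that carry the meaning resist reduction, while uninformative ones fade, mirroring the claim that regularity and role in the mapping to meaning protect inflection.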
[Diagram: what I say when I want to communicate a particular message → your understanding of what I said]
Model Details: L&M Simulation 2a
Semantic patterns
‘Quasi-componential’ representations of tense plus base word meaning are created, based on including tense information in the feature vectors passed through the encoder network.
The representation of past tense varies somewhat from word to word.
Phonological patterns have one unit per phoneme but long vowels or diphthongs have an extra unit, plus a unit for the syllabic ‘ed’. Initialized with binary values (0,1).
Although units still stand for phonemes, presence/absence is a matter of degree.
Learning rate for the representation is slow relative to learning rate for the weights.
739 monosyllabic verbs, frequency weighted. Training corpus is fully regularized at the start of the simulation.
Simulation of Reductive Irregularization Effects
In English, frequent items are less likely to be regular.
Also, d/t items are less likely to be regular.
The same effects emerge in the simulation.
While the past tense is usually one phoneme longer than the present, this is less true for the high-frequency past tense items.
Reduction of high-frequency past tenses affects a phoneme other than the word-final /d/ or /t/. Regularity, and its role in the mapping to meaning, protects the inflection.
Further Simulations
Simulation 2b showed that when irregulars were present in the training corpus, the network tended to preserve their irregularity.
In ongoing work an extended model shows a tendency to regularize low-frequency exceptions.
Simulation 2c used fully componential semantic representation of past tense, resulting in much less tendency to reduce.
Discussion and Future Directions
The work discussed here is a small example of what needs to be accomplished, even for a model of phonology.
Extending the approach to continuous speech input will be a big challenge
Extending continuous speech to full sentences as input and output will be a bigger challenge still
Neural network approaches are gaining prominence as processing power grows, and these things will be increasingly possible.
It will still be useful to notate specific linguistic units, but machines will not need them to communicate – no more than our minds need them to speak and understand.