![Page 1: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/1.jpg)
The contribution of abstract linguistically-motivated learning biases to statistical
learning of lexical categories
Erwin Chan
Linguistics colloquium, University of Arizona
November 7, 2008
![Page 2: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/2.jpg)
Outline
• Introduction: statistical learning and lexical category induction
• Previous work
• Computational modeling of language acquisition
• A new approach
• Conclusion
![Page 3: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/3.jpg)
Lexical categories
• Lexical / open-class categories: “Noun”, “Verb”, “Adjective” in English
• Assumed in generative linguistics
• (Some typological problems)
![Page 4: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/4.jpg)
Origin of lexical categories?

• Innate
• Reflects cognitive predisposition
  – Objects = nouns, properties = adjectives, actions = verbs
• Induced from experience
  – Statistical regularities in input data
  – General-purpose, domain-independent learning procedures
• Induced but mapped onto innate/cognitive categories
![Page 5: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/5.jpg)
Develop computational model

• How do children acquire lexical categories?
  – Language acquisition problem
• Abstract computational problem:
  – Given a corpus of linguistic data, group together words into their lexical categories
• Develop a computational model
  – Try to replicate the problem in an artificial setting
  – Formulate precise, step-by-step procedures to learn categories from a corpus
• A solution to the computational problem is suggestive about human learning
![Page 6: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/6.jpg)
Human language acquisition
Universal Grammar + sensorimotor systems +
pattern recognition
Linguistic environment
Language production
Colorless green ideas
sleep furiously.
![Page 7: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/7.jpg)
Data structures + algorithms
Corpus Language production
Colorless green ideas
sleep furiously.
Learning from a corpus
![Page 8: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/8.jpg)
Computation = Cognition
Data structures
+ algorithms
( Description of )
Universal Grammar + sensorimotor systems +
pattern recognition
![Page 9: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/9.jpg)
Outline
• Introduction: statistical learning and lexical category induction
• Previous work
• Computational modeling of language acquisition
• A new approach
• Conclusion
![Page 10: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/10.jpg)
Previous work in syntactic category induction

• Distributional learning hypothesis
  – to ___ → Verb
  – the ___ is → Noun
• Previous work
  – Structuralist linguistics: Harris 1946/1954, Fries 1952
  – Natural language processing: Schütze 1993/1995, Clark 2000
  – Psycholinguistics: Redington, Chater, & Finch 1998; Mintz 2002
• General claims
  – Distributional learning works
  – Demonstrates utility of statistics
  – Undermines nativism
![Page 11: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/11.jpg)
How it works
• Demonstrate for English
• Input format: orthographic text corpus
  – Child-directed speech (psycholinguistics)
  – Texts written for adults (NLP)
  – Redington et al. 1998: very similar results
• Linguistic assumptions
  – Split between open- and closed-class words
![Page 12: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/12.jpg)
Examine distributional contexts
• Count occurrences of function words to left/right of content words
• Function words = highest-frequency words in a language
• Content words = lower-frequency words

(Example: context positions −1 and +1 around each content word: The box is in the pen)
![Page 13: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/13.jpg)
Form matrix of co-occurrence counts
• Each row describes a word’s distributional context in the entire corpus
(Matrix: rows = words to be clustered; columns = left-context and right-context features. Example row: box has count 1 for left context “The” and count 1 for right context “is”.)
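The counting procedure on the last two slides can be sketched as follows. The toy corpus, the frequency cutoff used to approximate the function-word/content-word split, and all names are illustrative assumptions, not part of the original model.

```python
from collections import Counter, defaultdict

def cooccurrence_matrix(corpus, min_count=2):
    """Count function words at positions -1/+1 of each content word.

    Function words are approximated as words whose corpus frequency is at
    least min_count; all remaining words are treated as content words."""
    freq = Counter(w for sent in corpus for w in sent)
    function_words = {w for w, c in freq.items() if c >= min_count}
    matrix = defaultdict(Counter)  # content word -> {("L"/"R", func word): count}
    for sent in corpus:
        for i, w in enumerate(sent):
            if w in function_words:
                continue  # only content words get rows
            if i > 0 and sent[i - 1] in function_words:
                matrix[w][("L", sent[i - 1])] += 1
            if i + 1 < len(sent) and sent[i + 1] in function_words:
                matrix[w][("R", sent[i + 1])] += 1
    return matrix

corpus = [["the", "box", "is", "in", "the", "pen"],
          ["the", "dog", "is", "in", "the", "house"]]
m = cooccurrence_matrix(corpus)
# row "box": left context "the", right context "is", as on the slide
```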
![Page 14: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/14.jpg)
Apply clustering algorithm

• Group together rows that are similar into clusters
  – Some clustering algorithm
    • Hierarchical clustering
    • Dimensionality reduction approaches
    • Probabilistic models
    • etc.
• Words with similar distributional environments are placed in the same cluster
• If distributional learning hypothesis is correct, the words in a cluster will belong to same syntactic category
![Page 15: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/15.jpg)
Goal for lexical category induction
• One-to-one correspondence between clusters and lexical categories
  – cluster 1 = all nouns
  – cluster 2 = all adjectives
  – cluster 3 = all verbs
![Page 16: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/16.jpg)
Problem: no algorithm discovers the exact set of lexical categories
• Clusters are too fine-grained
  – Singular nouns
  – Plural nouns
  – Verb participles
• Clusters are too coarse-grained
  – Adjective and Noun conflated
  – Open- and closed-class categories conflated
![Page 17: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/17.jpg)
Example: hierarchical clustering
• Algorithm
  – Each item begins in its own cluster
  – Find the two most similar clusters; merge them into a new cluster
  – Produces a tree (dendrogram) describing the merging process
• Previous work
  – Redington, Chater, & Finch 1998
  – Mintz, Newport, & Bever 2002
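A minimal version of the agglomerative procedure just described might look like this. The similarity measure (cosine over count vectors, averaged across cluster members) and all names are assumptions for illustration; the cited papers use their own similarity metrics.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def hierarchical_cluster(rows):
    """rows: {word: {feature: count}}. Each word starts in its own cluster;
    repeatedly merge the two most similar clusters (average-link), recording
    each merge -- the record is the dendrogram's merge sequence."""
    clusters = [[w] for w in rows]
    merges = []
    sim = lambda a, b: sum(cosine(rows[x], rows[y])
                           for x in a for y in b) / (len(a) * len(b))
    while len(clusters) > 1:
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: sim(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], sim(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

rows = {"dog": {"L:the": 3, "R:is": 2},
        "cat": {"L:the": 2, "R:is": 3},
        "run": {"L:to": 5}}
merges = hierarchical_cluster(rows)
# first merge joins the distributionally similar "dog" and "cat"
```

Choosing a similarity cutoff in the merge record, rather than running to a single cluster, gives the discrete category sets discussed on the following slides.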
![Page 18: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/18.jpg)
from Redington et al. 1998
![Page 19: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/19.jpg)
![Page 20: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/20.jpg)
Problem: need a discrete set of clusters
• Dendrogram describes how words are gradually clustered, until there is just one cluster containing all words
• Obtain discrete set of clusters by choosing a cutoff value for cluster similarity
• Resulting clusters are the set of syntactic categories
![Page 21: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/21.jpg)
One category
![Page 22: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/22.jpg)
Two categories
![Page 23: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/23.jpg)
all Verbs
Nouns, Adjectives conflated
![Page 24: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/24.jpg)
Nouns, Adjectives separate
Three different Verb clusters
![Page 25: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/25.jpg)
Mintz et al. 2002: same issue
![Page 26: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/26.jpg)
Finding separate lexical categories
• Want exactly one cluster for each lexical category: Noun, Adjective, Verb
• No algorithm finds the exact set of lexical categories
  – One Verb cluster → Noun, Adj conflated together
  – Noun, Adjective separate → multiple Verb clusters
• Distributional learning doesn’t work...
  – Despite large corpora
  – Despite accumulation of lots of statistics
![Page 27: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/27.jpg)
(Green = Verbs, Blue = Nouns, Red = Adjectives)
Nouns and adjectives in English are hard to separate
![Page 28: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/28.jpg)
(Need an algorithm!)
• Mintz 2003: “frequent frames”
  – the ___ is → noun
  – to ___ the → verb
• In English, a frequent frame is a good indicator of words that are homogeneous w.r.t. category
  – e.g., most words in the frame the ___ is are nouns
• However, no algorithm for grouping together all the frequent frames for one category
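Extracting frequent frames in Mintz's (2003) sense, the word immediately before and after a target position, can be sketched as below; the toy sentences and the frequency threshold are assumptions, and the real study worked over child-directed speech with its own thresholds.

```python
from collections import defaultdict

def frequent_frames(sentences, min_count=2):
    """Collect (left, right) frames and the middle words occurring in them;
    keep only frames attested at least min_count times."""
    frames = defaultdict(list)
    for sent in sentences:
        for i in range(1, len(sent) - 1):
            frames[(sent[i - 1], sent[i + 1])].append(sent[i])
    return {f: ws for f, ws in frames.items() if len(ws) >= min_count}

sents = [["the", "dog", "is", "here"],
         ["the", "cat", "is", "here"],
         ["to", "run", "the", "race"]]
ff = frequent_frames(sents)
# the frame ("the", "is") groups the nouns "dog" and "cat"
```

The open problem the slide notes remains visible here: nothing in this procedure says which frames belong to the same underlying category.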
![Page 29: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/29.jpg)
Outline
• Introduction: statistical learning and lexical category induction
• Previous work
• Computational modeling of language acquisition
• A new approach
• Conclusion
![Page 30: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/30.jpg)
How to design a computational model of language acquisition?
• Chomsky 2005: 3 factors
  – 1. Genetic endowment / Universal Grammar (main emphasis)
  – 2. Linguistic experience
  – 3. Principles not specific to language, including computation
• Computational interpretation:
  – 1. Linguistic representation
  – 2. Input data statistics
  – 3. Learning algorithm
• Computational modeling addresses all 3 factors
  – Interaction between factors is currently not well understood
![Page 31: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/31.jpg)
Component 1: Data representation
• What structures must be built into the model in order for it to succeed?
• What should the result look like?
  – Extensional definition
    • Verbs = { swim, give, talk, laugh, …, eat }
    • Words considered to be “Verbs” appear in the same cluster
  – Intensional definition
    • Infinite set of Verbs
    • Define a Verb by its properties: “to ___”, “will ___”
• What input data representation is necessary to obtain the desired output?
![Page 32: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/32.jpg)
Component 2: statistics of environment
• View #1: Poverty of the Stimulus
  – Believed that linguistic knowledge cannot be accounted for from linguistic experience alone
  – Especially given time constraints
![Page 33: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/33.jpg)
Statistics: view #2

• Learner encounters lots of data
• Sufficient linguistic experience to learn and generalize
• Demonstrations that child learners are capable of calculating statistics
• Demonstrations of statistical learning from corpora through computational models
![Page 34: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/34.jpg)
Statistics: view #3
• There are statistical properties of naturally occurring linguistic data that are invariant
• Can be investigated by looking at a corpus– Must be naturally occurring corpus– Not artificial
• Consequences for learning…
![Page 35: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/35.jpg)
Zipf’s law: distribution of inflections

• # words per inflection, Czech Verbs
• One inflection has very high frequency
• A few inflections have high or moderately high frequency
• Most inflections are infrequent or rare
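The quantity on this slide, word types per inflection, can be computed with a simple count. The tagged toy analyses below are hypothetical stand-ins for what a morphological lexicon would supply; the point is only the shape of the ranked counts.

```python
from collections import defaultdict

# Hypothetical (lexeme, inflection) pairs, as a morphological lexicon might list.
analyses = [("walk", "3sg"), ("talk", "3sg"), ("bake", "3sg"), ("sell", "3sg"),
            ("walk", "past"), ("talk", "past"), ("bake", "gerund")]

# Distinct word types (lexemes) attested per inflection, ranked by decreasing
# type frequency: one dominant inflection, then a long sparse tail.
types = defaultdict(set)
for lexeme, inflection in analyses:
    types[inflection].add(lexeme)
ranked = sorted(((infl, len(lexs)) for infl, lexs in types.items()),
                key=lambda x: -x[1])
```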
![Page 36: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/36.jpg)
Catalan verbs
Rare words
![Page 37: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/37.jpg)
CHILDES Catalan verbs
![Page 38: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/38.jpg)
Zipf’s law: not unique to language usage
• Magnitude of earthquakes
• Death toll in wars
• Population of cities
• Number of adherents to religions
• Net worth of individuals
• Popularity of websites
• Number of species per genus
![Page 39: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/39.jpg)
Consequences of Zipf’s law and sparse data for learning
• Statistical invariants place constraints on cognitively plausible representations and algorithms
• Zipf’s law– Causes sparse data problem / Poverty of the stimulus
• Learners cannot assume abundant statistical evidence
• Must exist alternative means for efficient statistical learning given sparse data
![Page 40: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/40.jpg)
Component 3: algorithms

• No Free Lunch theorem
  – Not possible to formulate an algorithm that works equally well for all problems
  – Any algorithm must be tuned toward a specific problem, to some extent
• Any algorithm works best for:
  – Some particular representation of data
  – Some particular statistical distribution of data
• Learning bias: the set of assumptions made
![Page 41: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/41.jpg)
What kind of learning bias?
• Strong representation, weak statistics– Nativist position
• Weak representation, strong statistics– Statistical / distributional learning
• Intermediate in both– My proposed solution
![Page 42: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/42.jpg)
Summary: must balance 3 factors
• Representation– Want to learn intensional definitions
• Statistics– Naturalistic data is sparse
• Algorithms– Balance learning bias: linguistic and statistical
![Page 43: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/43.jpg)
Outline
• Introduction: statistical learning and lexical category induction
• Previous work
• Computational modeling of language acquisition
• A new approach
• Conclusion
![Page 44: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/44.jpg)
Key problems to address
• Fine-grained clusters
• Categories conflated into same cluster
• Previous work
  – Clusters all frequent content words
  – Looks at immediately adjacent words only
  – Does not consider data statistics
![Page 45: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/45.jpg)
Key aspects of solution: stronger linguistic bias
• Use abstract distributional contexts– Instead of just immediate words
• Cluster morphological base forms– Instead of clustering words of all inflections
• Concept of open-class clusters– Instead of arbitrary merging of clusters
![Page 46: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/46.jpg)
Input data
• Brown corpus + Penn Treebank
  – Text corpus
  – 2 million words
  – Written for adults
• (Redington et al. 1998: no differences in cluster analysis between child-directed speech and adult-directed text)
![Page 47: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/47.jpg)
Solution part 1: Too many categories
• Problem: fine-grained clusters often correspond to different morphological inflections of the same lexical category
• Solution: cluster one inflection per category
  – instead of words of all inflections
  – One inflection per underlying lexical category → one cluster per underlying lexical category
• Lexicon + rules model of morphophonology
  – Cluster the “morphological base form”
![Page 48: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/48.jpg)
Rule-based model of morphology and lexical categories
(Diagram: a lexicon maps base forms to POS categories, e.g. base1: A, base2: A, base3: B, base4: B, base5: A, …; for each category, inflectional rules t1–t8 derive the inflected forms from the base.)
![Page 49: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/49.jpg)
Morphological base form

• English “base” or “bare form”
  – dog, bake, sell
• Not: dogs, dog’s; baking, baked; sells, sold
![Page 50: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/50.jpg)
Freq. distribution of inflections
• Zipfian distribution of inflections
  – # word types per inflection
• Most type-frequent inflections in corpora
  – Nouns, Adjectives: nominative singular
  – Verbs: infinitive or 3rd person present singular
  – the “canonical base”

(Plot: lexemes × inflections)
Learning rules from sparse data
• Sketch of morphology acquisition algorithm. For a particular lexical category:
  – 1. Identify the base through high frequency
  – 2. Relate other, less-frequent inflections to the base through morphophonological rules

(Plot: lexemes × inflections, with the base marked)
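The two-step sketch above might be rendered as follows. The suffix-stripping rule format and the toy paradigm are illustrative assumptions, far simpler than a real morphophonological learner (see Chan 2008 for the actual model).

```python
from collections import Counter

def learn_base_and_rules(forms):
    """forms: observed inflected word tokens for one lexical category.
    Step 1: take the most frequent form type as the canonical base.
    Step 2: express every other form as a (strip, add) suffix rule
    relative to the base, via the longest common prefix."""
    freq = Counter(forms)
    base = freq.most_common(1)[0][0]
    rules = set()
    for form in freq:
        if form == base:
            continue
        i = 0
        while i < min(len(base), len(form)) and base[i] == form[i]:
            i += 1
        rules.add((base[i:], form[i:]))  # strip this suffix, add that one
    return base, rules

base, rules = learn_base_and_rules(["bake", "bake", "bake", "baked", "baking"])
# base "bake"; rules: add "d" (baked), strip "e" add "ing" (baking)
```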
![Page 52: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/52.jpg)
Discover rule-based grammar
Base forms
Base forms
![Page 53: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/53.jpg)
English top 1000 words
(Green = Verbs, Blue = Nouns, Red = Adjectives)
![Page 54: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/54.jpg)
English top 1000 base forms
(Green = Verbs, Blue = Nouns, Red = Adjectives)
![Page 55: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/55.jpg)
Solution part 2: conflated categories
• Previous work:
  – Large clusters get merged together (nouns, adjs)
• Solution:
  – Identify open-class clusters
  – Let each open-class cluster gain more members
  – But don’t merge open-class clusters
![Page 56: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/56.jpg)
Clustering algorithm
• Let each data point begin in its own cluster.
• Go through all pairs of data points (p1, p2) in order of decreasing similarity.
  – If p1’s cluster and p2’s cluster are both above a threshold size, do not merge their clusters
  – Otherwise merge their clusters
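A direct sketch of the algorithm on this slide. The data points, similarity function, and threshold below are toy assumptions; the talk applies the procedure to distributional count vectors.

```python
def size_capped_cluster(points, sim, max_size):
    """Singleton clusters; visit point pairs in order of decreasing
    similarity; merge their clusters unless BOTH already exceed the size
    threshold, so open-class clusters keep growing but never merge
    with each other."""
    cluster_of = {p: i for i, p in enumerate(points)}
    members = {i: {p} for i, p in enumerate(points)}
    pairs = sorted(((sim(a, b), a, b) for i, a in enumerate(points)
                    for b in points[i + 1:]), reverse=True)
    for _, a, b in pairs:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue
        if len(members[ca]) > max_size and len(members[cb]) > max_size:
            continue  # two large (open-class) clusters: do not merge
        for p in members[cb]:
            cluster_of[p] = ca
        members[ca] |= members.pop(cb)
    return list(members.values())

# Two well-separated groups on a number line stay separate once both
# have grown past the threshold.
clusters = size_capped_cluster([1, 2, 3, 10, 11, 12],
                               lambda a, b: -abs(a - b), 2)
```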
![Page 57: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/57.jpg)
English top 1000 base forms
![Page 58: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/58.jpg)
English top 1000 base forms
![Page 59: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/59.jpg)
English top 1000 base forms
![Page 60: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/60.jpg)
English top 1000 base forms
![Page 61: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/61.jpg)
Intentions of algorithm
• Algorithm discovers cluster centers first
• Clusters continually add new members
• Large cluster → open-class category
• Large clusters never get merged– Noun and adj should not be conflated
![Page 62: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/62.jpg)
Intermediate results
|           | Adj | Noun | Verb | Other |
|-----------|-----|------|------|-------|
| cluster 1 | 143 | 302  | 5    | 13    |
| cluster 2 | 30  | 365  | 4    | 5     |
| cluster 3 | 3   | 18   | 112  | 0     |
Doesn’t work: Adjectives and Nouns still in same cluster (cluster 1)
![Page 63: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/63.jpg)
Solution part 3: higher-order features
• Definition of adjective?
  – the ___ (ambiguous between adj/noun)
  – “the big dog”
  – “the dog”
• Better definition: ___ Noun
  – Not available when using words as distributional context
![Page 64: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/64.jpg)
Clusters as distributional context
• Annotate the data with the clusters that were learned– The big/1 dog/1 wants to sleep/3.
• Update co-occurrence matrix
• Cluster again

(Matrix: rows = words to be clustered; columns = left-context word features, left/right cluster features, right-context word features.)
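The second pass described on this slide, re-representing each word by both its neighboring words and the cluster labels learned in the first pass, might look like this sketch; the cluster assignments are given as an assumed input, and the feature encoding is one plausible choice.

```python
from collections import Counter, defaultdict

def second_pass_features(sentences, cluster_of):
    """Build distributional rows where each left/right neighbor contributes
    its word identity and, if known, its first-pass cluster label
    (cf. the annotated example "The big/1 dog/1 wants to sleep/3")."""
    rows = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for pos, j in (("L", i - 1), ("R", i + 1)):
                if 0 <= j < len(sent):
                    n = sent[j]
                    rows[w][(pos, "word", n)] += 1
                    if n in cluster_of:
                        rows[w][(pos, "cluster", cluster_of[n])] += 1
    return rows

sents = [["the", "big", "dog", "sleeps"]]
clusters = {"big": 1, "dog": 1, "sleeps": 3}
rows = second_pass_features(sents, clusters)
# "big" now carries the abstract feature "followed by cluster 1",
# approximating the "___ Noun" context unavailable from words alone
```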
![Page 65: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/65.jpg)
Cluster 1 “errors”: american, indian, soviet, african
| Pass 1    | Adj | Noun | Verb | Other |
|-----------|-----|------|------|-------|
| cluster 1 | 143 | 302  | 5    | 13    |
| cluster 2 | 30  | 365  | 4    | 5     |
| cluster 3 | 3   | 18   | 112  | 0     |

| Pass 2    | Adj | Noun | Verb | Other |
|-----------|-----|------|------|-------|
| cluster 1 | 144 | 75   | 4    | 8     |
| cluster 2 | 29  | 604  | 6    | 7     |
| cluster 3 | 3   | 6    | 111  | 3     |
![Page 66: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/66.jpg)
Algorithm summary
• Linguistically motivated algorithm:
– Cluster base forms only, instead of all inflections
– Concept of open-class categories
  • Do not merge them into a larger category
– Abstract category formation
  • Available for use in subsequent stages of learning
![Page 67: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/67.jpg)
Solves problems of previous work
• 1. Too many categories
  – Cluster morphological base forms only
• 2. Lexical categories merged together
  – Don’t merge open-class clusters
  – Use categories as distributional contextual features
![Page 68: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/68.jpg)
Current and future work

• Other languages
  – Also examined Spanish: complications of gender classes
• Develop a tagger to extend analysis to rare words
• Syncretism
• Derivational morphology
• Lack of function words
• Other kinds of syntactic relations
![Page 69: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/69.jpg)
Outline
• Introduction: statistical learning and lexical category induction
• Previous work
• Computational modeling of language acquisition
• A new approach
• Conclusion
![Page 70: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/70.jpg)
Summary

• Problem: acquisition of open-class lexical categories
• Treat as an abstract computational problem
  – Develop a model that learns from a corpus
• Balance learning bias
  – Linguistic: what representational structures are built into the learner?
  – Statistical: what assumptions about the data distribution does the learner make?
![Page 71: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/71.jpg)
Comparison to previous work
• Previous computational models:
  – Don’t attain the desired set of lexical categories
  – Distributional learning according to words in the immediate context is insufficient
• Current model:
  – Stronger linguistic bias
  – More successfully acquires lexical categories
  – Strong linguistic bias may be necessary
![Page 72: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/72.jpg)
Development of proposed solution
• Methodology
  – What is the desired result?
  – Look at the statistical distribution of the input data
  – What is necessary in the learner in order to produce the desired result?
• Build linguistic concepts into the architecture of the learner
  – Open-class categories
  – Morphological base
  – Formation of abstract categories
• Still a data-driven, statistical / distributional learner
![Page 73: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/73.jpg)
Linguistic and cognitive implications
• A solution to computational problem says something about the language faculty
• Supports:– Rule-based model of morphology– Formation and further availability of abstract categories
• Against:– Highly lexicalist models– Item-based learning (no abstractions)
![Page 74: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/74.jpg)
Thank you.
![Page 75: The contribution of abstract ... - University of Arizona](https://reader033.vdocuments.us/reader033/viewer/2022042815/6268f4e49b77527c964df135/html5/thumbnails/75.jpg)
Bibliography

• Erwin Chan. 2008. Structures and Distributions in Morphology Learning. Ph.D. thesis, University of Pennsylvania.
• Alexander Clark. 2000. Inducing syntactic categories by context distribution clustering. Proceedings of the Computational Natural Language Learning and the Second Learning Language in Logic Workshop (CoNLL-LLL).
• Charles C. Fries. 1952. The Structure of English. London: Longmans.
• Zellig Harris. 1946. From morpheme to utterance. Language, 22(3), 161-183.
• Zellig Harris. 1954. Distributional structure. Word, 10(2/3), 146-162.
• Toben H. Mintz. 2003. Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90, 91-117.
• Toben H. Mintz, Elissa L. Newport, and Thomas Bever. 2002. The distributional structure of grammatical categories in speech to young children. Cognitive Science, 26, 393-425.
• Martin Redington, Nick Chater, and Steven Finch. 1998. Distributional information: a powerful cue for acquiring syntactic categories. Cognitive Science, 22(4), 425-469.
• Hinrich Schütze. 1993. Part-of-speech induction from scratch. Proceedings of the Association for Computational Linguistics (ACL).