![Page 1: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/1.jpg)
Simulation of Language Acquisition
Walter Daelemans
(CNTS, University of Antwerp)
http://www.cnts.ua.ac.be/~walter
EMLAR 2005 Utrecht
![Page 2: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/2.jpg)
Overview
• Theories, computational models and simulations
• Machine Learning
  – Generalization versus abstraction
  – Eager versus lazy learning
• Memory-based models of language acquisition and processing
• Case Study 1: Stress acquisition
• TiMBL crash course and demonstration
• Case Study 2: German plural
![Page 3: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/3.jpg)
Simulation (1)
• Theory
  – Explains and predicts empirical data (observations, experimental results)
  – Cogsci: in terms of knowledge representation, acquisition, and processing framework
  – Problems
    • Verbal
    • Sometimes vague, underspecified
    • Every theoretical description, however exact, turns out to contain errors when you try to implement it (~ Hugo Brandt Corstius, second law of Computational Linguistics)
![Page 4: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/4.jpg)
Simulation (2)
• Computational Model
  – Translation of a theory into a specific symbol representation and processing framework (algorithms and data structures)
  – Advantages
    • Precise formulation
    • Explicit in all details
    • Consistency and completeness can sometimes be proven
    • Falsifiable through simulations
• Simulations
  – A computational model with specific parameter settings used to mimic specific empirical data
![Page 5: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/5.jpg)
Machine Learning as a model for acquisition
• Cognitive architecture
  – Competence (knowledge representation)
  – Performance (search)
  – Acquisition (search)
• Bias
  – Restrictions on input and output representations
  – Restrictions on learning algorithm
  – Restrictions on knowledge representation formalism
![Page 6: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/6.jpg)
[Diagram of a learning system. Labels: Input, Output, Performance Component (Ri, Rj, Rk, Rl), Learning Component, Search, Experience, BIAS]
![Page 7: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/7.jpg)
Generalisation and Abstraction
• + abstraction, + generalisation: Rule Induction, Connectionism, Statistics
• + abstraction, - generalisation: Handcrafting, … (Fill in your most hated linguist here)
• - abstraction, + generalisation: Memory-Based Learning
• - abstraction, - generalisation: Table Lookup
![Page 8: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/8.jpg)
Nativism and Rule-Based
• nativist, + rule-based: Innate mental rules
• nativist, - rule-based: Hard-wired neural networks, Innate probabilities?, Innate exemplars?
• empiricist, + rule-based: Rule Induction
• empiricist, - rule-based: Connectionism, Statistics, Memory-Based Learning
![Page 9: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/9.jpg)
Machine Learning crash course
The field of machine learning is concerned with the question of how to construct computer programs that automatically learn with experience. (Mitchell, 1997)
• Dynamic process: learner L shows improvement on task T after learning.
• Getting rid of programming.
• Handcrafting versus learning.
• Machine Learning is task-independent.
![Page 10: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/10.jpg)
Machine Learning: Roots
• Information theory
• Artificial intelligence
• Pattern recognition
• Took off during 70s
• Major algorithmic improvements during 80s
• Forking: neural networks, data mining
![Page 11: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/11.jpg)
Machine Learning: 2 types
• Theoretical ML (what can be proven to be learnable by what?)
  – Gold, identification in the limit
  – Valiant, probably approximately correct learning
• Empirical ML (on real or artificial data)
  – Evaluation Criteria:
    • Accuracy
    • Quality of solutions
    • Time complexity
    • Space complexity
    • Noise resistance
![Page 12: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/12.jpg)
Empirical ML: Key Terms 1
• Instances: individual examples of input-output mappings of a particular type
• Input consists of features
• Features have values
• Values can be
  – Symbolic (e.g. letters, words, …)
  – Binary (e.g. indicators)
  – Numeric (e.g. counts, signal measurements)
• Output can be
  – Symbolic (classification: linguistic symbols, …)
  – Binary (discrimination, detection, …)
  – Numeric (regression)
![Page 13: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/13.jpg)
Empirical ML: Key Terms 2
• A set of instances is an instance base
• Instance bases come as labeled training sets or unlabeled test sets (you know the labeling, the learner does not)
• An ML experiment consists of training on the training set, followed by testing on the disjoint test set
• Generalization performance (accuracy, precision, recall, F-score) is measured on the output predicted on the test set
• Splits in train and test sets should be systematic: n-fold cross-validation
  – 10-fold CV
  – Leave-one-out testing
• Significance tests on pairs or sets of (average) CV outcomes
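The train/test protocol described on this slide can be sketched roughly as follows. This is a minimal illustration, not the set-up used in the experiments reported later: the function names `n_fold_splits` and `cross_validate`, the `(features, label)` tuple layout, and the `train_and_classify` callback are all assumptions made for the example.

```python
import random

def n_fold_splits(instances, n_folds=10, seed=0):
    """Yield (train, test) pairs so that every instance is tested exactly once."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled instances round-robin into n_folds disjoint folds.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def cross_validate(instances, train_and_classify, n_folds=10, seed=0):
    """Return one accuracy per fold.

    train_and_classify(train, features) -> predicted label (hypothetical hook).
    Assumes len(instances) >= n_folds so that no fold is empty.
    """
    accuracies = []
    for train, test in n_fold_splits(instances, n_folds, seed):
        correct = sum(train_and_classify(train, feats) == label
                      for feats, label in test)
        accuracies.append(correct / len(test))
    return accuracies
```

Setting `n_folds` to the number of instances gives leave-one-out testing; the per-fold accuracies are what a significance test would then compare across learners.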
![Page 14: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/14.jpg)
Empirical ML: 2 Flavors
• Eager
  – Learning
    • abstract model from data
  – Classification
    • apply abstracted model to new data
• Lazy
  – Learning
    • store data in memory
  – Classification
    • compare new data to data in memory
![Page 15: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/15.jpg)
Eager vs Lazy Learning
Eager:
  – Decision tree induction
    • CART, C4.5
  – Rule induction
    • CN2, Ripper
  – Hyperplane discriminators
    • Winnow, perceptron, backprop, SVM
  – Probabilistic
    • Naïve Bayes, maximum entropy, HMM
  – (Hand-made rulesets)
Lazy:
  – k-Nearest Neighbour
    • MBL, AM
    • Local regression
![Page 16: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/16.jpg)
[Figure: Rule Induction. Axes: Coda last syl, Nucleus last syl. Classes: -etje, -kje]
![Page 17: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/17.jpg)
[Figure: MBL. Axes: Coda last syl, Nucleus last syl. Classes: -etje, -kje. A new item is marked "?"]
![Page 18: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/18.jpg)
Eager vs Lazy Learning
• Decision trees keep the smallest number of informative decision boundaries (in the spirit of MDL, Rissanen, 1983)
• Rule induction keeps the smallest number of rules with highest coverage and accuracy (MDL)
• Hyperplane discriminators keep just one hyperplane (or vectors that support it)
• Probabilistic classifiers convert data to probability matrices
• k-NN retains every piece of information available at training time
![Page 19: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/19.jpg)
Eager vs Lazy Learning
• Minimal Description Length principle:
  – Ockham’s razor
  – Length of abstracted model (covering core)
  – Length of productive exceptions not covered by core (periphery)
  – Sum of sizes of both should be minimal
  – More minimal models are better
• “Learning = compression” dogma
• In ML, length of abstracted model has been focus; not storing periphery
![Page 20: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/20.jpg)
Eager vs Lazy: So?
• Highly relevant to language modeling
• In language data, what is core? What is periphery?
• Often little or no noise; productive exceptions
• (Sub-)subregularities, pockets of exceptions
• “disjunctiveness” and “polymorphism”
• Some important elements of language have different distributions than the “normal” one
• E.g. word forms have a Zipfian distribution
• Hard to distinguish noise from exceptions on the basis of
  – Frequency
  – Typicality
![Page 21: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/21.jpg)
![Page 22: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/22.jpg)
ML and Natural Language
• Apparent conclusion: ML could be an interesting tool to do psycholinguistic modeling
  – Next to probability theory, information theory, statistical analysis (natural allies)
• More and more annotated data available
• Skyrocketing computing power and memory
![Page 23: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/23.jpg)
Case Study
Exemplar-based acquisition of Dutch Stress
(Durieux / Gillis / Daelemans)
![Page 24: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/24.jpg)
This “rule of nearest neighbor” has considerable elementary intuitive appeal and probably corresponds to practice in many situations. For example, it is possible that much medical diagnosis is influenced by the doctor's recollection of the subsequent history of an earlier patient whose symptoms resemble in some way those of the current patient. (Fix and Hodges, 1952, p.43)
MBL: Use memory traces of experiences as a basis for analogical reasoning, rather than using rules or other abstractions extracted from experience and replacing the experiences.
![Page 25: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/25.jpg)
MBL Acquisition
• Language process is represented by a set of exemplars in memory
  – Exemplars act as models
  – Learning is incremental storage of exemplars
  – Compression and Metrics
• Exemplar consists of set of (mostly symbolic) features
![Page 26: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/26.jpg)
MBL Processing
• New instances of a performance process are solved through
  – Memory retrieval
  – Analogical (Similarity-Based) Reasoning
• Similarity metric
  – Language (faculty) - independent
  – Adaptive (feature and exemplar weighting)
![Page 27: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/27.jpg)
Operationalization
• Basis: k nearest neighbor algorithm:
  – store all examples in memory
  – to classify a new instance X, look up the k examples in memory with the smallest distance D(X,Y) to X
  – let each nearest neighbor vote with its class
  – classify instance X with the class that has the most votes in the nearest neighbor set
• Choices:
  – similarity metric
  – number of nearest neighbors (k)
  – voting weights
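The four steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not TiMBL's implementation: it uses a plain unweighted count of mismatching features as D(X,Y), takes k to mean k stored examples, and the names `overlap_distance` and `knn_classify` as well as the toy feature symbols are invented for the example.

```python
from collections import Counter

def overlap_distance(x, y):
    """Number of feature positions at which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def knn_classify(memory, x, k=1, distance=overlap_distance):
    """memory is a list of (features, label) pairs kept as-is (lazy learning).

    The k stored examples closest to x each cast one vote for their label.
    """
    ranked = sorted(memory, key=lambda example: distance(x, example[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy memory with abstract symbols (hypothetical data).
memory = [
    (("a", "b", "c"), "X"),
    (("a", "b", "d"), "X"),
    (("e", "f", "g"), "Y"),
    (("e", "f", "c"), "Y"),
]
print(knn_classify(memory, ("a", "b", "g"), k=3))
```

The three choices listed on the slide map onto the `distance` argument, the `k` argument, and the way votes are counted; here every neighbor's vote counts equally.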
![Page 28: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/28.jpg)
The Overlap distance function
• “Count the number of mismatching features”
![Page 29: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/29.jpg)
The MVDM distance function
• Estimate a numeric “distance” between pairs of values
  – “e” is more like “i” than like “p” in a phonetic task
  – “book” is more like “document” than like “the” in a parsing task
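One common way to estimate such a value-to-value distance is the Modified Value Difference Metric in its usual formulation: two values are close when they co-occur with the classes in similar proportions, i.e. the distance is the sum over classes of |P(class | v1) - P(class | v2)|. The sketch below assumes that formulation; the helper names and the toy counts are hypothetical and only chosen so that "e" ends up closer to "i" than to "p".

```python
from collections import Counter, defaultdict

def class_distributions(instances, feature_index):
    """Estimate P(class | value) for one feature from labeled instances."""
    counts = defaultdict(Counter)
    for features, label in instances:
        counts[features[feature_index]][label] += 1
    return {
        value: {c: n / sum(per_class.values()) for c, n in per_class.items()}
        for value, per_class in counts.items()
    }

def mvdm(dists, v1, v2):
    """Sum over classes of |P(c | v1) - P(c | v2)|."""
    classes = set(dists[v1]) | set(dists[v2])
    return sum(abs(dists[v1].get(c, 0.0) - dists[v2].get(c, 0.0))
               for c in classes)

# Toy data (hypothetical): one feature, two classes A and B.
toy = ([(("e",), "A")] * 3 + [(("e",), "B")] * 1
       + [(("i",), "A")] * 2 + [(("i",), "B")] * 2
       + [(("p",), "B")] * 4)
dists = class_distributions(toy, 0)
print(mvdm(dists, "e", "i"), mvdm(dists, "e", "p"))
```

Because the distances are estimated from class co-occurrence counts, they are only as reliable as the counts; rare values get noisy estimates.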
![Page 30: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/30.jpg)
Feature weighting in the distance function
• Mismatching on a more important feature gives a larger distance
• Factor in the distance function:
![Page 31: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/31.jpg)
Entropy & IG: Formulas
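The formulas themselves did not survive on this slide. As a hedged stand-in, the sketch below uses the standard textbook definitions: class entropy H(C) = -Σ p log2 p, information gain of a feature as H(C) minus the expected entropy after splitting on that feature's values, and gain ratio as information gain divided by the entropy of the feature's own value distribution (split info). The function names and toy data are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(symbols):
    """H = -sum p * log2(p) over the relative frequencies in symbols."""
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(symbols).values())

def information_gain(instances, feature_index):
    """H(C) minus the weighted class entropy within each feature value."""
    labels = [label for _, label in instances]
    by_value = {}
    for features, label in instances:
        by_value.setdefault(features[feature_index], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder

def gain_ratio(instances, feature_index):
    """Information gain normalized by split info (entropy of the values)."""
    values = [features[feature_index] for features, _ in instances]
    split_info = entropy(values)
    if split_info == 0:
        return 0.0
    return information_gain(instances, feature_index) / split_info

# Toy data (hypothetical): feature 0 predicts the class, feature 1 does not.
toy = [(("a", "x"), "P"), (("a", "y"), "P"),
       (("b", "x"), "Q"), (("b", "y"), "Q")]
print(information_gain(toy, 0), information_gain(toy, 1))
```

Used as feature weights, these values make a mismatch on an informative feature cost more than a mismatch on an uninformative one.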
![Page 32: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/32.jpg)
Exemplar weighting
• Scale the distance of a memory instance by some externally computed factor
• Smaller distance for “good” instances
• Bigger distance for “bad” instances
![Page 33: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/33.jpg)
Distance weighting
• Relation between larger k and smoothing
• Make more distant neighbors contribute less in the class vote
  – Linear inverse of distance (w.r.t. max)
  – Inverse of distance
  – Exponential decay
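The three weighting schemes can be sketched as vote weights computed from a neighbor's distance. The exact formulas are not given on the slide, so the ones below are assumptions based on their usual definitions: inverse linear scales from 1 at the nearest neighbor to 0 at the farthest, inverse distance is 1/(d + eps), and exponential decay is exp(-alpha * d^beta). All function and parameter names are illustrative.

```python
import math
from collections import defaultdict

def inverse_linear(d, d_min, d_max):
    """1 for the nearest neighbor, 0 for the farthest, linear in between."""
    return 1.0 if d_max == d_min else (d_max - d) / (d_max - d_min)

def inverse_distance(d, eps=1e-6):
    """1 / (d + eps); eps avoids division by zero on exact matches."""
    return 1.0 / (d + eps)

def exponential_decay(d, alpha=1.0, beta=1.0):
    """exp(-alpha * d ** beta)."""
    return math.exp(-alpha * d ** beta)

def weighted_vote(neighbors, weight):
    """neighbors: list of (distance, label); weight maps distance to a vote."""
    totals = defaultdict(float)
    for d, label in neighbors:
        totals[label] += weight(d)
    return max(totals, key=totals.get)

# One close neighbor of class A against two distant neighbors of class B.
neighbors = [(1.0, "A"), (3.0, "B"), (3.0, "B")]
print(weighted_vote(neighbors, lambda d: 1.0))      # plain majority vote
print(weighted_vote(neighbors, inverse_distance))   # distance-weighted vote
```

With equal weights the two distant neighbors outvote the close one; with any of the three decaying weights the close neighbor wins, which is why larger k and distance weighting are usually combined.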
![Page 34: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/34.jpg)
Learning word stress: A case study
• Learn primary stress
• Compare MBL with P&P/UG
• Match acquisition and processing data

• Durieux, G. (2003). “Computermodellen en klemtoon.” Fonologische Kruispunten, BICN.
• Daelemans, W., Gillis, S., and Durieux, G. (1994). The acquisition of stress: A data-oriented approach. Computational Linguistics 20: 421-451.
• Daelemans, W., Gillis, S., Durieux, G., and Van den Bosch, A. (1993). Learnability and markedness: Dutch stress assignment. In T.M. Ellison and J.M. Scobbie (Eds.), Computational Phonology. Edinburgh Working Papers in Cognitive Science, 8, pp. 157-178.
![Page 35: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/35.jpg)
MBL for psychology
• Similarity metric
  – Analogy engine
• Feature weighting
  – Relevance assignment
  – Information fusion
• Value weighting
  – Implicit concept formation
• Exemplar weighting
  – Recency, priming
• Distance-weighted extrapolation
  – Distributions, probabilities
• Local modeling
  – Heterogeneity and density
![Page 36: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/36.jpg)
Dominant Linguistic Approach
• Principles and Parameters, UG
  – Typology
  – Acquisition
• Formalism: Metrical trees, metrical grids
• Stress = prominence relations between constituents in a hierarchical structure
![Page 37: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/37.jpg)
YOUPIE (Dresher & Kaye, 1990)
• Assumptions
  – 11 parameters (216 “languages”)
  – Task-specific system for learning stress (domain knowledge)
  – Core grammar only
• Learning
  – Cue-based parameter setting results in a grammar of stress
• Performance
  – Generate tree with grammar and algorithmically determine stress location
![Page 38: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/38.jpg)
[Diagram: YOUPIE. Labels: PLD, Cue-based Learning, UG-stress (1 0 1 0 0 0 0 1 1 0 1), Grammar and Assignment rules, word]
![Page 39: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/39.jpg)
Parameters (with setting for Dutch)
Parameter Value
P1 Word tree right/left dominant
P2 Binary/unbound feet
P3 Feet assigned from the left/right edge
P4 Feet right/left-dominant
P5 Feet are / are not quantity-sensitive
P6 Feet are quantity-sensitive w.r.t. rime / nucleus
P7 Strong node in foot must / mustn’t branch
P8A There isn’t / is an extra-metrical syllable
P8 Left / Right-most syllable is extra-metrical
P9 Weak foot loses / doesn’t lose foot status in a clash
P10 Feet are / aren’t assigned iteratively
![Page 40: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/40.jpg)
MBL
• Assumptions
  – Lexical storage and generalization
  – Generic learning method, no task-specific linguistic knowledge
  – Core and periphery
• Learning
  – Based on storage of exemplars
• Performance
  – Similarity-based reasoning with feature weighting on stored exemplars
![Page 41: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/41.jpg)
[Diagram: MBL. Labels: PLD, Storage, Syllable-structure representations, Retrieval or Similarity-based reasoning on exemplars, word, Stress pattern]
![Page 42: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/42.jpg)
YOUPIE tested
• Experimental design
  – 216 languages
  – 117 items per language generated by YOUPIE performance component (no exceptions, core only)
  – For each language, grammar learned with YOUPIE cue-based learning component
• Results
  – For 60% of the languages, YOUPIE reconstructs the original parameter setting with which the words were generated
  – For 21% convergence is to a compatible setting
  – For 19% of the languages errors in one or more stress patterns
• Upper Boundary!
  – Perfect input, no exceptions to be learned
![Page 43: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/43.jpg)
MBLP vs. YOUPIE

System and level    Score   Sd      Accuracy
MBLP-words          104     15.01   89%
YOUPIE-words        105     28.24   90%
MBLP-syllables              3.7     97%
YOUPIE-syllables            11.88   95%
MBLP-languages      89              41%
YOUPIE-languages    176             81%
![Page 44: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/44.jpg)
Discussion
• No significant quantitative difference in performance
• Clear qualitative difference
  – YOUPIE: more languages perfectly learned
  – MBLP: fewer errors per language
• Issues:
  – Real language data
  – Core and periphery
  – Acquisition
  – Processing
![Page 45: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/45.jpg)
Dutch stress
• Stress on one of the last three syllables
• Predictable, but not completely
  – E.g., py-ja-ma ca-na-da pa-ra-plu
• Words not covered by the parameter configuration for Dutch need lexical marking with exception features (one, two or completely idiosyncratic)
![Page 46: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/46.jpg)
MBLP on Dutch data
• CELEX, 4868 monomorphemes
• Exemplar encoding schemes. For each of the three final syllables:
  – S1: syllable weight (SL, L, H, SH)
  – S2: nucleus and coda (complete rhymes, VC)
  – S3: nucleus and coda (separate features, phonemes)
  – S4: onset, nucleus, and coda (phonemes)
• Class: final, penultimate, ante-penultimate
![Page 47: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/47.jpg)
Results
![Page 48: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/48.jpg)
Language Acquisition
• Learning rules or learning lexical items?
• Rules (Hochberg ‘88 Spanish, Nouveau ‘93 Dutch)
  – Lexical learning lacks generalization capacity
  – Lexical learning incompatible with acquisition data
• Imitation task
  – Errors increase with irregularity
  – Tendency to regularization (but irregularization occurs)
    • By stress shift
    • By changing structure of repeated word
![Page 49: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/49.jpg)
Error Percentages
![Page 50: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/50.jpg)
Discussion
• MBLP error correlates with markedness like children’s errors
• MBLP has a tendency for regularization like children
  – Direction of stress shifts
  – Structural changes from inspection of nearest neighbors
• Irregularization and differences between 3- and 4-year-olds on marked patterns are hard to explain in a rule-based context

Rule learning is not the only possible explanation for the language acquisition data
![Page 51: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/51.jpg)
Adult processing
• Rule-based: stress grammar and set of irregular words, marked in the lexicon
  – Known words: rule application except when blocked by lexicon
  – Unknown words: rule application
• MBLP: lexical storage and analogy
  – Known words: look-up
  – Unknown words: analogy
![Page 52: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/52.jpg)
Experimental set-up
• Stimuli
  – Create pseudo-words and transcribe them (encoding 4)
  – Have a machine learner assign stress (regular or irregular)

              Bisyllabic   Trisyllabic
Regulars      60           60
Irregulars    60           60
![Page 53: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/53.jpg)
Experimental set-up
• Method
  – 18 adult participants
  – Reading task
  – 3 independent judges, consensus
• Result
  – Main effect for regularity variable (ANOVA p < .001); regular stress only in regular conditions
  – In all conditions, participants do the same as model prediction (ANOVA p < .001)
![Page 54: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/54.jpg)
Results
![Page 55: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/55.jpg)
Results
![Page 56: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/56.jpg)
Discussion
• Adult speakers sometimes prefer marked stress patterns for non-words
• These cases are partially predictable with an MBLP model and are problematic in a rule-based model (regularization only)
• BUT:
  – MBLP has a significantly better match with participant behavior in the regular conditions
  – Hypothesis: differences between mental lexicon and CELEX
    • Using a set-up with a population of machine ‘learners’ using different samples from CELEX explains the variability
![Page 57: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/57.jpg)
Summary
• Goal: put MBLP to the test on a concrete linguistic problem of sufficient complexity by comparing it to
  – Linguistic theory
  – Child language acquisition data
  – Adult processing data
• Results:
  – MBLP and YOUPIE (P&P/UG) comparable
  – MBLP can learn core as well as periphery using superficial representations
  – MBLP shows same errors and tendencies as children learning stress placement
  – MBLP better predictor of human adult behaviour with non-words
![Page 58: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/58.jpg)
Overall Conclusion
• Exemplar-based models should be taken as a serious alternative for rule-based/P&P/UG/dual route type theories
  – Workable operationalisation of analogy
  – Adequacy
• Similar results in morphology and syntax (grammatical relations, chunking, pp-attachment)
• We’ll see …
![Page 59: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/59.jpg)
Simulation with TiMBL
Demonstration: German plural
![Page 60: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/60.jpg)
TiMBL
http://ilk.uvt.nl/timbl
• Tilburg Memory-Based Learner
• Available for research and education
• Lazy learning, extending k-NN and IB1
• Optimized search for NN
  – Internal structure: tree, not flat instance base
  – Tree ordered by chosen feature weight
  – Many built-in optional metrics: feature weights, distance function, distance weights, exemplar weights, …
![Page 61: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/61.jpg)
Current practice
• Default TiMBL settings:
  – k=1, Overlap, GR, no distance weighting
  – Work well for some morpho-phonological tasks
• Rules of thumb:
  – Combine MVDM with bigger k
  – Combine distance weighting with bigger k
  – Very good bet: higher k, MVDM, GR, distance weighting
  – Especially for sentence and text level tasks
![Page 62: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/62.jpg)
usage: Timbl -f data-file {-t test-file} [options]
Algorithm and Metric options:
-a n : algorithm.
0 or IB1 : IB1 (default)
1 or IG : IGTree
2 or TRIBL : TRIBL
3 or IB2 : IB2
4 or TRIBL2 : TRIBL2
-m s : use feature metrics as specified in string s:
format: GlobalMetric:MetricRange:MetricRange
e.g.: -mO:N3:I2,5-7
D: Dot product. (Global only. numeric features implied)
O: weighted Overlap. (default)
M: Modified value difference.
N: numeric values.
I: Ignore named values.
![Page 63: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/63.jpg)
-w 0 : No Weighting.
1 : Weight using GainRatio. (default)
2 : Weight using InfoGain
3 : Weight using Chi-square
4 : Weight using Shared Variance
f : use Weights from file 'f'.
-b n : number of lines used for bootstrapping (IB2 only).
-d val : weight neighbors as function of their distance:
Z : all the same weight. (default)
ID : Inverse Distance.
IL : Inverse Linear.
ED:a : Exponential Decay with factor a. (no whitespace!)
ED:a:b : Exponential Decay with factor a and b. (no whitespace!)
-k n : k nearest neighbors (default n = 1).
![Page 64: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/64.jpg)
-q n : TRIBL treshold at level n.
-L n : MVDM treshold at level n.
-R n : solve ties at random with seed n.
-t f : test using file 'f'.
-t leave_one_out: test with Leave One Out,using IB1.
-t cross_validate: Cross Validate Test,using IB1.
 @f : test using files and options described in file 'f'.
      Supported options: d e F k m o p q R t u v w x % -
      -t <file> is mandatory
![Page 65: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/65.jpg)
Input options:
-f f : read from Datafile 'f'.
-f f : OR: use filenames from 'f' for CV test
-F format: Assume the specified inputformat. (Compact, C4.5, ARFF, Columns, Binary, Sparse)
-l n : length of Features (Compact format only).
-i f : read the InstanceBase from file 'f'. (skips phase 1 & 2)
-u f : read value_class probabilities from file 'f'.
-P d : read data using path 'd'.
-s : use exemplar weights from the input file
-s0 : silently ignore the exemplar weights from the input file
![Page 66: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/66.jpg)
Output options:
-e n : estimate time until n patterns tested.
-I f : dump the InstanceBase in file 'f'.
-n f : create names file 'f'.
-p n : show progress every n lines. (default p = 100,000)
-U f : save value_class probabilities in file 'f'.
-V : Show VERSION.
+v or -v level : set or unset verbosity level, where level is
   s: work silently.
   o: show all options set.
   f: show Calculated Feature Weights. (default)
   p: show MVD matrices.
   e: show exact matches.
   as: show advanced statistics. (memory consuming)
   cm: show Confusion Matrix.
   cs: show per Class Statistics. (implies +vas)
   di: add distance to output file.
   db: add distribution of best matched to output file
   k: add a summary for all k neigbors to output file (sets -x)
   n: add nearest neigbors to output file (sets -x and --)
   You may combine levels using '+' e.g. +v p+db or -v o+di
![Page 67: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/67.jpg)
-W f : save current Weights in file 'f'.
+% or -% : do or don't save test result (%) to file.
-o s : use s as output filename.
-O d : save output using path 'd'.

Internal representation options:
-B n : number of bins used for discretization of numeric feature values
-c n : clipping frequency for prestoring MVDM matrices
-D : Don't store distributions. (saves memory, but disables +vDB option)
+H or -H: write hashed trees (default +H)
-M n: size of MaxBests Array
-N n: Number of features (default 2500)
-T n : ordering of the Tree :
   DO: none.
   GRO: using GainRatio
   IGO: using InformationGain
   (… and many others)
+x or -x : Do or don't use the exact match shortcut. (IB only, default is -x)
![Page 68: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/68.jpg)
Data & Representation
• Symbolic features
  – segmental information (syllable structure)
  – stress
  – gender
• German Plural (~ 25,000 from CELEX)
  Vorlesung (lecture): l e - z U N F en
  Classes: e (e)n s er - U- Uer Ue
![Page 69: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/69.jpg)
Cognitive Architectures of Inflectional Morphology
• Dual Route (Pinker, Clahsen, Marcus …)
  – Rules for regular cases
    • (over)generalization
    • default behaviour
  – Associative memory for exceptions
    • irregularization / family effects
• Single Route (R&M, MacWhinney, Plunkett, Elman, …)
  – Frequency-based regularity

[Diagram: Dual Route. Labels: Input Features, Pattern Associator, Memory Failure, Rule, Suffix-class]
![Page 70: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/70.jpg)
German Plural
• Notoriously complex but routinely acquired (at age 5)
• Evidence for Dual Route?
– -s suffix is default/regular (novel words, surnames, acronyms, …)
– -s suffix is infrequent (least frequent of the five most important suffixes)
![Page 71: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/71.jpg)
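| Class | Frequency | Umlaut | Frequency | Example |
|-------|-----------|--------|-----------|---------|
| (e)n | 11920 | | | Abart |
| e | 6656 | no | 4646 | Abbau |
| | | yes | 2010 | Abdampf |
| - | 4651 | no | 4402 | Aasgeier |
| | | yes | 249 | Abwasser |
| er | 974 | no | 287 | Abbild |
| | | yes | 687 | Abgang |
| s | 967 | | | Abonnement |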
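The class frequencies in the table can be turned into relative shares with a few lines of Python, which makes the low frequency of -s explicit. The counts are taken from the table; the variable names are illustrative only.

```python
# Class frequencies as listed in the table above.
counts = {"(e)n": 11920, "e": 6656, "-": 4651, "er": 974, "s": 967}

total = sum(counts.values())
shares = {cls: n / total for cls, n in counts.items()}
least_frequent = min(counts, key=counts.get)

for cls, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{cls:>4}: {100 * share:5.1f}%")
print("total nouns:", total)
print("least frequent class:", least_frequent)
```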
![Page 72: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/72.jpg)
The default status of -s
• Similar item missing: Fnöhk-s
• Surname, product name: Mann-s
• Borrowings: Kiosk-s
• Acronyms: BMW-s
• Lexicalized phrases: Vergissmeinnicht-s
• Onomatopoeia, truncated roots, derived nouns, ...
![Page 73: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/73.jpg)
![Page 74: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/74.jpg)
Discussion
• Three “classes” of plurals: ((-en -)(-e -er))(s)
– the first four suffixes seem “regular” and can be learned accurately using information from phonology and gender
– -s is learned reasonably well, but information is lacking
• Hypothesis: more “features” are needed (syntactic, semantic, meta-linguistic, …) to enrich the “lexical similarity space”
• No difference in accuracy and speed of learning with and without Umlaut
• Overall generalization accuracy very high: 95% (90%)
• Schema-based learning (Köpcke)
*,*,*,*,i,r,M e
![Page 75: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/75.jpg)
![Page 76: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/76.jpg)
![Page 77: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/77.jpg)
Acquisition Data: Summary of previous studies
• Existing nouns: (Park 78; Veit 86; Mills 86; Schaner-Wolles 88; Clahsen et al. 93; Sedlak et al. 98)
– Children mainly overapply -e or -(e)n
– -s plurals are learned late
• Novel words: (Mugdan 77; MacWhinney 78; Phillips & Bouma 80; Schöler & Kany 89)
– Children inflect novel words with -e or -(e)n
– More “irregular” plural forms produced than “defaults”
![Page 78: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/78.jpg)
MBLP simulation
• model overapplies mainly -en and -e
• -s is learned late and imperfectly
• Mainly but not completely parallel to input frequency (more -s overgeneralization than -er generalization)
![Page 79: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/79.jpg)
Bartke, Marcus, Clahsen (1995)
• 37 children age 3.6 to 6.6
• pictures of imaginary things, presented as neologisms
– names or roots
– rhymes of existing words or not
– choice -en or -s
• results:
– children are aware that unusual sounding words require the default
– children are aware that names require the default
![Page 80: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/80.jpg)
MBLP simulation
• sort CELEX data according to rhyme
• compare overgeneralization
– to -en versus to -s
– percentage of total number of errors
• results:
– when new words don’t rhyme more errors are made
– overgeneralization to -en drops below the level of overgeneralization to -s
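The comparison described above, overgeneralization to each class as a percentage of the total number of errors, could be computed along the following lines. This is a sketch under assumptions: the function name and the toy (gold, predicted) pairs are hypothetical and are not results from the CELEX simulation.

```python
from collections import Counter

def overgeneralization_shares(pairs):
    """Map each wrongly predicted class to its share of all errors.

    pairs: iterable of (gold class, predicted class).
    """
    errors = Counter(pred for gold, pred in pairs if gold != pred)
    total = sum(errors.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in errors.items()}

# Toy classifier output (made up for illustration).
toy_pairs = [("e", "en"), ("s", "en"), ("e", "s"), ("en", "en")]
toy_shares = overgeneralization_shares(toy_pairs)
print(toy_shares)
```

Running this once on the test items that rhyme with stored words and once on those that do not would give the two sets of shares that the slide compares.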
![Page 81: Simulation of Language Acquisition Walter Daelemans](https://reader033.vdocuments.us/reader033/viewer/2022052412/558cf246d8b42a7c708b45f1/html5/thumbnails/81.jpg)
Conclusions
• Computational models in language acquisition shouldn’t necessarily be connectionist
– From rule induction to exemplar-based models
• TiMBL may be useful as software for computational psycholinguistics