linguistica. powerpoint? this presentation borrows heavily from slides written by john goldsmith who...
Post on 19-Dec-2015
![Page 1: Linguistica. Powerpoint? This presentation borrows heavily from slides written by John Goldsmith who has graciously given me permission to use them. Thanks,](https://reader036.vdocuments.us/reader036/viewer/2022062516/56649d3e5503460f94a165da/html5/thumbnails/1.jpg)
Linguistica
Powerpoint?
This presentation borrows heavily from slides written by John Goldsmith who has graciously given me permission to use them. Thanks, John.
He also says I should enjoy my trip, and one way to do that is to not have to write as many slides while I’m here!
Linguistica
A C++ program that runs under Windows, Mac OS X, and Linux, available at:
http://humanities.uchicago.edu/faculty/goldsmith/
There are explanations, papers, and other downloadable tools available there.
References (for the 1st part)
Goldsmith (2001) “Unsupervised Learning of the Morphology of a Natural Language” Computational Linguistics
Overview
Look at Linguistica in action: English, French
Theoretical foundations
Underlying heuristics
Further work
Linguistica
A program that takes in a text in an “unknown” language…
…and produces a morphological analysis:
a list of stems, prefixes, suffixes;
more deeply embedded morphological structure;
regular allomorphy
Linguistica
Actions and outlines of information
Here: lists of stems, affixes, signatures, etc.
Here: some messages from the analyst to the user.
Read a corpus
Brown corpus: 1,200,000 words of typical English
French Encarta
or anything else you like, in a text file.
Set the number of words you want read, then select the file.
A stem’s signature is the list of suffixes it appears with in the corpus, in alphabetical order.
abilit     ies.y     abilities, ability
aboli      tion      abolition
absen      ce.t      absence, absent
absolute   NULL.ly   absolute, absolutely
List of stems
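A minimal sketch of how a signature could be computed once each word has been cut into stem + suffix. The function name `signatures` and the (stem, suffix) input format are assumptions made for illustration; they are not taken from Linguistica itself.

```python
from collections import defaultdict

def signatures(analyses):
    """Map each stem to its signature: its suffixes, sorted and joined by '.'.

    `analyses` is an iterable of (stem, suffix) pairs. The empty suffix is
    written "NULL", as on the slides.
    """
    suffixes_by_stem = defaultdict(set)
    for stem, suffix in analyses:
        suffixes_by_stem[stem].add(suffix or "NULL")
    return {stem: ".".join(sorted(sufs))
            for stem, sufs in suffixes_by_stem.items()}

# The cuts shown in the stem list above.
cuts = [("abilit", "ies"), ("abilit", "y"), ("aboli", "tion"),
        ("absen", "ce"), ("absen", "t"),
        ("absolute", ""), ("absolute", "ly")]
sigs = signatures(cuts)
```

Plain string sorting puts the uppercase NULL before the lowercase suffixes, which matches the way the slides write NULL.ly and NULL.ed.ing.s.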
List of signatures
Signature: NULL.ed.ing.s
for example,
account accounted accounting accounts
add added adding adds
Signature <e>ion . NULL
composite concentrate corporate détente discriminate evacuate inflate opposite participate probate prosecute tense
What is this?
composite and composition
composite → composit → composit + ion
It infers that ion deletes a stem-final ‘e’ before attaching.
We’ll see how we can find a more sophisticated signature…
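As a small illustration of what the <e>ion notation means, here is a sketch in which a suffix written with a leading "<e>" removes a stem-final e before attaching. The function `attach` is a hypothetical helper, not part of Linguistica.

```python
def attach(stem, suffix):
    """Attach a suffix to a stem.

    "NULL" adds nothing. A suffix written "<e>xyz" first deletes a
    stem-final 'e' (if there is one) and then adds "xyz".
    """
    if suffix == "NULL":
        return stem
    if suffix.startswith("<e>"):
        suffix = suffix[len("<e>"):]
        if stem.endswith("e"):
            stem = stem[:-1]
    return stem + suffix

# The two words generated by the signature <e>ion.NULL for one stem.
words = [attach("composite", s) for s in ("<e>ion", "NULL")]
```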
Top signatures in English
Over-arching theory
The selection of a grammar, given the data, is an optimization problem.
Optimization means finding a maximum or minimum of some objective function
Minimum Description Length provides us with a means for understanding grammar selection as minimizing a function.
(We’ll get to MDL in a moment)
What’s being minimized by writing a good morphology? The number of letters is part of it
Compare:
Naive Minimum Description Length
Corpus:
jump, jumps, jumping
laugh, laughed, laughing
sing, sang, singing
the, dog, dogs
total: 61 letters
Analysis:
Stems: jump laugh sing sang dog (20 letters)
Suffixes: s ing ed (6 letters)
Unanalyzed: the (3 letters)
total: 29 letters.
Notice that the description length goes UP if we analyze sing into s+ing
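The letter counts on this slide can be checked with a few lines of Python. This is only the naive count (letters in the word list versus letters in the stem, suffix, and unanalyzed lists); the helper name `letters` is made up for the example.

```python
def letters(items):
    """Total number of letters in a list of strings."""
    return sum(len(item) for item in items)

corpus = ["jump", "jumps", "jumping",
          "laugh", "laughed", "laughing",
          "sing", "sang", "singing",
          "the", "dog", "dogs"]

stems = ["jump", "laugh", "sing", "sang", "dog"]
suffixes = ["s", "ing", "ed"]
unanalyzed = ["the"]

raw_length = letters(corpus)
analyzed_length = letters(stems) + letters(suffixes) + letters(unanalyzed)
print(raw_length, analyzed_length)
```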
Minimum Description Length (MDL)
Rissanen (1989) (not a CL paper)
The best “theory” of a set of data is the one which is simultaneously:
1. most compact or concise, and
2. provides the best modeling of the data
“Most compact” can be measured in bits, using information theory
“Best modeling” can also be measured in bits…
Essence of MDL
[Bar chart comparing three analyses: best analysis; elegant theory that works badly; complex theory modeled from data. Series: length of morphology, log prob of corpus. Vertical axis: 0 to 700,000.]
Description Length =
Conciseness: Length of the morphology. It’s almost as if you count up the number of symbols in the morphology (in the stems, the affixes, and the rules).
Length of the modeling of the data. We want a measure which gets bigger as the morphology is a worse description of the data.
Add these two lengths together = Description Length
Conciseness of the morphology
Sum all the letters, plus all the structure inherent in the description, using information theory.
Remember Entropy?
$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$$
Entropy was the weighted (by p(x)) sum of the information content or optimal compressed length (−log2 p(x)) of x. It’s called that because it is always possible to develop a compression scheme by which a symbol x, emitted with probability p(x), is represented by a placeholder of length −log2 p(x) bits.
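For a concrete feel, a short sketch that computes the optimal compressed length of a single symbol and the entropy of a distribution. The function names are chosen for this example only.

```python
import math

def code_length(p):
    """Optimal compressed length, in bits, of a symbol with probability p."""
    return -math.log2(p)

def entropy(probs):
    """Entropy in bits: each symbol's code length, weighted by its probability."""
    return sum(p * code_length(p) for p in probs if p > 0)

# Three symbols with probabilities 1/2, 1/4, 1/4 get codes of 1, 2, and 2 bits.
dist = [0.5, 0.25, 0.25]
lengths = [code_length(p) for p in dist]
h = entropy(dist)
```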
Optimal Compressed Length
The reason this is mentioned is that we will have lots of pieces of information in our model, and we’d like to figure out how much “space” it takes up.
Remember, we want the smallest model possible, so we are going to want the best compression for anything in our model
Also, remember this:
$$-\log p(x) = \log \frac{1}{p(x)}$$
Conciseness of stem list and suffix list
(ii) Suffix list:
$$\sum_{f \in \mathrm{Suffixes}} \left( \lambda \cdot |f| + \log \frac{[W_A]}{[f]} \right)$$
(iii) Stem list:
$$\sum_{t \in \mathrm{Stems}} \left( \lambda \cdot |t| + \log \frac{[W]}{[t]} \right)$$
|t|: number of letters in stem
|f|: number of letters in suffix
log term: cost of setting up this entity: length of pointer in bits
λ = number of bits/letter < 5
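A rough sketch of the list cost, assuming each entry is charged a per-letter cost times its length plus a pointer of log2(total / count) bits. Here `total` stands for the total count that an entry's count is measured against, and `counts` gives the count of each stem or suffix. The default of log2(26) bits per letter is an assumption that happens to sit just under 5, and the counts below are invented.

```python
import math

def list_cost(counts, total, bits_per_letter=math.log2(26)):
    """Cost in bits of a stem list or suffix list.

    counts: dict mapping each entry to its count
    total:  the count the pointer length is measured against
    """
    return sum(bits_per_letter * len(entry) + math.log2(total / count)
               for entry, count in counts.items())

# Hypothetical counts, for illustration only.
stem_counts = {"jump": 3, "laugh": 3, "dog": 2}
cost = list_cost(stem_counts, total=12)
```

Frequent entries get short pointers and rare entries get long ones, so the same letters cost more when they belong to a rarely used stem.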
Signature list length
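$$\sum_{\sigma \in \mathrm{Signatures}} \log \frac{[W]}{[\sigma]}$$
(list of pointers to signatures)
$$+ \sum_{\sigma \in \mathrm{Signatures}} \left( \log \langle \mathrm{stems}(\sigma) \rangle + \log \langle \mathrm{suffixes}(\sigma) \rangle \right)$$
$$+ \sum_{\sigma \in \mathrm{Sigs}} \left( \sum_{t \in \mathrm{Stems}(\sigma)} \log \frac{[W]}{[t]} + \sum_{f \in \mathrm{Suffixes}(\sigma)} \log \frac{[\sigma]}{[f \text{ in } \sigma]} \right)$$
⟨X⟩ indicates the number of distinct elements in X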
Length of the modeling of the data
Probabilistic morphology: the measure is −log probability(data),
where the morphology assigns a probability to any data set.
This is known in information theory as the optimal compressed length of the data (given the model).
Probability of a data set?
A grammar can be used not (just) to specify what is grammatical and what is not, but to assign a probability to each string (or structure).
If we have two grammars that assign different probabilities, then the one that assigns a higher probability to the observed data is the better one.
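To make "the better grammar assigns the data a higher probability" concrete, here is a toy sketch. The two "grammars" are just word-probability tables (a stand-in for a real probabilistic morphology), and the compressed length of the data is the sum of −log2 of each token's probability.

```python
import math
from collections import Counter

def compressed_length(tokens, prob):
    """-log2 probability of the data, in bits, assuming independent tokens."""
    return sum(-math.log2(prob[t]) for t in tokens)

tokens = "the dog the dogs the dog".split()
counts = Counter(tokens)

# Model A: probabilities proportional to observed frequency.
by_frequency = {w: c / len(tokens) for w, c in counts.items()}
# Model B: every word type equally likely.
uniform = {w: 1 / len(counts) for w in counts}

length_a = compressed_length(tokens, by_frequency)
length_b = compressed_length(tokens, uniform)
```

Model A assigns the observed data a higher probability, so its compressed length is shorter. The full Description Length then adds the cost of stating the model itself.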
This follows from the basic principle of rationality in the Universe:
Maximize the probability of the observed data.
From all this, it follows:
There is an objective answer to the question: which of two analyses of a given set of data is better?
However, there is no general, practical guarantee of being able to find the best analysis of a given set of data.
Hence, we need to think of (this sort of) linguistics as being divided into two parts:
An evaluator (which computes the Description Length); and
A set of heuristics, which create grammars from data, and which propose modifications of grammars, in the hopes of improving the grammar.
(Remember, these “things” are mathematical things: algorithms.)
Let’s step back for a minute
Why is this problem so hard at first?
Because figuring out the best analysis of any given word generally requires having figured out the rough outlines of the whole overall morphology. (Same is true for other parts of the grammar!)
How do we start?
You all know the answer to this question already…
We start with Zellig Harris’ successor frequency!
Although we got some good answers, we also saw that it made lots of mistakes
So…
As a boot-strapping method to construct a first approximation of the signatures, Harris’ method is pretty good.
We accept only stems of 5 letters or more;
only cuts where the SuccFreq is > 1, and where the neighboring SuccFreq is 1.
(This setup was experiment 16 from the lab on Monday.)
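The bootstrapping conditions above can be sketched as follows. This is an illustrative reading, not Linguistica's code: the successor frequency of a prefix is taken to be the number of distinct letters that follow it in the word list, with end-of-word counted as one more possible continuation (an assumption), and "neighboring" is taken to mean the prefixes one letter shorter and one letter longer.

```python
from collections import defaultdict

def successor_frequencies(words):
    """For every prefix, count the distinct continuations seen in the word list."""
    successors = defaultdict(set)
    for word in words:
        for i in range(1, len(word) + 1):
            # "#" marks end of word (assumed to count as a continuation).
            nxt = word[i] if i < len(word) else "#"
            successors[word[:i]].add(nxt)
    return {prefix: len(nexts) for prefix, nexts in successors.items()}

def bootstrap_cuts(words, min_stem=5):
    """Cut a word after a prefix of at least min_stem letters when the
    successor frequency there is > 1 and is exactly 1 on both sides."""
    sf = successor_frequencies(words)
    cuts = {}
    for word in set(words):
        for i in range(min_stem, len(word)):
            if sf[word[:i]] > 1 and sf[word[:i - 1]] == 1 and sf[word[:i + 1]] == 1:
                cuts[word] = (word[:i], word[i:])
                break
    return cuts

cuts = bootstrap_cuts(["account", "accounted", "accounting", "accounts"])
```

In this sketch the bare word account gets no cut of its own; pairing it with a NULL suffix would be a separate step.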
Let’s look at how the work is done (in the abstract), step by step...
Corpus
Pick a large corpus from a language: 5,000 to 1,000,000 words.
Feed it into the “bootstrapping” heuristic...
Out of which comes a preliminary morphology, which need not be superb.
Feed it to the incremental heuristics (…which we haven’t seen yet).
Out comes a modified morphology.
Is the modification an improvement? Ask MDL!
If it is an improvement, replace the morphology...
Send it back to the incremental heuristics again...
Continue until there are no improvements to try.
[Diagram, built up over these slides: Corpus → Bootstrap heuristic → Morphology → incremental heuristics → modified morphology. A box labeled Garbage appears at the replacement step.]
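The loop just described can be sketched in a few lines. Everything here is schematic: `bootstrap`, `heuristics`, and `description_length` are placeholders passed in by the caller, and the toy run at the bottom uses integers as stand-in "morphologies" only to show the control flow.

```python
def learn(corpus, bootstrap, heuristics, description_length):
    """Bootstrap a morphology, then keep a proposed modification only
    when the evaluator says the description length went down."""
    morphology = bootstrap(corpus)
    best = description_length(morphology, corpus)
    improved = True
    while improved:                      # continue until no improvements remain
        improved = False
        for propose in heuristics:       # each heuristic proposes a modification
            candidate = propose(morphology, corpus)
            cost = description_length(candidate, corpus)
            if cost < best:              # is it an improvement? Ask MDL.
                morphology, best = candidate, cost
                improved = True
    return morphology

# Toy run: "morphologies" are integers, the "description length" is the
# distance from 3, and the heuristics propose subtracting 1 or 2.
result = learn(
    corpus=None,
    bootstrap=lambda corpus: 10,
    heuristics=[lambda m, corpus: m - 1, lambda m, corpus: m - 2],
    description_length=lambda m, corpus: abs(m - 3),
)
```

Because a candidate is only accepted when it lowers the description length, the loop must stop, but it may stop at a local optimum: there is no guarantee of finding the best analysis.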
The details of learning morphology
There is nothing sacred about the particular choice of heuristic steps
Steps
Successor Frequency: strict
Extend signatures to cases where a word is composed of a known stem and a known suffix.
Loose fit: Look at all unanalyzed words. Look to see if they can be cut: stem + suffix, where the suffix already exists. Do this in all possible ways. See if any of these lead to stems with signatures that already exist. If so, take the “best” one. If not, compute the utility of the signature using MDL.
Check existing signatures: Using MDL to find best stem/suffix cut. Examples…
Check signatures (English)
on/ve → ion/ive
an/en → man/men
l/tion → al/ation
m/t → alism/alist, etc.
How?
Check signatures
Signature l/tion with stems:
federa inaugura orienta substantia
We need to compute the Description Length of the analysis as it stands versus as it would be if we shifted varying parts of the stems to the suffixes.
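A simplified sketch of this comparison. As a stand-in for the full Description Length it uses the naive letter count from earlier (letters in stems plus letters in suffixes), which is an assumption made to keep the example short. The function names are invented.

```python
def letters(items):
    return sum(len(item) for item in items)

def shift(stems, suffixes, k):
    """Move the last k letters of every stem onto the front of every suffix.
    Returns None unless all stems end in the same k letters."""
    tails = {stem[-k:] for stem in stems}
    if len(tails) != 1:
        return None
    tail = tails.pop()
    return [stem[:-k] for stem in stems], [tail + f for f in suffixes]

def best_cut(stems, suffixes, max_shift=3):
    """Try shifting 1..max_shift letters; keep the cheapest analysis."""
    best = (letters(stems) + letters(suffixes), stems, suffixes)
    for k in range(1, max_shift + 1):
        if any(len(stem) <= k for stem in stems):
            break
        shifted = shift(stems, suffixes, k)
        if shifted is None:
            break
        cost = letters(shifted[0]) + letters(shifted[1])
        if cost < best[0]:
            best = (cost, *shifted)
    return best

cost, new_stems, new_suffixes = best_cut(
    ["federa", "inaugura", "orienta", "substantia"], ["l", "tion"])
```

Moving the shared final a saves one letter in each of the four stems and costs one letter in each of the two suffixes, so l/tion becomes al/ation.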
“Check signatures” French:
NULL nt r >> a ant ar
NULL nt >> i int
ent t >> oient oit
NULL r >> i ir
f on ve >> sif sion sive
eur ion >> seur sion
ce t >> ruce rut
se x >> ouse oux
l ux >> al aux
me te >> ume ute
eurs ion >> teurs tion
f ve >> dif dive
it nt >> ait ant
que sme >> ïque ïsme
NULL s ur >> e es eur
ient nt >> aient ant
f on >> sif sion
nt r >> ent er
100,000 tokens, 12,208 types
Zellig redux: 1,403 stems, 140 signatures, 68 suffixes
Extend signatures: 226 signatures
Loose fit: 2,395 stems, 702 signatures, 68 suffixes
Check signatures: 2,409 stems, 730 signatures, 110 suffixes
Smooth stems: 2,400 stems, 735 signatures, 115 suffixes
Allomorphy
Find relations among stems: find principles of allomorphy, like “delete stem-final e before –ing”, on the grounds that this simplifies the collection of Signatures:
Compare the signatures NULL.ing, and e.ing.
NULL.ing and e.ing
NULL.ing: its stems do not end in –e.
–ing (almost) never appears after stem-final e (ex. singeing).
So e.ing and NULL.ing can both be subsumed under <e>ing.NULL, where <e>ing means a suffix ing which deletes a preceding e.
Find layers of affixation
Find roots (from among the Stem collection)
In other words, recursively look through our list of Stems and see if we could (or should) be analyzing them again:
readings = reading+s = read+ing+s, etc.
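The recursive look through the stem list might be sketched like this. The function `peel` and its inputs are hypothetical: it strips one known suffix at a time, but only when what is left is itself a known stem.

```python
def peel(word, stems, suffixes):
    """Split a word into a root followed by its layers of suffixes."""
    # Try longer suffixes first so that "ing" is preferred over "g", say.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix):
            stem = word[:-len(suffix)]
            if stem in stems:
                # The stem may itself be analyzable: look again.
                return peel(stem, stems, suffixes) + [suffix]
    return [word]

stems = {"read", "reading"}
suffixes = {"s", "ing"}
layers = peel("readings", stems, suffixes)
```

Whether a deeper analysis *should* be adopted is still a question for the evaluator; this sketch only shows that one *could* be found.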
What’s the future work?
1. Identifying suffixes through syntactic behavior (→ syntax)
2. Better allomorphy (→ phonology)
3. Languages with more morphemes/word (“rich” morphology)
“Using eigenvectors of the bigram graph to infer grammatical features and categories” (Belkin & Goldsmith 2002)
Method
Build a graph in which “similar” words are adjacent;
Compute the normalized Laplacian (linear algebra -- it just sounds fancy!) of that graph;
Compute the eigenvectors with the lowest non-zero eigenvalues; (more linear algebra)
Plot them.
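For reference, one standard definition of the normalized Laplacian is given below. The slides do not say which normalization is used, so the symmetric form is an assumption. Here W is the symmetric adjacency (or similarity) matrix of the word graph and D is the diagonal matrix of node degrees.

$$L = I - D^{-1/2} W D^{-1/2}, \qquad D_{ii} = \sum_j W_{ij}$$

The eigenvectors v and eigenvalues λ satisfy

$$L v = \lambda v$$

All eigenvalues of L are non-negative and the smallest is 0, which is why the method takes the eigenvectors with the lowest non-zero eigenvalues: their entries give each word a coordinate, and words with similar neighbors land near each other in the plot.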
Map 1,000 English words by left-hand neighbors
non-finite verbs: be, do, go, make, see, get, take, go, say, put, find, give, provide, keep, run…
finite verbs: was, had, has, would, said, could, did, might, went, thought, told, knew, took, asked…
world, way, same, united, right, system, city, case, church, problem, company, past, field, cost, department, university, rate, door, …
?: and, to, in, that, for, he, as, with, on, by, at, or, from…
Map 1,000 English words by right-hand neighbors
adjectives
social national white local political personal private strong medical final black French technical nuclear British
Prepositions: of in for on by at from into after through under since during against among within along across including near
End