From Pāṇinian Sandhi to Finite State Calculus
Malcolm D. Hyman
Max Planck Institute for the History of Science, Berlin
First International Sanskrit Computational Linguistics Symposium, Paris, 2007 – p.1
Overview
1. Research context
2. An XML vocabulary for Pāṇinian rules
3. From Pāṇinian rules to an FST
4. Implications: remarks on linguistic description
Research context
Ongoing work on modeling components of Sanskrit grammar according to Pāṇinian principles
nominal inflection
verbal inflection (using Dhātupāṭha)
stem formation (perfect stem, participial stems...)
morphophonology (sandhi)
Methodology
How closely to follow Pāṇini?
Practical concerns dictate an incremental approach.
We are obliged to interpret Pāṇini.
Research results concerning both Indian grammatical methods and facts of the Sanskrit language will emerge from computational studies.
Building blocks of an XML model
The rules model not only a Pāṇinian sūtra, but also its context and its interpretation.
An XML schema
A sound-based encoding (SLP1)
A regular expression dialect (PCREs)
The SLP1 encoding
Devanagari, romanization, SLP1:
अ a a    आ ā A    इ i i    ई ī I    उ u u    ऊ ū U
ऋ ṛ f    ॠ ṝ F    ऌ ḷ x    ॡ ḹ X
ए e e    ऐ ai E    ओ o o    औ au O *
क् k k    ख् kh K    ग् g g    घ् gh G    ङ् ṅ N
च् c c    छ् ch C    ज् j j    झ् jh J    ञ् ñ Y
ट् ṭ w    ठ् ṭh W    ड् ḍ q    ढ् ḍh Q    ण् ṇ R
त् t t    थ् th T    द् d d    ध् dh D    न् n n
प् p p    फ् ph P    ब् b b    भ् bh B    म् m m
य् y y    र् r r    ल् l l    व् v v
श् ś S    ष् ṣ z    स् s s    ह् h h
* anusvāra = M; visarga = H
The rule element
8.3.23 mo ’nusvāraḥ
<rule source="m" target="M" rcontext="[@(wb)][@(hal)]" ref="A.8.3.23"/>
(We may need more than one rule to express a sūtra.)
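A minimal sketch of how such a rule element could be applied with Python regular expressions, assuming that `rcontext` acts as a lookahead on the text following `source`. The `apply_rule` helper and the contents of `WB` and `HAL` are illustrative assumptions, not the author's implementation.

```python
import re

# Assumed expansions of the @(wb) and @(hal) macros (SLP1 symbols).
WB = " #"                                    # word-boundary symbols
HAL = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"    # all consonants

def apply_rule(text, source, target, rcontext=""):
    """Replace `source` with `target` where `rcontext` (a regex) follows."""
    pattern = f"{source}(?={rcontext})" if rcontext else source
    return re.sub(pattern, target, text)

# A.8.3.23: word-final m becomes anusvara (M) before a consonant.
RCONTEXT = f"[{WB}][{HAL}]"
print(apply_rule("tam ca", "m", "M", RCONTEXT))
```

Before a vowel-initial word, as in `tam eva`, the right context does not match and the input is returned unchanged.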
The macro element
We need some means for translating Pāṇini’s metalanguage, e.g. sound classes (pratyāhāras):
<macro name="JaS" value="JBGQDjbgqd" c="voiced stop"/>
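One way to picture macro expansion is as textual substitution of `@(name)` references before a pattern is compiled. The sketch below assumes exactly that; the `expand` helper and the `MACROS` table are hypothetical names, and only the `JaS` value is taken from the slide.

```python
import re

MACROS = {"JaS": "JBGQDjbgqd"}        # value as given in the macro element
MACRO_REF = re.compile(r"@\((\w+)\)")  # matches references such as @(JaS)

def expand(pattern, macros):
    """Replace every @(name) reference with the macro's value."""
    return MACRO_REF.sub(lambda m: macros[m.group(1)], pattern)

print(expand("[@(JaS)]", MACROS))
```

An undefined macro name raises `KeyError`, which is a convenient way to catch typos in rule files.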
The mapping element
1.1.2 adeṅ guṇaḥ
<mapping name="guna" ref="A.1.1.2">
  <map from="@(a)" to="a"/>
  <map from="@(i)" to="e"/>
  <map from="@(u)" to="o"/>
  <map from="@(f)" to="a"/>
  <map from="@(x)" to="a"/>
</mapping>
The function element
<function name="gunate">
  <rule source="[@(a)@(i)@(u)]" target="%(guna($1))"/>
  <rule source="[@(f)@(x)]" target="%(guna($1))%(semivowel($1))"/>
</function>
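The interplay of mapping and function can be sketched with plain dictionaries. This assumes that each vowel macro covers the short and long vowel, and that a `semivowel` mapping sends ṛ to r and ḷ to l; that mapping is not shown on the slides, so it is an assumption here.

```python
# guna mapping (A.1.1.2), with each macro expanded to its short and long vowel.
GUNA = {"a": "a", "A": "a", "i": "e", "I": "e", "u": "o", "U": "o",
        "f": "a", "F": "a", "x": "a", "X": "a"}

# Assumed semivowel mapping for the vocalic liquids (not given on the slides).
SEMIVOWEL = {"f": "r", "F": "r", "x": "l", "X": "l"}

def gunate(vowel):
    """Return the guna substitute, followed by r or l for vocalic liquids."""
    if vowel in SEMIVOWEL:
        return GUNA[vowel] + SEMIVOWEL[vowel]
    return GUNA[vowel]

print(gunate("i"), gunate("U"), gunate("f"), gunate("X"))
```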
Applying a function
6.1.87 ād guṇaḥ
<rule source="[@(a)][@(wb)]([@(ik)])" target="!(gunate($1))" ref="A.6.1.87"/>
Implementing the modeled rules
The XML model captures some of the structure of Pāṇini’s grammar. But the obvious serial application of the rules is computationally inefficient.
The rules can be automatically translated into regular expressions for compilation into a finite state transducer using tools such as xfst (Xerox) or fsa (van Noord).
The relation between the underlying strings and the surface strings is a regular relation.
The replace operator
Rules may be translated into regular expressions employing the replace operator (Karttunen 1995).
(a|A)( | #)(a|A) → ā
(a|A)( | #)(i|I) → e
(a|A)( | #)(u|U) → o
(a|A)( | #)(f|F) → ar
(a|A)( | #)(x|X) → al
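The effect of these five replacements can be approximated with ordinary regex substitution. This is only a sketch: Python's `re.sub` is not the finite-state replace operator, the rules are applied one after another rather than compiled into a transducer, and the long vowel is written `A` as in SLP1. The name `vowel_sandhi` is illustrative.

```python
import re

# (pattern, replacement) pairs in SLP1; a boundary is a space or '#'.
REPLACEMENTS = [
    (r"[aA][ #][aA]", "A"),
    (r"[aA][ #][iI]", "e"),
    (r"[aA][ #][uU]", "o"),
    (r"[aA][ #][fF]", "ar"),
    (r"[aA][ #][xX]", "al"),
]

def vowel_sandhi(text):
    """Apply each replacement across word boundaries, in list order."""
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text)
    return text

print(vowel_sandhi("ca iha"))
print(vowel_sandhi("na asti"))
print(vowel_sandhi("mahA fziH"))
```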
Context-dependent replacement
Documented algorithms exist for the translation of context-dependent replacements into FSTs (Mohri & Sproat 1996).
6.1.109 eṅaḥ padāntād ati
<rule source="a" target="’" lcontext="[@(eN)][@(wb)]" ref="6.1.109"/>
a → ’ / (e|o)( | #) _
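A left context like this one maps naturally onto a lookbehind assertion. The sketch below assumes the context is exactly two symbols wide (one of e/o plus one boundary symbol), which Python's fixed-width lookbehind requires, and it writes the avagraha as an ASCII apostrophe. The function name is hypothetical.

```python
import re

# a is replaced by an apostrophe (avagraha) after word-final e or o.
AVAGRAHA_RULE = re.compile(r"(?<=[eo][ #])a")

def elide_initial_a(text):
    """Apply the 6.1.109 replacement wherever the left context matches."""
    return AVAGRAHA_RULE.sub("'", text)

print(elide_initial_a("vane asti"))
print(elide_initial_a("ko api"))
```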
An FST for 6.1.109
6.1.109 eṅaḥ padāntād ati
[Figure: transducer with three states s0, s1, s2; arc labels: e, o · ? · " ", # · ?, a:’]
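The transducer can be read as a three-state machine. The sketch below is one plausible reconstruction, assuming s0 is the start state, s1 is entered after e or o, and s2 after a following boundary symbol; the exact arc layout of the original figure is an assumption. The avagraha is written as an ASCII apostrophe.

```python
def transduce(text):
    """Run a hand-coded three-state transducer for 6.1.109 over `text`."""
    state = 0                      # 0: default, 1: just read e/o, 2: e/o + boundary
    output = []
    for symbol in text:
        if state == 2 and symbol == "a":
            output.append("'")     # the a:' arc
            state = 0
        elif symbol in "eo":
            output.append(symbol)
            state = 1
        elif state == 1 and symbol in " #":
            output.append(symbol)
            state = 2
        else:
            output.append(symbol)  # the '?' (any other symbol) arcs
            state = 0
    return "".join(output)

print(transduce("vane asti"))
```

Every input symbol is consumed exactly once, so the running time is linear in the length of the input regardless of how many rules were compiled into the machine.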
A composed FST for external sandhi
37 sūtras constitute core rules for external sandhi
XML: 48 rules, 61 macros, 16 mappings, 3 functions
compiled regular expressions are ~268 KB
composed transducer has 4,994 states, 417,814 arcs
Comparing two approaches
Serial application of rules:
FORM      SŪTRA
tat ca
tad ca    8.2.39
taj ca    8.4.40, 44
tac ca    8.4.55
tacca
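The serial derivation can be traced with a toy rule cascade. The patterns below are deliberately narrowed to the segments in this one example and are assumptions for illustration, not general statements of the sūtras; the final step simply joins the words.

```python
import re

# (label, pattern, replacement): toy versions limited to the tat ca example.
STEPS = [
    ("8.2.39", r"t(?= )", "d"),        # word-final stop becomes voiced
    ("8.4.40", r"d(?= [cCjJ])", "j"),  # dental becomes palatal before a palatal
    ("8.4.55", r"j(?= [cC])", "c"),    # devoicing before a voiceless stop
    ("join", r" ", ""),                # write the words together
]

def derive(form):
    """Return every intermediate form produced by applying STEPS in order."""
    trace = [form]
    for _label, pattern, replacement in STEPS:
        form = re.sub(pattern, replacement, form)
        trace.append(form)
    return trace

print(derive("tat ca"))
```

Each step rescans the whole string, which is the inefficiency that composing the rules into a single transducer avoids.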
Comparing two approaches
A unique path through the transducer:
<t:t><a:a><t:c><" ":c><c:ε><a:a>
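The path can be checked by reading off its two sides. In this small sketch each arc is an (input, output) pair and ε is represented by the empty string; the variable names are illustrative.

```python
# Arcs of the path as (input symbol, output symbol); "" stands for epsilon.
PATH = [("t", "t"), ("a", "a"), ("t", "c"), (" ", "c"), ("c", ""), ("a", "a")]

underlying = "".join(inp for inp, _ in PATH)
surface = "".join(out for _, out in PATH)

print(repr(underlying), "->", repr(surface))
```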
Limitations of segmentalism
Segments are atomic, and enumerating them limits linguistic generalization.
Features overlap segments. It was J. R. Firth’s insight that “some phonological properties are not uniquely ‘placed’ with respect to particular segments within a larger unit” (Anderson, 1985, 185).
Coarticulation “can be detected in almost every phoneme sequence in normal speech” (Goodglass, 1993, 62).
Positions of the Indian grammarians
Pāṇini moved beyond the vikāra system of earlier linguistic thinkers (Cardona 1965, 311).
Use of abbreviations (pratyāhāras) for sound classes and the principle of sāvarṇya (A. 1.1.50) emphasize featural analysis.
Segments contain subsegments (e.g. /ṛ/ contains r: MBh. 3.452.1 ff.).
Pitch is a property of the syllable (ṚPr. 3.9) or spreads to adjacent consonants (TPr. 1.43).
N-retroflexion in finite state modeling
Non-final /n/ is realized as ṇ after {ṛ, ṝ, r, ṣ} despite intervening vowels, semivowels, gutturals/velars, labials, or anusvāra.
<rule source="n" target="R" lcontext="[fFrz][#@(aw)@(ku)@(pu)M]*" rcontext=".*[@(ac)]" ref="8.4.1-2"/>
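A rough Python rendering of this rule is sketched below. Because Python's `re` does not allow variable-width lookbehind, the left context is captured and re-emitted instead; the right context is taken literally from the rule. The macro expansions (`AC`, `AW`, `KU`, `PU`) are assumptions based on the usual contents of those pratyāhāras in SLP1.

```python
import re

AC = "aAiIuUfFxXeEoO"   # vowels
AW = AC + "hyvr"        # aT: vowels plus h, y, v, r
KU = "kKgGN"            # velars
PU = "pPbBm"            # labials

# Trigger, then any run of transparent segments, then n with a vowel later on.
RETROFLEX = re.compile(f"([fFrz][#{AW}{KU}{PU}M]*)n(?=.*[{AC}])")

def retroflex_n(text):
    """Replace n with R (retroflex nasal) in the context of 8.4.1-2."""
    return RETROFLEX.sub(r"\1R", text)

print(retroflex_n("bfMhana"))
print(retroflex_n("AraByamAna"))
print(retroflex_n("darSana"))
```

In the last input the palatal sibilant is not in the transparent class, so the trigger is blocked and the n is left unchanged.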
N-retroflexion examples
There is a regular relation between a set of underlying and surface strings that includes the following pairs:
UNDERLYING      SURFACE
bṛṃhana         bṛṃhaṇa ‘making big/strong’
ārabhyamāna     ārabhyamāṇa ‘being commenced’
niṣanna         niṣaṇṇa ‘sitting’
A prosody of retroflexion
When R is projected onto the linear phonematic plane, ṇ occurs within its extension (Allen 1951, 943).
bᴿṛṃhaṇa
ā-ᴿrabhyamāṇa
ni-ᴿṣaṇṇa
How to represent length?
/devāt/ ([+long] segment)
/devaːt/ (phoneme of length)
/devaat/ (two phonemes)
Autosegmental approaches to length
d e v a t
      |
    [DBL]

d e v  a  t
| | | / \ |
C V C V V C
Autosegmental implications
“stability” of suprasegmental units (Goldsmith 1976)
compensatory lengthening (Latin consul → cōsul; cf. epigraphic COS)
Swedish has complementary distribution of vocalic/consonantal length in rime of stressed syllables
long vowels are structurally parallel to diphthongs on the CV tier but not on the segmental tier
Length in Indian grammar
The Pāṇinian Śivasūtras specify only five basic vowels, not distinguishing between short or long (or pluta) vowels. Pāṇini characteristically refers to a-varṇa, etc., that is, the a vowel independent of its length (1.1.69).
The utility of linguistic descriptions
The virtue of particular linguistic descriptions is substantially relative to their purpose.
Linear and non-linear descriptions each have advantages.
The Aṣṭādhyāyī is motivated by brevity and explanatory generality. Computational linguistics strives for efficiency and explicitness.