Sanskrit Linguistic Processing

Character-encoding, morphology, and lexicography

Peter M. Scharf, Brown University

23 December 2009

Roman-based Standards


Devanāgarī-based Standards


Nominal inflection


Verbal inflection


Vedic Unicode


Encoding Vedic Characters

The Vedic Unicode Proposal recommends adding Vedic characters to the Unicode standard so that tone marks, such as those that appear in red in this palm-leaf manuscript of the Vājasaneyisaṃhitā, may be accurately represented in print.


Vedic Unicode Charts


Devanāgarī Extended


Vedic Extensions


LIES Appendix B

The Sanskrit Library Phonetic Basic encoding scheme (SLP1) attempts to meet high standards of unambiguous encoding while restricting encoding to 75 codepoints in the ASCII character set. SLP1 utilizes 57 codepoints to encode segments: 53 to represent phonetic segments and four to represent punctuation. In addition, SLP1 utilizes 18 codepoints to encode phonetic features: three to indicate stricture, six to indicate length, eight to indicate tone, and one to indicate nasalization….


SLP1 Basic Segments


B.3 Modifiers

Modifiers are added after a character to indicate variations in segment stricture, length, accent, and nasalization, in the order stated. Prolonged length, accent, and nasalization occur in classical Sanskrit as well as Vedic. Modifiers are used in combination to indicate special features of stricture, length, accent, and nasalization in Vedic.


B.3.1 Stricture

_ heaviness [used for semivowels y or v]

= lightness [used for semivowels y or v]

! lack of release (abhinidhāna) [used for stops or semivowels y, v, or l]


B.3.2 Length

* subsegmental epenthetic vowel (svarabhakti)

# length of half a mora

1 length of one mora [used in Vedic after short agitated kampa; short e, o; and heavy anusvāra]

1# slightly lengthened

2 length of two morae [used for dvimātra anusvāra in Vedic]

3 prolonged length of three morae [used for pluta vowels]

4 prolonged length of four or more morae [used in raṅga]


B.3.3 Accent

/ high pitch

\ low pitch

^ circumflex

6 extra low tone

7 low tone

8 high tone

9 extra high tone

+ sharpness


B.3.4 Nasalization

~ nasalization
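The modifier order stated in B.3 (stricture, then length, then accent, then nasalization) can be checked mechanically. The following is a minimal Python sketch, not the Sanskrit Library's own code: the names `parse_segment` and `split_segments` are hypothetical, and treating every base segment as a single ASCII letter is a simplifying assumption. The modifier characters themselves are the ones listed in B.3.1 to B.3.4.

```python
import re

# Modifier characters as listed in B.3.1 to B.3.4.
STRICTURE = "_=!"
LENGTH = "*#1234"
ACCENT = "/\\^6789+"
NASALIZATION = "~"

# Assumption (simplification): a base segment is one ASCII letter.
# Modifiers must follow it in the order stricture, length, accent, nasalization.
SEGMENT = re.compile(
    r"(?P<base>[A-Za-z])"
    + f"(?P<stricture>[{re.escape(STRICTURE)}]*)"
    + f"(?P<length>[{re.escape(LENGTH)}]*)"
    + f"(?P<accent>[{re.escape(ACCENT)}]*)"
    + f"(?P<nasalization>{re.escape(NASALIZATION)}?)"
)


def parse_segment(token):
    """Return the parts of one segment, or None if the modifier order is violated."""
    match = SEGMENT.fullmatch(token)
    return match.groupdict() if match else None


def split_segments(text):
    """Split a run of SLP1 text into base characters with their trailing modifiers."""
    return [match.group(0) for match in SEGMENT.finditer(text)]
```

For instance, `parse_segment("a3/~")` reports length `3`, accent `/`, and nasalization `~`, while `parse_segment("a~/")` returns `None` because the accent follows the nasalization mark. `split_segments("agni/m")` keeps the high-pitch mark attached to its vowel and yields `a`, `g`, `n`, `i/`, `m`.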

Yamas

20 epenthetic nasalized segments: k~, kh~, . . . , b~, bh~

4 epenthetic nasalized segments: k~, kh~, g~, gh~

20 replacements for a non-nasal stop before a nasal: k~, kh~, . . . , b~, bh~ (Ṛkprātiśākhya)


B.4.4 Syllabified visarga and anusvāra accent

H/ high-pitched visarga

H\ low-pitched visarga

H^ svarita visarga

M\ low-pitched anusvāra


Nominal Declension


Verbal Conjugation


XML Rules for guṇa


Executable Perl code


XML Full-form Lexicon


Morphological Analyzer


Cologne Digital Sanskrit Dictionaries


CDSL Monier Williams


Digital Dictionaries of South Asia

Digital Sanskrit Library Integration

Flexible input and display, linking text to the full-form lexicon, and aligning inflectional and morphological tags


Sanskrit Library Text-lexicon Integration


Sanskrit Library Morphological Analysis


Monier Williams: anuttama


Sanskrit Library Input/Display Preferences


Sanskrit Library Lexical Sources Preferences


Böhtlingk’s Sanskrit-Wörterbuch in kürzerer Fassung: anuttama


Böhtlingk and Roth’s Grosses Sanskrit-Wörterbuch: anuttama


Apte's Practical Sanskrit-English Dictionary: anuttama


Macdonell's A Practical Sanskrit Dictionary: anuttama

Sanskrit Linguistic Processing

Text-image alignment, and digital critical editing


Monier Williams Digital Image


Machine-readable text

Below is a segment of Ṣaḍguruśiṣya’s Vedārthadīpikā in SLP1 encoding.


Syllable Tags

Below is a segment of Ṣaḍguruśiṣya’s Vedārthadīpikā with orthographic syllable XML tags inserted.
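As a rough illustration of what inserting orthographic syllable tags involves, the sketch below splits unaccented SLP1 text into units that each correspond to one Devanāgarī akṣara (a consonant cluster plus its vowel, with a following anusvāra or visarga attached) and wraps each in a tag. The tag name `syl`, the `id` scheme, and the function names are hypothetical; the markup actually used in the edition is not shown on the slide. The sample words are generic Sanskrit words, not taken from the Vedārthadīpikā.

```python
import re

# SLP1 consonant and vowel letters; M is anusvāra and H is visarga.
CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzshL"
VOWELS = "aAiIuUfFxXeEoO"

# One orthographic syllable: optional consonant cluster + vowel + optional M/H.
# A trailing consonant cluster with no vowel forms a unit of its own.
SYLLABLE = re.compile(f"[{CONSONANTS}]*[{VOWELS}][MH]?|[{CONSONANTS}]+")


def orthographic_syllables(word):
    """Split an unaccented SLP1 word into orthographic syllables."""
    return SYLLABLE.findall(word)


def tag_syllables(word):
    """Wrap each orthographic syllable in a hypothetical <syl> element."""
    return "".join(
        f'<syl id="s{number}">{syllable}</syl>'
        for number, syllable in enumerate(orthographic_syllables(word), start=1)
    )
```

With these assumptions, `orthographic_syllables("DarmaH")` gives `Da` and `rmaH`, and `orthographic_syllables("agnim")` gives `a`, `gni`, and `m`. Accent and other modifiers are ignored here to keep the sketch short.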


Variant Readings

An XML file contains variant readings for various manuscripts and editions of Ṣaḍguruśiṣya’s Vedārthadīpikā.


Page Boundaries

An XML file of entries associates page boundaries in the manuscript Wai321 of Ṣaḍguruśiṣya’s Vedārthadīpikā with orthographic syllable tags in the machine-readable edition and in manuscript variant tags.
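One way to picture this association is as a lookup from syllable identifiers to folio sides. The sketch below is illustrative only: the element names (`syl`, `pb`), the attributes (`id`, `folio`, `first`), and the placeholder syllables are all assumptions, not the schema or text of the actual files.

```python
import xml.etree.ElementTree as ET

# Placeholder data in a hypothetical schema (not the edition's real markup or text).
TEXT_XML = """<text>
  <syl id="s1">ka</syl><syl id="s2">ma</syl>
  <syl id="s3">la</syl><syl id="s4">na</syl>
</text>"""

BOUNDARIES_XML = """<boundaries ms="Wai321">
  <pb folio="131r" first="s1"/>
  <pb folio="131v" first="s3"/>
</boundaries>"""


def syllables_by_page(text_xml, boundaries_xml):
    """Group syllables by the page on which they fall.

    Each <pb> entry names the first syllable of a page; every following
    syllable belongs to that page until the next boundary is reached.
    """
    page_starts = {
        pb.get("first"): pb.get("folio")
        for pb in ET.fromstring(boundaries_xml).iter("pb")
    }
    pages = {}
    current_page = None
    for syllable in ET.fromstring(text_xml).iter("syl"):
        current_page = page_starts.get(syllable.get("id"), current_page)
        if current_page is not None:
            pages.setdefault(current_page, []).append(syllable.text)
    return pages
```

Calling `syllables_by_page(TEXT_XML, BOUNDARIES_XML)` groups the four placeholder syllables into two pages. Keeping boundaries in a separate file, keyed to syllable identifiers, means the same edition text can be aligned with several manuscripts without altering it.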


Word-spotting

A highlighted passage in a manuscript of Ṣaḍguruśiṣya’s Vedārthadīpikā: Wai321, folio 131, recto, line 8.


VAD Digital Critical Edition
