Interlingual Word Mapping

Post on 12-Feb-2016



MTP I Stage Project Presentation

Guided by: Prof. Pushpak Bhattacharyya
Presented by: Abhijeet Padhye

Department of Computer Science and Engineering
Indian Institute of Technology, Bombay

1. Motivation
2. Introduction
3. Introduction to Transliteration
4. Syllables and their structure types
5. Sonority Theory
6. Relation between Sonority and Syllables
7. What is Schwa?
8. A Sonority theory based Syllabification module
9. Results obtained
10. References

Language is an integral part of society.
Each language has its own structure and rules.
Some basic concepts are common to all languages.
These shared concepts help in processes like transliteration, ultimately leading to better CLIR.
We are trying to exploit them for the process of syllabification.

“To study some phonological similarities between English, Hindi and Marathi and exploit them in order to achieve the goal of transliteration with high accuracy, so as to be able to tackle problems like out-of-vocabulary (OOV) words during Cross-Lingual Information Retrieval.”

Concepts being emphasized:
Transliteration
Theory of syllables
Sonority theory
The relation between syllables and sonority
Theory of schwa and schwa deletion

These are mainly based on the properties of sound, the driving force behind word pronunciation in any language.

Transliteration is a process of phonetically “translating” named entities, such as proper nouns, from a source language to a target language. [1]

The process of transliteration should be as accurate as possible.

It faces the problem of multiple variants of words.

“Syllable is a unit of spoken language consisting of a single uninterrupted sound formed generally by a Vowel and preceded or followed by one or more consonants.”

Vowels are the heart of a syllable (the most sonorous element).

Consonants act as sounds attached to vowels.

A syllable consists of 3 major parts:
Onset (C)
Nucleus (V)
Coda (C)

Vowels sit in the Nucleus of a syllable.
Consonants may get attached as Onset or Coda.
Basic structure: CV

The Nucleus is always present.
Onset and Coda may be absent.

Possible structures: V, CV, VC, CVC
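The four structures above can be illustrated with a small Java sketch that reduces a syllable to its shape. It is not part of the presented module: the class name `SyllableShape` and its methods are hypothetical, and the code assumes that the five vowel letters stand in for vowel sounds and that a run of consonants (or of vowels) counts as a single Onset, Nucleus or Coda slot.

```java
final class SyllableShape {
    // Assumption: orthographic vowels approximate vowel sounds.
    private static final String VOWELS = "aeiou";

    static boolean isVowel(char c) {
        return VOWELS.indexOf(Character.toLowerCase(c)) >= 0;
    }

    /** Returns "V", "CV", "VC" or "CVC", or "invalid" for any other shape. */
    static String shape(String syllable) {
        StringBuilder sb = new StringBuilder();
        for (char c : syllable.toCharArray()) {
            char tag = isVowel(c) ? 'V' : 'C';
            // Collapse runs so that "pl" is one onset and "ee" is one nucleus.
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != tag) {
                sb.append(tag);
            }
        }
        String s = sb.toString();
        switch (s) {
            case "V":
            case "CV":
            case "VC":
            case "CVC":
                return s;
            default:
                // No nucleus at all, or more than one nucleus.
                return "invalid";
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "ma", "ab", "dip", "plo"}) {
            System.out.println(s + " -> " + shape(s));
        }
    }
}
```

A string with no vowel, or with two separate vowel groups, is reported as "invalid" because the Nucleus must be present exactly once.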

Prominence Theory: e.g. entertaining /entəteɪnɪŋ/. The peaks of prominence are the vowels /e ə eɪ ɪ/, so the number of syllables is 4.

Chest Pulse Theory: based on muscular activities.

Sonority Theory: based on the relative sonority of segments within words.

“The Sonority of a sound is its loudness relative to other sounds with the same length, stress and pitch.”

Languages have sounds associated with them.
Some sounds are more sonorous than others.
Words in a language can be divided into syllables.
Sonority theory distinguishes syllables on the basis of sounds.

The sonority hierarchy is defined on the basis of the amount of sound associated with each class. From most to least sonorous:
Vowels (a, e, i, o, u)
Liquids (y, r, l, v)
Nasals (n, m)
Fricatives (s, z, f, … sh, th etc.)
Affricates (ch, j)
Stops (b, d, g, p, t, k)

Obstruents can be further classified into fricatives, affricates and stops.
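One way to hold this hierarchy in code is a lookup table from segment to rank. The Java sketch below is only an illustration under stated assumptions: the class name `SonorityScale`, the numeric ranks 6 to 1 and the rank 0 for unlisted segments are all hypothetical choices, and only the segments actually listed in the hierarchy are included.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class SonorityScale {
    private static final Map<String, Integer> RANK = new HashMap<>();

    private static void put(int rank, String... segments) {
        for (String s : segments) {
            RANK.put(s, rank);
        }
    }

    static {
        // Assumption: higher number means more sonorous.
        put(6, "a", "e", "i", "o", "u");      // vowels
        put(5, "y", "r", "l", "v");           // liquids, as grouped in the hierarchy
        put(4, "n", "m");                     // nasals
        put(3, "s", "z", "f", "sh", "th");    // fricatives (listed ones only)
        put(2, "ch", "j");                    // affricates
        put(1, "b", "d", "g", "p", "t", "k"); // stops
    }

    /** Returns the rank of a segment, or 0 when it is not in the table. */
    static int rank(String segment) {
        return RANK.getOrDefault(segment.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "l", "m", "sh", "ch", "k", "q"}) {
            System.out.println(s + " -> " + rank(s));
        }
    }
}
```

Keying the table by String rather than by character lets digraphs such as "sh" and "ch" sit next to single letters; splitting a word into such segments is left out of the sketch.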

“A Syllable is a cluster of sonority, defined by a sonority peak acting as a structural magnet to the surrounding lower sonority elements.”

Represented as waves of sonority, or the Sonority Profile of that syllable.

[Figure: sonority profile of a syllable, labelled Onset, Nucleus and Coda]

“The Sonority Profile of a syllable must rise until its Peak (Nucleus), and then fall.”

[Figure: sonority profile with the Peak (Nucleus) between the Onset and the Coda]
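The rise-then-fall condition can be checked mechanically once each letter has a sonority rank. The following Java sketch is a hypothetical illustration, not the presented module: `SonorityProfile`, its rank values and the treatment of unlisted letters as rank 0 are assumptions, digraphs are ignored, and a plateau (two equal neighbours such as "ee") is tolerated.

```java
import java.util.HashMap;
import java.util.Map;

final class SonorityProfile {
    private static final Map<Character, Integer> RANK = new HashMap<>();

    private static void put(int rank, String letters) {
        for (char c : letters.toCharArray()) {
            RANK.put(c, rank);
        }
    }

    static {
        // Assumed single-letter ranks; higher means more sonorous.
        put(6, "aeiou");
        put(5, "yrlv");
        put(4, "nm");
        put(3, "szf");
        put(2, "j");
        put(1, "bdgptk");
    }

    static int rank(char c) {
        return RANK.getOrDefault(Character.toLowerCase(c), 0);
    }

    /** True when the profile never rises again after it has started to fall and a vowel is present. */
    static boolean risesThenFalls(String syllable) {
        boolean falling = false;
        boolean hasVowel = false;
        int prev = -1;
        for (char c : syllable.toCharArray()) {
            int r = rank(c);
            if (r == 6) {
                hasVowel = true;
            }
            if (prev >= 0) {
                if (r < prev) {
                    falling = true;          // we are past the peak
                } else if (r > prev && falling) {
                    return false;            // second rise: more than one peak
                }
            }
            prev = r;
        }
        return hasVowel;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"plo", "lpo", "dip", "jeet"}) {
            System.out.println(s + " -> " + risesThenFalls(s));
        }
    }
}
```

Under these assumed ranks an s-plus-stop onset such as the one in "spa" is rejected, since the profile falls from s to p before rising to the vowel.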

Example: ABHIJEET

[Figure: Sonority Profile 1 and Sonority Profile 2 of ABHIJEET, with the vowels A, I, E, E at the peaks and the consonants B, H, J, T in the troughs]

“The Intervocalic consonants are maximally assigned to the Onsets of syllables in conformity with Universal and Language-Specific Conditions.”

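This determines the underlying syllable division.

Example: DIPLOMA can be divided as DIP LO MA or DI PLO MA. Assigning the intervocalic consonants maximally to the onset gives DI PLO MA.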
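A minimal Java sketch of this principle follows. It is an illustration only, not the presented module: the class `MaximalOnset` is hypothetical, the small set of legal two- and three-letter onsets is an assumed sample rather than a complete list for any language, any single consonant is assumed to be a legal onset, and adjacent vowel letters are treated as one nucleus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MaximalOnset {
    // Assumption: a tiny sample of legal onset clusters, for illustration only.
    private static final Set<String> LEGAL_CLUSTERS = new HashSet<>(Arrays.asList(
            "pl", "pr", "bl", "br", "tr", "dr", "kl", "kr", "gl", "gr", "fl", "fr", "st", "sp", "sk", "str"));

    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    static boolean isLegalOnset(String cluster) {
        return cluster.length() == 1 || LEGAL_CLUSTERS.contains(cluster);
    }

    static List<String> syllabify(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        int n = w.length();
        List<String> out = new ArrayList<>();
        int start = 0;
        int i = 0;
        // Word-initial consonants belong to the first onset.
        while (i < n && !isVowel(w.charAt(i))) i++;
        while (i < n) {
            while (i < n && isVowel(w.charAt(i))) i++;      // nucleus
            int clusterStart = i;
            while (i < n && !isVowel(w.charAt(i))) i++;     // following consonants
            if (i == n) {
                out.add(w.substring(start));                // word-final coda
                return out;
            }
            // Intervocalic cluster: give the next syllable the longest legal onset.
            int split = i;
            for (int k = clusterStart; k < i; k++) {
                if (isLegalOnset(w.substring(k, i))) {
                    split = k;
                    break;
                }
            }
            out.add(w.substring(start, split));
            start = split;
        }
        if (n > 0) out.add(w);                              // word without vowels
        return out;
    }

    public static void main(String[] args) {
        System.out.println(syllabify("diploma"));
    }
}
```

For "diploma" the cluster "pl" is in the assumed onset set, so it is assigned whole to the second syllable and the division is di, plo, ma rather than dip, lo, ma.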

Schwa is the first letter of the IAL: {a}.
It is an unstressed and toneless neutral vowel.
Sanskrit is phonetically perfect, with no neutral vowels.
Hindi, Bengali etc. allow schwa to be neutral.
Some schwas are deleted and some are not.
Schwa deletion is an important issue for grapheme-to-phoneme conversion.

1) Saphalya and Amantrana
2) Priya and Tritiya
3) Kavya and Ashva
4) Badhai
5) Samuha and Chehara
6) Badara and Kalama
7) Kalama and Banda

Developed completely in Java.
Platform independent.
Tries to perform syllabification of words.
Rides on the concepts of Sonority theory, mainly the sonority sequencing principle.
Makes use of Java’s HashMap utility to save execution time.

Consists of three major functions, with a fourth under development:
SonorityHierarchy(): stores and references the sonority hierarchy from the HashMap.
syllabify(String word): tries to find the syllable boundaries according to their sonority profile.
accuracy()
Delete_schwa() [Under Development]: tries to delete schwas present in the input.
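The slides do not show how syllabify(String word) locates boundaries, so the Java sketch below is only one plausible reading of "according to their sonority profile": it starts a new syllable at every sonority trough, that is, at a letter less sonorous than its left neighbour and no more sonorous than its right neighbour. The class `TroughSyllabifier`, the rank values and the rank 0 for unlisted letters are assumptions, and digraphs are not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TroughSyllabifier {
    private static final Map<Character, Integer> RANK = new HashMap<>();

    private static void put(int rank, String letters) {
        for (char c : letters.toCharArray()) {
            RANK.put(c, rank);
        }
    }

    static {
        // Assumed single-letter ranks; higher means more sonorous.
        put(6, "aeiou");
        put(5, "yrlv");
        put(4, "nm");
        put(3, "szf");
        put(2, "j");
        put(1, "bdgptk");
    }

    static int rank(char c) {
        return RANK.getOrDefault(Character.toLowerCase(c), 0);
    }

    /** Splits a word before every sonority trough found strictly inside it. */
    static List<String> syllabify(String word) {
        List<String> syllables = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < word.length() - 1; i++) {
            int here = rank(word.charAt(i));
            boolean fallsInto = here < rank(word.charAt(i - 1));
            boolean risesOut = here <= rank(word.charAt(i + 1));
            if (fallsInto && risesOut) {
                syllables.add(word.substring(start, i));
                start = i;                       // the trough opens the next syllable
            }
        }
        if (start < word.length()) {
            syllables.add(word.substring(start));
        }
        return syllables;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"diploma", "kalama", "banda"}) {
            System.out.println(w + " -> " + syllabify(w));
        }
    }
}
```

Letters missing from the assumed table fall back to rank 0, so words containing them can be split in unexpected places; a fuller table would be needed before comparing against manually syllabified data.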

Syllabification and PRR generation modules implemented

Number of manually syllabified words: 27614
No. of words fed as input: 27614
No. of words correctly syllabified: 26253
Accuracy obtained: 95.86% for English and about 70% for Hindi
Accuracy of schwa deletion in English: 77%
Schwa deletion for Hindi is under development.
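Accuracy here is presumably the share of input words whose output matches the manual syllabification exactly. The Java sketch below shows that computation under this assumption; `AccuracyCheck` and its methods are hypothetical names, not the module's accuracy() function. With the counts as listed, 26253 of 27614 works out to roughly 95.07%, slightly below the 95.86% shown, so one of the listed figures may differ from what the evaluation actually used.

```java
final class AccuracyCheck {
    /** Counts positions where the predicted syllabification equals the manual one. */
    static int countExactMatches(String[] predicted, String[] gold) {
        if (predicted.length != gold.length) {
            throw new IllegalArgumentException("predicted and gold must have the same length");
        }
        int correct = 0;
        for (int i = 0; i < gold.length; i++) {
            if (gold[i].equals(predicted[i])) {
                correct++;
            }
        }
        return correct;
    }

    /** Percentage of correct items out of the total. */
    static double accuracyPercent(int correct, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * correct / total;
    }

    public static void main(String[] args) {
        // Counts taken from the results listed above.
        System.out.printf("%.2f%%%n", accuracyPercent(26253, 27614));

        // Toy data, made up for illustration only.
        String[] gold = {"di-plo-ma", "ka-la-ma", "ban-da"};
        String[] predicted = {"di-plo-ma", "kal-a-ma", "ban-da"};
        int correct = countExactMatches(predicted, gold);
        System.out.printf("%d of %d correct%n", correct, gold.length);
    }
}
```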

Problems faced:
The first rule-based implementation failed.
Some specific consonant and vowel clusters still result in erroneous syllabification.

Future work:
Schwa deletion for Hindi and Marathi.
Implementation of the Maximal Onset First principle.
Packaging the above implementation in a stable transliteration module to be used further in CLIR.

1) Giegerich, H. J. 1992. English Phonology: An Introduction.

2) Kahn, Daniel. 1976. Syllable-based Generalizations in English Phonology.

3) Lass, Roger. 1984. Phonology: An Introduction to Basic Concepts. Cambridge University Press.
