Interlingual word mapping

MTP Stage I Project Presentation. Guided by: Prof. Pushpak Bhattacharyya. Presented by: Abhijeet Padhye. Department of Computer Science and Engineering, Indian Institute of Technology, Bombay


TRANSCRIPT

Page 1: Interlingual  word mapping

MTP Stage I Project Presentation

Guided by: Prof. Pushpak Bhattacharyya
Presented by: Abhijeet Padhye

Department of Computer Science and Engineering
Indian Institute of Technology, Bombay

Page 2: Interlingual  word mapping

1. Motivation
2. Introduction
3. Introduction to Transliteration
4. Syllables and their structure types
5. Sonority Theory
6. Relation between Sonority and Syllables
7. What is Schwa?
8. A Sonority theory based Syllabification module
9. Results obtained
10. References

Page 3: Interlingual  word mapping

Language – an integral part of society
Each language has its specific structure and rules
Some basic concepts are common to all languages
These are helpful in processes like transliteration, ultimately leading to better CLIR
We are trying to exploit them for the process of syllabification

Page 4: Interlingual  word mapping

“To study some Phonological similarities between English, Hindi and Marathi and exploit them in order to achieve the goal of transliteration with high accuracy so as to be able to tackle problems like OOV words during Cross-Lingual Information Retrieval.”

Page 5: Interlingual  word mapping

Concepts being emphasized:
Transliteration
Theory of Syllables
Sonority Theory
Their relation
Theory of Schwa and Schwa deletion

Mainly based on the properties of sound
Sound is the driving force behind word pronunciation in any language

Page 6: Interlingual  word mapping

A process of phonetically “translating” named entities like proper nouns from a source language to a target language.[1]

The process of transliteration should be as accurate as possible.

Faces the problem of multiple variants of words.

Page 7: Interlingual  word mapping
Page 8: Interlingual  word mapping

“Syllable is a unit of spoken language consisting of a single uninterrupted sound formed generally by a Vowel and preceded or followed by one or more consonants.”

Vowels are the heart of a syllable (the most sonorous element)

Consonants act as sounds attached to vowels.

Page 9: Interlingual  word mapping

A syllable consists of 3 major parts:
Onset (C)
Nucleus (V)
Coda (C)

Vowels sit in the Nucleus of a syllable
Consonants may get attached as Onset or Coda
Basic structure: CV

Page 10: Interlingual  word mapping

The Nucleus is always present

Onset and Coda may be absent

Possible structures: V, CV, VC, CVC
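The four structures listed on this slide can be illustrated by reducing a syllable to its consonant/vowel skeleton. The following is only a minimal sketch: the class `SyllableShape` and the method `skeleton` are hypothetical names, and it assumes that the five Latin vowel letters stand in for vowel sounds, which is a simplification of real pronunciation.

```java
class SyllableShape {
    // Assumption: the five Latin vowel letters approximate vowel sounds.
    private static final String VOWELS = "aeiou";

    /**
     * Maps each letter to 'V' (vowel) or 'C' (anything else) and collapses
     * runs of the same symbol, so "plo" becomes "CV" and "dip" becomes "CVC".
     */
    static String skeleton(String syllable) {
        StringBuilder sb = new StringBuilder();
        for (char ch : syllable.toLowerCase().toCharArray()) {
            char symbol = VOWELS.indexOf(ch) >= 0 ? 'V' : 'C';
            // Append only when the symbol changes, which collapses clusters.
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != symbol) {
                sb.append(symbol);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "ma", "at", "dip"}) {
            System.out.println(s + " -> " + skeleton(s));
        }
    }
}
```

Run on "a", "ma", "at" and "dip", the sketch yields V, CV, VC and CVC respectively, one example for each structure on the slide.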

Page 11: Interlingual  word mapping

Prominence Theory
E.g. entertaining /entəteɪnɪŋ/
The peaks of prominence: vowels /e ə eɪ ɪ/
Number of syllables: 4

Chest Pulse Theory
Based on muscular activities

Sonority Theory
Based on the relative sonority of segments within words

Page 12: Interlingual  word mapping

“The Sonority of a sound is its loudness relative to other sounds with the same length, stress and pitch.”

Languages have sounds associated with them
Some sounds are more sonorous than others
Words in a language can be divided into syllables
Sonority theory distinguishes syllables on the basis of sounds

Page 13: Interlingual  word mapping

Defined on the basis of the amount of sound associated with each segment

The sonority hierarchy, from most to least sonorous, is as follows:
Vowels (a, e, i, o, u)
Liquids and glides (y, r, l, v)
Nasals (n, m)
Fricatives (s, z, f, ... sh, th etc.)
Affricates (ch, j)
Stops (b, d, g, p, t, k)
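The hierarchy above maps naturally onto a lookup table. The sketch below is one possible encoding in a Java `HashMap`, not the module's actual code: the class name `SonorityScale`, the method `rank` and the numeric values 1 to 6 are assumptions made for illustration. Only the ordering of the classes comes from the slide, and the fricative list is as incomplete as it is there.

```java
import java.util.HashMap;
import java.util.Map;

class SonorityScale {
    // Assumption: numeric ranks are arbitrary; only their order follows the slide.
    static final Map<String, Integer> RANK = new HashMap<>();

    private static void put(int rank, String... sounds) {
        for (String s : sounds) {
            RANK.put(s, rank);
        }
    }

    static {
        put(6, "a", "e", "i", "o", "u");      // vowels
        put(5, "y", "r", "l", "v");           // y, r, l, v grouped together as on the slide
        put(4, "n", "m");                     // nasals
        put(3, "s", "z", "f", "sh", "th");    // fricatives (partial list)
        put(2, "ch", "j");                    // affricates
        put(1, "b", "d", "g", "p", "t", "k"); // stops
    }

    /** Returns the rank of a sound, or 0 if the sound is not in the table. */
    static int rank(String sound) {
        return RANK.getOrDefault(sound.toLowerCase(), 0);
    }

    public static void main(String[] args) {
        System.out.println(rank("a") > rank("l")); // vowels outrank l
        System.out.println(rank("sh"));            // fricative rank
    }
}
```

Keying the map by strings rather than single characters lets digraphs such as "sh" and "ch" be looked up directly.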

Page 14: Interlingual  word mapping

Obstruents can be further classified into:
Fricatives
Affricates
Stops

Page 15: Interlingual  word mapping

“A Syllable is a cluster of sonority, defined by a sonority peak acting as a structural magnet to the surrounding lower sonority elements.”

Represented as waves of sonority, or the Sonority Profile of that syllable
[Figure: sonority wave with the Nucleus at the peak, the Onset before it and the Coda after it]

Page 16: Interlingual  word mapping

“The Sonority Profile of a syllable must rise until its Peak (Nucleus), and then fall.”

[Figure: sonority profile rising from the Onset to the Peak (Nucleus) and falling to the Coda]
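The rise-then-fall condition can be checked mechanically once each letter has a sonority rank. The sketch below is illustrative only: `SonorityProfileCheck`, `rank` and `risesThenFalls` are hypothetical names, the numeric ranks are assumed, the letter "h" is treated as a fricative, and any unlisted letter is treated as a stop. It works on letters rather than phonemes and checks only the shape of the profile, not whether the peak is a vowel.

```java
class SonorityProfileCheck {
    /** Assumed letter-level ranks; only the ordering follows the sonority hierarchy. */
    static int rank(char c) {
        if ("aeiou".indexOf(c) >= 0) return 6; // vowels
        if ("yrlv".indexOf(c) >= 0) return 5;  // y, r, l, v
        if ("nm".indexOf(c) >= 0) return 4;    // nasals
        if ("szfh".indexOf(c) >= 0) return 3;  // fricatives ('h' is an assumption)
        if (c == 'j') return 2;                // affricate
        return 1;                              // stops and any unlisted letter
    }

    /** True if sonority strictly rises to a single peak and then strictly falls. */
    static boolean risesThenFalls(String syllable) {
        String s = syllable.toLowerCase();
        int n = s.length();
        int i = 1;
        // Rising part: onset up to the peak.
        while (i < n && rank(s.charAt(i)) > rank(s.charAt(i - 1))) {
            i++;
        }
        // Falling part: peak down through the coda.
        while (i < n && rank(s.charAt(i)) < rank(s.charAt(i - 1))) {
            i++;
        }
        // Valid only if the whole syllable was consumed.
        return n > 0 && i == n;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"plo", "dip", "lpo"}) {
            System.out.println(s + " -> " + risesThenFalls(s));
        }
    }
}
```

Here "plo" and "dip" satisfy the principle, while "lpo" does not, because its sonority falls from l to p before rising to the vowel.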

Page 17: Interlingual  word mapping

ABHIJEET
[Figure: Sonority Profile 1 and Sonority Profile 2 for ABHIJEET, each with the vowels A, I, E, E at the peaks and the consonants B, H, J, T at lower sonority levels]

Page 18: Interlingual  word mapping

“The Intervocalic consonants are maximally assigned to the Onsets of syllables in conformity with Universal and Language-Specific Conditions.”

Determines underlying syllable division
Example: DIPLOMA
DIP LO MA & DI PLO MA
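The principle prefers DI PLO MA over DIP LO MA because "pl" is a permissible onset, so both consonants go to the following syllable. A minimal sketch of that decision follows. It is not the presented module's code: `MaximalOnset`, `syllabify` and the tiny `LEGAL_ONSETS` set are hypothetical, any single consonant is assumed to be a legal onset, and adjacent vowel letters are treated as one nucleus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MaximalOnset {
    private static final String VOWELS = "aeiou";
    // Hypothetical, deliberately small set of permitted two-consonant onsets.
    private static final Set<String> LEGAL_ONSETS =
            Set.of("pl", "pr", "bl", "br", "tr", "dr", "kl", "kr", "gl", "gr", "fl", "fr");

    static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    static List<String> syllabify(String word) {
        String w = word.toLowerCase();
        int n = w.length();
        List<String> out = new ArrayList<>();
        int start = 0;
        int i = 0;
        // Move past the word-initial onset and the first nucleus.
        while (i < n && !isVowel(w.charAt(i))) i++;
        while (i < n && isVowel(w.charAt(i))) i++;
        while (i < n) {
            int clusterStart = i;
            while (i < n && !isVowel(w.charAt(i))) i++;
            if (i == n) break; // trailing consonants stay in the final coda
            int clusterEnd = i; // index of the next vowel
            // Give the next syllable the longest legal suffix of the cluster.
            int boundary = clusterEnd;
            for (int b = clusterStart; b < clusterEnd; b++) {
                String candidate = w.substring(b, clusterEnd);
                if (candidate.length() == 1 || LEGAL_ONSETS.contains(candidate)) {
                    boundary = b;
                    break;
                }
            }
            out.add(w.substring(start, boundary));
            start = boundary;
            while (i < n && isVowel(w.charAt(i))) i++; // skip the next nucleus
        }
        out.add(w.substring(start));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(syllabify("diploma")); // prints [di, plo, ma]
        System.out.println(syllabify("banda"));   // prints [ban, da]
    }
}
```

For "banda" the cluster "nd" is not in the assumed onset list, so only "d" moves to the second syllable and "n" stays behind as a coda.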

Page 19: Interlingual  word mapping

First letter of IAL – {a}
Unstressed and toneless neutral vowel
Sanskrit is phonetically perfect – no neutral vowels
Hindi, Bengali etc. allow schwa to be neutral
Some schwas are deleted and some are not
Schwa deletion – important issue for grapheme to phoneme conversion

Page 20: Interlingual  word mapping

1) Saphalya and Amantrana
2) Priya and Tritiya
3) Kavya and Ashva
4) Badhai
5) Samuha and Chehara
6) Badara and Kalama
7) Kalama and Banda

Page 21: Interlingual  word mapping

Developed completely in Java
Platform independent
Tries to perform syllabification of words
Rides on the concepts of Sonority theory, mainly the sonority sequencing principle
Makes use of Java's HashMap utility to save execution time

Page 22: Interlingual  word mapping

Consists of three major functions, with a fourth under development:
SonorityHierarchy()
syllabify(String word)
accuracy()
Delete_schwa() [Under Development]

SonorityHierarchy() stores and references the Sonority hierarchy from the HashMap

syllabify(String word) tries to find the syllable boundaries according to their sonority profile

Delete_schwa() tries to delete schwas present in the input
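The slides do not show how `syllabify(String word)` is implemented, so the following is only a guess at one way a sonority-profile syllabifier could work: between two vowel peaks, the boundary is placed just before the least sonorous consonant. The class name `SonoritySyllabifier`, the numeric ranks, the treatment of "h" as a fricative, the default rank for unlisted letters and the handling of adjacent vowels as one nucleus are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class SonoritySyllabifier {
    private static final int VOWEL = 6;

    /** Assumed letter-level ranks; only the ordering follows the sonority hierarchy. */
    static int rank(char c) {
        if ("aeiou".indexOf(c) >= 0) return VOWEL;
        if ("yrlv".indexOf(c) >= 0) return 5;
        if ("nm".indexOf(c) >= 0) return 4;
        if ("szfh".indexOf(c) >= 0) return 3; // 'h' as a fricative is an assumption
        if (c == 'j') return 2;
        return 1; // stops and any unlisted letter
    }

    /** Cuts before the sonority minimum between each pair of vowel peaks. */
    static List<String> syllabify(String word) {
        String w = word.toLowerCase();
        List<String> out = new ArrayList<>();
        int start = 0;      // start of the syllable being built
        int prevVowel = -1; // index of the most recent vowel
        for (int i = 0; i < w.length(); i++) {
            if (rank(w.charAt(i)) != VOWEL) continue;
            if (prevVowel >= 0 && i - prevVowel > 1) {
                // Find the least sonorous consonant between the two peaks;
                // on ties the later one wins, so the earlier one stays as a coda.
                int cut = prevVowel + 1;
                for (int k = prevVowel + 1; k < i; k++) {
                    if (rank(w.charAt(k)) <= rank(w.charAt(cut))) cut = k;
                }
                out.add(w.substring(start, cut));
                start = cut;
            }
            prevVowel = i;
        }
        out.add(w.substring(start));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(syllabify("diploma"));  // prints [di, plo, ma]
        System.out.println(syllabify("abhijeet")); // prints [a, bhi, jeet]
    }
}
```

Because every syllable produced this way starts at a sonority minimum and contains one vowel peak, the output tends to respect the sonority sequencing principle, although a letter-based sketch like this will mishandle digraphs and silent letters.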

Page 23: Interlingual  word mapping

Syllabification and PRR generation modules implemented

Number of manually syllabified words – 27614
No. of words fed as input – 27614
No. of words correctly syllabified – 26253
Accuracy obtained – 95.86% for English and about 70% for Hindi
Accuracy of Schwa deletion in English – 77%
Schwa deletion for Hindi is under development
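Word-level accuracy is simply the share of input words whose syllabification matches the manual one. The sketch below shows that arithmetic with a hypothetical `accuracy(int, int)` helper; it is not the module's own `accuracy()` function. Applied to the counts as transcribed here (26253 of 27614), it gives roughly 95.07%, slightly below the quoted 95.86%, so either a count or the percentage may have been recorded differently on the original slide.

```java
class AccuracyCheck {
    /** Percentage of correctly syllabified words out of all input words. */
    static double accuracy(int correct, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * correct / total;
    }

    public static void main(String[] args) {
        // Counts taken from the results slide as transcribed.
        System.out.printf("%.2f%%%n", accuracy(26253, 27614));
    }
}
```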

Page 24: Interlingual  word mapping

Problems faced
First rule-based implementation failed
Some specific consonant and vowel clusters still result in erroneous syllabification

Future work
Schwa deletion for Hindi and Marathi
Implementation of Maximal Onset First principle
Packaging the above implementation in a stable transliteration module to be used further in CLIR

Page 25: Interlingual  word mapping

1) Giegerich, H. J. 1992. English Phonology: An Introduction.

2) Kahn, Daniel. 1976. Syllable-based generalizations in English phonology.

3) Lass, Roger. 1984. Phonology: An Introduction to Basic Concepts. Cambridge University Press.