SANSKRIT PARSER
(Parsing a Sanskrit Sentence in Some Recognizable Format)
Project Mentor: Mr. Nikhil Debbarma, Assistant Prof., CSE Dept., NIT Agartala
Team Members:
Akash Bhargava (10UCS002)
Ashok Kumar (10UCS010)
Laxmi Kant Yadav (10UCS027)
Vijay Kumar Gupta (10UCS057)
A translator must know the grammatical structure of both the input and the output language.
Why We Chose This Project
According to many researchers, Sanskrit is a very scientific language with a highly regular grammar.
In this regularity, Sanskrit behaves much like a programming language.
If we are able to build a translator that translates Sanskrit into machine code, it would be a significant development in the field of NLP (Natural Language Processing).
Why We Became Interested
“NASA scientist Rick Briggs had invited 1,000 Sanskrit scholars from India for working at NASA. But scholars refused to allow the language to be put to foreign use” (Dainik)
Being understandable to both computers and humans, Sanskrit was considered useful in space research and many other natural language processing applications.
Content
We will first introduce some concepts and then apply them:
1. Advantages of using Sanskrit
2. Lexical Analysis
3. Parsing
4. Approach
5. Where we are now.
6. Problems
7. References
Advantages of Using Sanskrit (Why Sanskrit)
Linguistic base: Sanskrit is a common base for a large group of Indo-European languages.
Limited vocabulary: words represent properties and are built as Prefix + Word + Suffix.
Fixed morphology
Concept of Vibhakti
Fixed Morphology
Words in Sanskrit belong to 3 categories:
Dhatu Roop: root of all verbs
Shabda Roop: root of all nouns
Avyaya: words with no morphology (indeclinables)
Each word belonging to
Dhatu Roop has 36 morphed versions (4 tenses or moods x 3 persons x 3 numbers)
Shabda Roop has 21 morphed versions (7 cases x 3 numbers)
Avyaya has a single form and represents a single meaning
Vibhakti as Pointer
Consider the sentence 'The man saw the girl with the binoculars.'
The man (S) saw (V) the girl (O) with the binoculars (I)
OR
The man (S) saw (V) the girl with the binoculars (O)
नरः द्विनेत्र्या बालाम् अपश्यत्
नरः द्विनेत्रीम् साकम् बालाम् अपश्यत्
In Sanskrit the case ending (vibhakti) attached to each word points to the role of that word, so each reading is written as a different sentence. This is also the reason for UNAMBIGUITY in a sentence, and shuffling the words has NO effect on the meaning.
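The idea that the vibhakti, not the position, points to a word's role can be sketched in C. This is a minimal illustration, not the project's code: the `Case` enum, `TaggedWord`, `Roles`, and `assign_roles` are hypothetical names, and the case tags are assumed to come from an earlier analysis step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical case tags; only the ones needed for the example. */
typedef enum { CASE_NOMINATIVE, CASE_ACCUSATIVE, CASE_INSTRUMENTAL, CASE_NONE } Case;

typedef struct {
    const char *word;     /* surface form of the word */
    Case        vibhakti; /* case tag from an earlier step; CASE_NONE for verbs */
} TaggedWord;

typedef struct {
    const char *subject;
    const char *object;
    const char *instrument;
} Roles;

/* Assigns roles purely from the case tag; the position i is never consulted. */
Roles assign_roles(const TaggedWord *words, size_t n)
{
    Roles r = { NULL, NULL, NULL };
    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (words[i].vibhakti) {
        case CASE_NOMINATIVE:   r.subject    = words[i].word; break;
        case CASE_ACCUSATIVE:   r.object     = words[i].word; break;
        case CASE_INSTRUMENTAL: r.instrument = words[i].word; break;
        default: break;
        }
    }
    return r;
}
```

Calling `assign_roles` on the same tagged words in any order yields the same subject, object, and instrument, which is the "no effect of shuffling" property in miniature.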
Lexical Analysis
Lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
A program or function that performs lexical analysis is called a lexical analyzer, lexer, tokenizer, or scanner.
A lexer often exists as a single function that is called by a parser or another function; it can also be combined with the parser in scannerless parsing.
The lexical analyzer is the first phase of a translator. Its main task is to read the input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
The role of lexical analyzer
[Figure: Source program -> Lexical Analyzer -> token -> Parser -> Output. The Parser requests each token with getNextToken; the diagram also includes an Indexed Database.]
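The getNextToken relationship between parser and lexical analyzer can be sketched as a small pull interface in C. This is only an illustration under assumptions: the `Lexer` struct and the functions `lexer_init` and `get_next_token` are hypothetical names, and tokens are assumed to be whitespace-separated words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A cursor over the input; pos points at the unread remainder. */
typedef struct {
    const char *pos;
} Lexer;

void lexer_init(Lexer *lx, const char *input)
{
    assert(lx != NULL && input != NULL);
    lx->pos = input;
}

/*
 * Copies the next whitespace-delimited word into buf (at most cap - 1 bytes,
 * always NUL-terminated) and returns the word's full length in bytes,
 * or 0 when the input is exhausted. Longer words are truncated in buf.
 */
size_t get_next_token(Lexer *lx, char *buf, size_t cap)
{
    static const char delims[] = " \t\r\n";
    assert(lx != NULL && buf != NULL && cap > 0);

    lx->pos += strspn(lx->pos, delims);      /* skip leading whitespace */
    size_t len = strcspn(lx->pos, delims);   /* length of the next word */
    size_t copy = len < cap - 1 ? len : cap - 1;
    memcpy(buf, lx->pos, copy);
    buf[copy] = '\0';
    lx->pos += len;
    return len;
}
```

A parser would call `get_next_token` in a loop until it returns 0. Because lengths are counted in bytes, a truncated copy could cut a multi-byte UTF-8 character, so the buffer should be sized generously for Devanagari input.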
What’s a Token?
The output of lexical analysis is a stream of tokens.
A token is a syntactic category:
◦ In English: noun, verb, adjective, …
◦ In Sanskrit: vibhakti, kriya, visheshana, …
The parser relies on the token distinctions.
Lexical Analyzer: Implementation
An implementation must do the following:
1. Recognize substrings corresponding to tokens.
2. Search for the identified token in the database to recognize its context.
3. Depending on the context, the token may be a different part of speech of the Sanskrit language, e.g. verb (kriya), vibhakti (dhatu roop).
4. Tag every token accordingly.
Lookahead
Two important points:
1. The goal is to partition the string. This is implemented by reading left to right, recognizing one token at a time.
2. “Lookahead” may be required to decide where one token ends and the next token begins.
◦ Even a simple programming-language example has lookahead issues: i vs. if, = vs. ==
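The two lookahead cases named here (i vs. if, = vs. ==) can be sketched with a small scanner in C. This is a minimal illustration with hypothetical names (`TokKind`, `scan`); it handles only the token kinds needed for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_IF, TOK_IDENT, TOK_ASSIGN, TOK_EQ, TOK_OTHER, TOK_END } TokKind;

/* Scans one token starting at s[*i] and advances *i past it. */
TokKind scan(const char *s, size_t *i)
{
    assert(s != NULL && i != NULL);
    while (s[*i] == ' ') (*i)++;              /* skip blanks between tokens */
    if (s[*i] == '\0') return TOK_END;

    if (s[*i] == '=') {
        /* One character of lookahead decides between "=" and "==". */
        if (s[*i + 1] == '=') { *i += 2; return TOK_EQ; }
        *i += 1;
        return TOK_ASSIGN;
    }
    if (isalpha((unsigned char)s[*i])) {
        size_t start = *i;
        /* Keep reading while the word continues, so "ifx" is not split as "if" + "x". */
        while (isalnum((unsigned char)s[*i])) (*i)++;
        if (*i - start == 2 && s[start] == 'i' && s[start + 1] == 'f')
            return TOK_IF;
        return TOK_IDENT;
    }
    (*i)++;                                   /* any other single character */
    return TOK_OTHER;
}
```

The scanner cannot decide that it has seen the keyword `if` until it has looked at the character after the `f`, which is the lookahead issue in its simplest form.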
LEXICAL ANALYSIS
Sanskrit's property of FIXED MORPHOLOGY lays the basis for analyzing individual verbs and nouns programmatically.
The input word's suffix is analyzed to obtain the following result:
Verb: tense, number, person
Noun: gender, number, case
Consider the dhatu (verb root) तप्, meaning ‘to heat’. The following inflections are analyzed lexically (each row lists singular, dual, plural; the rows are third, second, and first person):
HEATS | WILL HEAT
तपति, तपतः, तपन्ति | तप्स्यति, तप्स्यतः, तप्स्यन्ति
तपसि, तपथः, तपथ | तप्स्यसि, तप्स्यथः, तप्स्यथ
तपामि, तपावः, तपामः | तप्स्यामि, तप्स्यावः, तप्स्यामः
HEATED | HEAT IT (order)
अतपत्, अतपताम्, अतपन् | तपतु, तपताम्, तपन्तु
अतपः, अतपतम्, अतपत | तप, तपतम्, तपत
अतपम्, अतपाव, अतपाम | तपानि, तपाव, तपाम
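Suffix analysis of a verb form can be sketched in C as a longest-match lookup over an ending table. This is a simplified illustration, not the project's analyzer: the names `VerbEnding`, `VERB_ENDINGS`, and `analyze_verb` are hypothetical, only present and future endings are covered, the future is detected by a heuristic (stem ending in स्य or स्या), and persons use the English numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TENSE_PRESENT, TENSE_FUTURE } Tense;

typedef struct {
    const char *ending; /* UTF-8 suffix */
    int person;         /* 1, 2 or 3 in the English convention */
    int number;         /* 1 = singular, 2 = dual, 3 = plural */
} VerbEnding;

/* Longer endings come first so that "न्ति" is tried before "ति". */
static const VerbEnding VERB_ENDINGS[] = {
    { "न्ति", 3, 3 }, { "ति", 3, 1 }, { "तः", 3, 2 },
    { "सि",  2, 1 }, { "थः", 2, 2 }, { "थ",  2, 3 },
    { "मि",  1, 1 }, { "वः", 1, 2 }, { "मः", 1, 3 },
};

/* True if the first len bytes of s end with suffix. */
static int ends_with(const char *s, size_t len, const char *suffix)
{
    size_t n = strlen(suffix);
    return len >= n && memcmp(s + len - n, suffix, n) == 0;
}

/* Returns 1 and fills the outputs when an ending matches, 0 otherwise. */
int analyze_verb(const char *word, Tense *tense, int *person, int *number)
{
    assert(word && tense && person && number);
    size_t len = strlen(word);
    for (size_t i = 0; i < sizeof VERB_ENDINGS / sizeof VERB_ENDINGS[0]; i++) {
        const VerbEnding *e = &VERB_ENDINGS[i];
        if (!ends_with(word, len, e->ending)) continue;
        size_t stem_len = len - strlen(e->ending);
        *person = e->person;
        *number = e->number;
        /* Heuristic: a stem ending in स्य or स्या is treated as future. */
        *tense = (ends_with(word, stem_len, "स्य") || ends_with(word, stem_len, "स्या"))
                     ? TENSE_FUTURE : TENSE_PRESENT;
        return 1;
    }
    return 0;
}
```

For example, तपन्ति is reported as present, third person, plural, and तप्स्यामि as future, first person, singular. A real analyzer would also confirm the stem against the database, since a bare ending match can also fire on words that are not verbs.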
Consider the noun देव, representing God. The following inflections are possible (singular, dual, plural):
1. Nominative (subject): देवः, देवौ, देवाः
2. Accusative (object): देवम्, देवौ, देवान्
3. Instrumental (by): देवेन, देवाभ्याम्, देवैः
4. Dative (to): देवाय, देवाभ्याम्, देवेभ्यः
5. Ablative (from): देवात्, देवाभ्याम्, देवेभ्यः
6. Genitive (of): देवस्य, देवयोः, देवानाम्
7. Locative (in): देवे, देवयोः, देवेषु
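The noun table can be turned into a lookup that maps an inflected form back to its case and number. The C sketch below is an illustration only: `A_STEM_ENDINGS`, `NounReading`, `analyze_noun`, `case_name`, and `number_name` are hypothetical names, the table covers just the देव pattern, and the stem is assumed to be known already. Some forms are shared between cases (for example देवौ), so the function returns every matching reading.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { NUM_CASES = 7, NUM_NUMBERS = 3 };

static const char *const CASE_NAMES[NUM_CASES] = {
    "Nominative", "Accusative", "Instrumental", "Dative",
    "Ablative", "Genitive", "Locative"
};
static const char *const NUMBER_NAMES[NUM_NUMBERS] = { "Singular", "Dual", "Plural" };

/* Endings appended to the stem देव; rows are cases, columns are numbers. */
static const char *const A_STEM_ENDINGS[NUM_CASES][NUM_NUMBERS] = {
    { "ः",   "ौ",       "ाः"    },
    { "म्",  "ौ",       "ान्"   },
    { "ेन",  "ाभ्याम्", "ैः"    },
    { "ाय",  "ाभ्याम्", "ेभ्यः" },
    { "ात्", "ाभ्याम्", "ेभ्यः" },
    { "स्य", "योः",     "ानाम्" },
    { "े",   "योः",     "ेषु"   },
};

typedef struct { int case_index; int number_index; } NounReading;

const char *case_name(int c)   { assert(c >= 0 && c < NUM_CASES);   return CASE_NAMES[c]; }
const char *number_name(int n) { assert(n >= 0 && n < NUM_NUMBERS); return NUMBER_NAMES[n]; }

/* Writes up to cap readings of word (stem + ending) into out; returns how many were written. */
size_t analyze_noun(const char *word, const char *stem, NounReading *out, size_t cap)
{
    assert(word && stem && (out || cap == 0));
    size_t stem_len = strlen(stem), found = 0;
    if (strncmp(word, stem, stem_len) != 0) return 0;   /* word does not start with the stem */
    const char *ending = word + stem_len;
    for (int c = 0; c < NUM_CASES; c++) {
        for (int n = 0; n < NUM_NUMBERS; n++) {
            if (found < cap && strcmp(ending, A_STEM_ENDINGS[c][n]) == 0) {
                out[found].case_index = c;
                out[found].number_index = n;
                found++;
            }
        }
    }
    return found;
}
```

With the stem देव, the form देवेन has exactly one reading (Instrumental, Singular), while देवाभ्याम् has three (Instrumental, Dative, and Ablative Dual), so a later step would need sentence context to choose among them.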
Processing flow:
Input Sentence -> Tokenize -> Avyaya Analysis -> Verb Analysis -> Noun Analysis -> Unknown word (add to database)
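The flow from avyaya analysis through verb and noun analysis to the unknown-word case can be sketched as a cascade in C. This is an assumption-laden illustration: the word lists are tiny hard-coded stand-ins for the indexed database, and `classify_word`, `pending_unknown_count`, and `pending_unknown` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { WORD_AVYAYA, WORD_VERB, WORD_NOUN, WORD_UNKNOWN } WordClass;

/* Tiny stand-in word lists (hypothetical); each list ends with NULL. */
static const char *const AVYAYAS[] = { "यत्र", "तत्र", "सह", NULL };
static const char *const VERBS[]   = { "गच्छति", "तिष्ठन्ति", NULL };
static const char *const NOUNS[]   = { "रामः", "देवाः", "बालेन", "नदीम्", NULL };

#define MAX_UNKNOWN 32
static const char *unknown_words[MAX_UNKNOWN]; /* queue of words to add to the database */
static size_t unknown_count = 0;

static int in_list(const char *const *list, const char *word)
{
    for (; *list != NULL; list++)
        if (strcmp(*list, word) == 0) return 1;
    return 0;
}

/* Tries each analysis in the order of the flow; unknown words are queued. */
WordClass classify_word(const char *word)
{
    assert(word != NULL);
    if (in_list(AVYAYAS, word)) return WORD_AVYAYA;
    if (in_list(VERBS, word))   return WORD_VERB;
    if (in_list(NOUNS, word))   return WORD_NOUN;
    if (unknown_count < MAX_UNKNOWN)
        unknown_words[unknown_count++] = word; /* stores the pointer; caller keeps the string alive */
    return WORD_UNKNOWN;
}

size_t pending_unknown_count(void) { return unknown_count; }

const char *pending_unknown(size_t i)
{
    assert(i < unknown_count);
    return unknown_words[i];
}
```

Checking avyaya first is cheap because indeclinables have a single form; verbs and nouns would use suffix analysis rather than whole-word lists in a fuller implementation.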
Parsing
The scanner recognizes words.
The parser recognizes syntactic units.
Parser operations:
◦ Check and verify syntax based on specified syntax rules
◦ Report errors
Automation:
◦ The process can be automated
Why Separate Lexical Analysis and Parsing
1. Simplicity of design
2. Improving efficiency
3. Enhancing portability
Parsing Sanskrit Text
Now we move towards translating a Sanskrit sentence into its parsed equivalent.
PARSING:
Analyze (a sentence) into its component parts and describe their syntactic roles.
Analyze (a string or text) into logical syntactic components, typically in order to test conformability to a logical grammar.
Sanskrit sentence structure: SOV
English sentence structure: SVO
बालः पाठम् पठति (S O V)
Boy reads chapter (S V O)
![Page 23: Sanskrit parser Project Report](https://reader033.vdocuments.us/reader033/viewer/2022061208/548b89d9b479599a338b4607/html5/thumbnails/23.jpg)
Example Sanskrit Sentence
Approach (Coding Concept)
We first tokenize the input using strtok(str, " ");
Each token can be of 3 types: noun, verb, or preposition.
The task is to identify these tokens, which is done by matching them against the indexed database.
Each token is stored in a structure along with its meaning and its morphology.
Then the parser comes into play and forms a tree-like structure from these tokens.
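The tokenizing step with strtok and the per-token structure can be sketched as follows. The `Token` struct and its fields, `TokenType`, and `tokenize` are hypothetical names chosen for illustration; only the use of `strtok` with a space delimiter comes from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TYPE_NOUN, TYPE_VERB, TYPE_PREPOSITION, TYPE_UNKNOWN } TokenType;

typedef struct {
    const char *text;       /* points into the caller's sentence buffer */
    TokenType   type;       /* to be filled in by the database lookup */
    const char *meaning;    /* NULL until the lookup has run */
    const char *morphology; /* e.g. "Nominative, Singular"; NULL until analysed */
} Token;

/*
 * Splits sentence in place on spaces and stores up to cap tokens in out.
 * strtok writes NUL bytes into the buffer, so sentence must be writable
 * (a char array, not a string literal). Returns the number of tokens stored.
 */
size_t tokenize(char *sentence, Token *out, size_t cap)
{
    assert(sentence != NULL && out != NULL);
    size_t n = 0;
    for (char *w = strtok(sentence, " "); w != NULL && n < cap; w = strtok(NULL, " ")) {
        out[n].text = w;
        out[n].type = TYPE_UNKNOWN;
        out[n].meaning = NULL;
        out[n].morphology = NULL;
        n++;
    }
    return n;
}
```

Splitting on the ASCII space byte is safe for UTF-8 Devanagari text because no multi-byte UTF-8 sequence contains the byte 0x20. Note that `strtok` keeps hidden state between calls, so this sketch is not reentrant.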
Bottom-Up Parser Technique
Bottom-Up LR
◦ Construct the parse tree in a bottom-up manner
◦ Find the rightmost derivation in reverse order
◦ For every potential right-hand side and token, decide when a production is found
More powerful
◦ Bottom-up parsers can handle the largest class of grammars that can be parsed deterministically
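The bottom-up idea (shift a token, then reduce whenever the top of the stack matches a right-hand side) can be sketched in C. This is a greedy shift-reduce recognizer over a toy grammar, not a table-driven LR parser and not the project's grammar: the symbols, the rules, and the function `parse_tags` are all assumptions made for illustration, and the grammar accepts only S -> SUBJ VP with VP -> OBJ VERB or VP -> VERB.

```c
#include <assert.h>
#include <stddef.h>

/* Terminals are case-tagged words; nonterminals are built by reductions. */
typedef enum { N_NOM, N_ACC, VERB, SUBJ, OBJ, VP, S } Symbol;

typedef struct { Symbol lhs; size_t len; Symbol rhs[2]; } Rule;

/* Toy grammar; two-symbol rules are listed first so they are tried first. */
static const Rule RULES[] = {
    { S,    2, { SUBJ, VP   } },
    { VP,   2, { OBJ,  VERB } },
    { VP,   1, { VERB } },
    { SUBJ, 1, { N_NOM } },
    { OBJ,  1, { N_ACC } },
};

#define MAX_STACK 64

/* Replaces a matching right-hand side on top of the stack by its left-hand side. */
static int try_reduce(Symbol *stack, size_t *top)
{
    for (size_t r = 0; r < sizeof RULES / sizeof RULES[0]; r++) {
        size_t len = RULES[r].len;
        if (*top < len) continue;
        size_t k = 0;
        while (k < len && stack[*top - len + k] == RULES[r].rhs[k]) k++;
        if (k == len) {
            *top -= len;
            stack[(*top)++] = RULES[r].lhs;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if the tag sequence reduces to a single S, 0 otherwise. */
int parse_tags(const Symbol *input, size_t n)
{
    Symbol stack[MAX_STACK];
    size_t top = 0;
    assert(input != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (top == MAX_STACK) return 0;        /* input too long for this sketch */
        stack[top++] = input[i];               /* shift */
        while (try_reduce(stack, &top)) { }    /* reduce as long as a rule applies */
    }
    return top == 1 && stack[0] == S;
}
```

For the tag sequence N_NOM, N_ACC, VERB the reductions happen in the order SUBJ, OBJ, VP, S, which is the rightmost derivation read in reverse. A real LR parser would use a parse table to decide between shifting and reducing instead of reducing greedily.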
Approach
Programming language used: C and C++
Database used: Linux file system, indexed
Data structures: array, linked list, structure, tree, indexing and hashing
INPUT: a Sanskrit sentence or paragraph, e.g.
यत्र रामः गच्छति तत्र देवाः बालेन सह नदीम् निकषा तिष्ठन्ति।
OUTPUT: recognize all the parts of speech and form a tree structure to be able to understand the sentence.
How the Output Will Be Shown in the Terminal
यत्र::: this is a avyaya.. and the meaning is: where_there
रामः::: Nominative,Singular, Gender-Masculine ,noun and the root is: राम and the meaning is Ram
गच्छति::: The root is: गच्छ the meaning is: go present-tense,first-person,singular
तत्र::: this is a avyaya.. and the meaning is: there
देवाः::: Nominative,Plural Gender-Masculine ,noun ,and the root is: देव and the meaning is god
बालेन::: Instrumental,Singular, Gender-Masculine ,noun, and the root is: बाल and the meaning is boy
नदीम्::: Accusative,Singular, Gender-Feminine ,noun and the root is: नदी and the meaning is river
Avyaya's Role in Sanskrit
Avyaya words (indeclinables) are used to connect 2 or more simple sentences. Examples:
यदि-तर्हि (if-then)
यत्र-तत्र (where-there)
परन्तु (but)
अथापि (hence)
चेद् (provided, if)
Not only do avyaya connect sentences, but they also affect the structure of a simple sentence.
Challenges in the Code
Every word encountered in the input sentence could be any part of speech of Sanskrit, as there is no fixed word order.
Because of this property of Sanskrit, searching becomes important.
The database and word collection were in Unicode format, so the size of each word becomes even larger.
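Both challenges can be made concrete with a short C sketch. It assumes the words are stored as UTF-8, where each Devanagari code point in the range U+0900 to U+097F takes 3 bytes, and it uses the standard `bsearch` over a small sorted array as a stand-in for the indexed database. The names `utf8_codepoints`, `Entry`, `WORD_INDEX`, and `lookup_meaning`, and the entries themselves, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts code points by skipping UTF-8 continuation bytes (10xxxxxx); assumes valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    return count;
}

typedef struct { const char *word; const char *meaning; } Entry;

/* Must stay sorted in strcmp (byte) order, as bsearch requires. */
static const Entry WORD_INDEX[] = {
    { "तत्र", "there" },
    { "देव",  "god"   },
    { "बाल",  "boy"   },
    { "यत्र", "where" },
    { "राम",  "Ram"   },
};

static int cmp_entry(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const Entry *)elem)->word);
}

/* Binary search over the sorted index; returns NULL when the word is absent. */
const char *lookup_meaning(const char *word)
{
    assert(word != NULL);
    const Entry *e = bsearch(word, WORD_INDEX,
                             sizeof WORD_INDEX / sizeof WORD_INDEX[0],
                             sizeof WORD_INDEX[0], cmp_entry);
    return e ? e->meaning : NULL;
}
```

The word देव has 3 code points but occupies 9 bytes, which is why buffers and on-disk records grow compared with ASCII text. Binary search keeps each lookup logarithmic in the size of the index; a hash table, which the slides also mention, would be an alternative.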
Problems
Grammar of the Sanskrit language
How can we represent it in BNF grammar?
Parser techniques
Structure of code
Where We are Now
A big chunk of our time was invested in researching the Sanskrit language and its grammar, which was quite difficult.
So far we have implemented the lexer part and the parser part.
References
Sanskrit & Artificial Intelligence (NASA): Knowledge Representation in Sanskrit and Artificial Intelligence, by Rick Briggs. http://www.vedicsciences.net/articles/sanskrit-nasa.html
AI Magazine publishes the importance of Sanskrit. http://www.parankusa.org/SanskritAsProgramming.pdf
http://sanskrit.jnu.ac.in/morph/analyze.jsp
http://en.wikipedia.org/wiki/Sanskrit_verbs
http://en.wikipedia.org/wiki/Sanskrit_grammar
Thank You