

© Dr. BVK Sastry - E-mail: [email protected]

Sanskrit and Computer Studies©

Dr. B V Venkatakrishna Sastry
Professor, Hindu University of America, Orlando, Florida 32825
Director, International Sanskrit Research Academy
E-mail: [email protected] ; www.hua.edu ; www.mysanskrit.com

ABSTRACT: This presentation addresses the topic of Sanskrit and computer studies from multiple historic perspectives. The paper is more a conceptual road map than a presentation of specific code segments or a demonstration of solutions. Starting with a presentation of historic developments, the paper leads to the future potential held by Saṃskṛtam, addressing lead questions such as: What is Saṃskṛtam? How should Pāṇinian Saṃskṛtam address the current frontiers of exploration in (1) computational-linguistics research, (2) multilingual, multimodal human-machine interfaces and (3) related communication standards in the digital space? The presentation highlights the complexities that arise from approaching Sanskrit studies on the model of script centricity in computers, which leads to inaccurate representation of Saṃskṛtam language traditions documented in multiple Indic scripts. The intention is to invite support for Saṃskṛtam studies through a twofold strategy of research and application deployment¹. The path envisaged is to further the concept of a Saṃskṛtam-inspired programming-language interface in order to generate new models for (i) voice programming, (ii) multilingual spoken-language interfaces for the web, (iii) data security through better encryption, (iv) improvements in OCR, text-to-speech and speech-to-text solutions and (v) human-machine interactivity for intelligent systems and ambient-intelligence-based support, empowering smartness.
The projected significant global benefits will be in the following areas: (a) delivering Saṃskṛtam-inspired leadership in information-science and technology research and in the standardization of human-machine communication interfaces, to claim the rightful space for the Brāhmī-based languages of India in the digital space; and (b) restoring and delivering the benefit of Saṃskṛtam to the ‘voice-only’ common communicator (challenged by multilingual script literacy and computer illiteracy) through voice-enabled programming-language standards and interfaces, deployed in community utility devices and applications through a technology gateway.

Outline of Presentation

1. Conceptual clarification of the terms Saṃskṛtam, Computers and the link word ‘and’ (defining the scope of the relation for the purpose of this presentation)

1.1 Saṃskṛtam – Sanskrit (Monier Williams as checkpoint)
1.2 Computers – Technical and popular perception
1.3 Relation of Saṃskṛtam – Sanskrit – Computers: overview of a paradigm shift

2. Modelling of Saṃskṛtam for computer studies: Historic footprints, transliteration systems, IPA Alphabet Chart v/s Shiva sutra, Preamble of UVC: the move from Unicode to Universal Voice Code
3. ‘Saṃskṛtam for Computers’ – Unicode to Universal Voice Code: Futuristic trends, challenges to IPA notations
4. Promoting Saṃskṛtam language technology research: a shift in paradigm
5. Future potential: Saṃskṛtam as a future SQL for voice programming
6. Sanskrit and Computers: Conclusion

¹ This paper was prepared for reading at Sree Shankaracharya University of Sanskrit, Kalady, Ernakulam, Kerala (http://ssus.ac.in), for the project ‘The Erudite Scholar in Residence’. Sessions: January 19 and 20, 2010. An enhanced monograph version of this article is planned by the author for a later date.


1. Conceptual clarification of the terms Saṃskṛtam, Computers and the link word ‘and’ (defining the scope of the relation for the purpose of this presentation)

1.1 Saṃskṛtam – Sanskrit (Monier Williams as checkpoint)

The word ‘Saṃskṛtam’² is a technical term which occurs in Pāṇinian grammar³. It is used to indicate both the name of a language and its ‘language processes’ (= conventions and regulations of communication). The focus on ‘language process’ yields the rules for packaging and transforming abstract human communication into a tangible voiced expression. The output package, the Saṃskṛta pada/vākya, holds within it <coding instructions – content – decoding guidelines> associated with the communication or intent of the speaker in a given context. In short, Saṃskṛtam is more than ordinary natural-language expression in common parlance; it is an intentionally refined communicative expression, built to comply with the standard norm of True Expression = satya-vāk = ‘total, true⁴, seamless equivalence of the intent in the mind, consciously packaged by the speaker into a spoken expression’. The result is ‘an expression which transcends the deteriorating influence of time, space and speaker limitations. Saṃskṛtam carries a unique model of sentence structure, permitting construction in free word order’.

Monier Williams notes in the introduction to his dictionary that ‘Sanskrit’, the anglicized form of Saṃskṛtam, is used for convenience; this usage has prevailed for more than two hundred years.

For the purpose of this article, Saṃskṛtam refers to the traditional model of language expression bound by the Pāṇinian rule base and regulated by the authority and tradition of the ‘muni-traya’⁵. This model is presented in the format of the Siddhānta Kaumudī of Bhaṭṭoji Dīkṣita.

The word Sanskrit will be used to denote that finite flavor of Saṃskṛtam which (a) is studied as a historical classical language of Bhārath; (b) is a societal language for studying the culture of Bhārath in a socio-religious historic frame; (c) is preferentially scripted in Nāgarī/Devanāgarī, a non-Roman script writing convention; (d) is studied using a non-traditional, non-native format of analysis, postulating historic strata of language evolution and overruling the primary authority of the ‘muni-traya’ for the regulation of Saṃskṛtam; (e) follows an approach which treats Pāṇinian-grammar-regulated language as ‘classical Sanskrit’ (distinct from ‘Vedic Sanskrit’), a socio-historical antiquity maintained by Indians as a legacy of tradition for socio-religious purposes; and (f) is a language with no specific societal use beyond spiritual and ritual prayers. The only beauty this historical language is recognized to have is (i) its firm grammar structure, dating from the pre-Christian era of Indian history; (ii) an extensive rule base providing unlimited scope for generating vocabulary; and (iii) output sentences which allow free word order and verb-centric construction.

Saṃskṛtam studies demand agreement to the primacy of voice over script (upadeśa, pratijñā), which is critical for training to create and deliver refined expressions of voice units (Śikṣā–Vyākaraṇa regulation) in human communicative engagements. Sanskrit studies adopt the primacy of scripted text: the text is read strictly as scripted, and textual authority, established through accurately and critically edited manuscript traditions, is taken as the true documentation of human ideas, thought processes and the logic of communication. The primacy of Śruti (the voice tradition) over lipi (the hastaprati-lekhana, or scripted manuscript document) marks the critical difference between these two models of analysis.

Saṃskṛtam is a language which demands accuracy of process and delivery down to ‘half a unit of the voice atom’ (ardha-mātrā-kāla) and ‘one accent frame’ (svara-nirduṣṭatā). Changes to the atomic voice units in word formation and usage are rule-regulated; they are not due to ‘inability to pronounce⁶ or slackness’. Such defects of formation and expression (doṣa, duṣṭatā) are clearly identified and shunned. Sanskrit studies have allowed and introduced a free-wheeling approach here, resulting in multiple distortions not only of words but also of process: word splits that violate primary grammar directives⁷, the practicing tradition and the rule base, ‘just to help a non-native student do research in Indic traditions’!? The shortcomings of adopting this model of Saṃskṛtam lexicon building are evident in the Monier Williams Dictionary (a model which the Amarakośa or any traditional Saṃskṛtam dictionary would reject outright). The digital circulation of this dictionary underlies several application developments in Sanskrit-and-computers work.
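The half-mātrā precision described above can be made concrete with a small duration counter. The sketch below assumes IAST input and the conventional Śikṣā values (short vowel = 1 mātrā, long vowel or diphthong = 2, consonant = ½). As an assumption of this sketch only, anusvāra (ṃ) and visarga (ḥ) are also counted as ½; pluta (3 mātrās) and accents are ignored. The names `tokenize` and `matras` are hypothetical.

```python
import unicodedata
from fractions import Fraction


def _nfc(items):
    """Normalize a collection of IAST strings to precomposed (NFC) form."""
    return {unicodedata.normalize("NFC", s) for s in items}


SHORT_VOWELS = _nfc(["a", "i", "u", "ṛ", "ḷ"])                        # hrasva: 1 mātrā
LONG_VOWELS = _nfc(["ā", "ī", "ū", "ṝ", "ḹ", "e", "ai", "o", "au"])  # dīrgha: 2 mātrās
ASPIRATES = _nfc(["kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh"])
CONSONANTS = _nfc(["k", "g", "ṅ", "c", "j", "ñ", "ṭ", "ḍ", "ṇ", "t", "d", "n",
                   "p", "b", "m", "y", "r", "l", "v", "ś", "ṣ", "s", "h"]) | ASPIRATES
ASSUMED_HALF = _nfc(["ṃ", "ḥ"])  # assumption: anusvāra and visarga counted as ½


def tokenize(text: str) -> list[str]:
    """Split IAST text into phoneme tokens, keeping digraphs (kh, ai, ...) whole."""
    text = unicodedata.normalize("NFC", text.lower())
    tokens, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ASPIRATES or pair in ("ai", "au"):
            tokens.append(pair)
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def matras(text: str) -> Fraction:
    """Total duration of an IAST string in mātrās."""
    total = Fraction(0)
    for tok in tokenize(text):
        if tok in SHORT_VOWELS:
            total += 1
        elif tok in LONG_VOWELS:
            total += 2
        elif tok in CONSONANTS or tok in ASSUMED_HALF:
            total += Fraction(1, 2)
        elif tok.isspace() or tok == "-":
            continue  # word separators carry no duration
        else:
            raise ValueError(f"unrecognised IAST token: {tok!r}")
    return total


print(matras("saṃskṛtam"))  # 6
print(matras("rāma"))       # 4
```

Exact fractions are used rather than floats so that half-mātrā sums stay exact.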

With this, we now have the clarity to address the relation between ‘Saṃskṛtam and Computers’ as distinct from ‘Sanskrit and Computers’. The first model addresses voice and voiced-expression studies from the Saṃskṛtam (refined communication) perspective. This is the study of human language expression⁸ in relation to the coding and decoding of communication, and of computation within this frame of the <thought–expression relation>.

The second model treats the scripted character-cluster sequence (the ‘sequence-constructed expression’) as the primary base unit. This is the Sanskrit perspective on the <language–computation relation>. It is the current, ongoing model of Sanskrit for computers, and it guides Sanskrit computational linguistics. Its goal is to understand how to extract the communication from a Sanskrit sentence treated as a scripted character cluster (lipi-saṃketa-samūha-rūpa-vākyasya arthāvagamanam), using a grammar rule base.
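To make the ‘scripted character cluster’ of this second model concrete: in Unicode, a Devanāgarī word is stored as a sequence of code points, and a script-centric tool must first regroup them into orthographic syllables (akṣara). The regular expression below is a simplified sketch covering only basic consonants, independent vowels, vowel signs, virāma and candrabindu/anusvāra/visarga; it ignores nukta, Vedic accent marks and the full Unicode text-segmentation rules. The names `AKSHARA` and `aksharas` are hypothetical.

```python
import re

CONSONANT = "[\u0915-\u0939]"          # KA ... HA
INDEPENDENT_VOWEL = "[\u0905-\u0914]"  # A ... AU
VOWEL_SIGN = "[\u093E-\u094C]"         # dependent vowel signs AA ... AU
VIRAMA = "\u094D"
MODIFIER = "[\u0901-\u0903]"           # candrabindu, anusvāra, visarga

# One orthographic syllable: a consonant cluster (C + virāma + C ...) that ends
# either in a bare virāma or in an optional vowel sign plus optional modifier;
# or an independent vowel with an optional modifier.
AKSHARA = re.compile(
    f"{CONSONANT}(?:{VIRAMA}{CONSONANT})*(?:{VIRAMA}|{VOWEL_SIGN}?{MODIFIER}?)"
    f"|{INDEPENDENT_VOWEL}{MODIFIER}?"
)


def aksharas(word: str) -> list[str]:
    """Split a Devanāgarī word into orthographic syllables (simplified)."""
    return AKSHARA.findall(word)


word = "\u0938\u0902\u0938\u094D\u0915\u0943\u0924\u092E\u094D"  # संस्कृतम्
print(" | ".join(aksharas(word)))  # सं | स्कृ | त | म्
```

The script yields four clusters for a word whose spoken form has three vowel nuclei (a, ṛ, a), and therefore three syllables; this mismatch is one concrete face of the script centricity discussed in this paper.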


We have now identified two perspectives of study:
1. The Sanskrit perspective, which guides the current understanding and studies of <language–computation relations>. This is a specific, language-centric analytical approach.
2. The Saṃskṛtam perspective, which seeks to understand the future potential of <language–communication relation> studies. This is an open-ended, inclusive study that analyzes the challenge of extracting communication from a speech expression in a multi-user, multi-mode, multilingual environment. The results of such analysis are deployed in applications that demand hybrid, real-time communication processing for interfacing with intelligent systems. This is a communication-centric approach to the development of interface standards.

1.2 Computers – Technical and popular perception

In popular use, the word ‘computers’ refers to devices that use (a) the computational program power of a chip at their core, (b) operating systems, software and applications built around specific standards, and (c) integrated associated technologies that provide conveniences for input, output, reach, storage, retrieval, documentation, and search by defined fields and operators. Thus, in the common user's perception, a word-processing machine is a computer and a washing machine is not!

The technical user links the word ‘computer’ to any device that computes, that is, works on the basis of computational logic using a standard convention, permitting useful interaction between humans and machines through intelligent computing instructions called a program. Thus, for the technical user, a washing machine that uses a program to set its wash cycle is as much a computer as a digital dashboard instrument on the panel of a satellite in orbit. The computer technologist focuses on how to port the human communication, intelligence and logic model to a human-machine model, and on how to delegate routine instructions and logical work to the machine domain using a computational program. The computational analyst is interested in analyzing the communication logic and process of a given language-usage scenario.

It is here that the two models above, ‘Saṃskṛtam and Computers’ and ‘Sanskrit and Computers’, become clearer.

1.3 Relation of Saṃskṛtam – Sanskrit – Computers: overview of a paradigm shift

In the machine domain, there is a subtle yet critical distinction between language-based and communication-based interface modeling. The following environmental frames are presented to situate the Saṃskṛtam : Sanskrit : Computer relation. Saṃskṛtam posits four frames of interface modeling for exploration:

1. (Human to Human) Languages
2. (Human to Divine) Languages
3. (Human to Machine) Languages
4. (Machine to Machine) Languages


Each of these frames requires a different model of language–computation–grammar analysis for processing. The table below represents the wide spectrum of possibilities here.

Columns: (A) Divine, (B) Human, (C) Machine. Rows: 1. Human (content creation), 2. Machine.

1-A Human-Divine, by state of consciousness: 1. Waking 2. Dreaming 3. Sleeping 4. Transcendental
1-B Human-Human: 1. Social-historic, land-specific usage scenario 2. Technical/scientific 3. Business 4. Literary 5. Spiritual
1-C Human-Machine: 1. Contextual scripted conventions as input 2. Voice-mode input 3. Pointing device and other models of input / expected output
2-A Machine-Divine / Super-machine ?!: Traditional domain of magical yantras and mantras; modern thinkers keep a distance from this model.
2-B Machine-Human (output-utility driven): Conversion and re-conversion of input-output interfaces using standard conventions and pre-agreed standards like ASCII and UNICODE.
2-C Machine-Machine (a-laukika = non-natural, technical): Process-oriented and signal-related, as suits a matter-process-centric system. Within this, a specific model of electrical signal processing is of interest. Human language-based communication is also voice-based. In the present scenario, computer language technology schools have been exploring and using interfaces founded on Roman-alphabet, character-centric script, together with the phonemic classification model developed by the International Phonetic Association.

The frame above shows where ‘Sanskrit and Computers’ and ‘Saṃskṛtam and Computers’ find their space:
- The common user's interest is in blocks 1-A and 2-A.
- The academic scholar's interest is in blocks 1-B and 2-B.
- The technical user's and computational linguist's interest is in blocks 1-C and 2-C.

The future potential of Saṃskṛtam lies in the total integration of columns B and C. This is the vision of the UVC (Universal Voice Code) initiative.


Within Pāṇini, the linguistic communication paradigm for the table above has the following features. Voice-centric analysis is of two types: (a) the language of revelation (Chandas), related to higher levels of consciousness, whose goal is transcending the limitations of the speaker's senses (atīndriya, turīya); and (b) the language of truthful expression of intent from the speaker's perspective, which yields an output that defies aging, death, decay and deterioration through time, space and usage. This is a special process by which human expression becomes immortalized: the speaker's expression is divinized (from martya-vāṇī to amartya-vāṇī, amṛta-vāṇī, deva-bhāṣā, satya-vāk). In this sense Saṃskṛtam is an artificial language, and Saṃskṛtam grammar is a powerful word-generating engine with special qualities. This technicality may be articulated more simply as follows:

{Saṃskṛtam is a process grammar: a refining, packaging and programming set of rules which provides a way of packaging any given communication, accommodating multiple languages, vocabularies and word-meaning tags, in a perfect way that complies with the ‘Truth Equivalence Standard’; ready for portability without deformation due to time-space constraints; in a fault-tolerant delivery mode; with self-regulating and self-correcting guidelines; and supplemented with a process to recover the packaged communication with all its integrity, without any recourse to the source which created this communication package.}
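The ‘process grammar’ view can be illustrated, in a very reduced form, by a few of Pāṇini's vowel-sandhi rules applied at a word boundary. The sketch below implements only four sūtras (6.1.101 akaḥ savarṇe dīrghaḥ, 6.1.87 ād guṇaḥ, 6.1.88 vṛddhir eci, 6.1.77 iko yaṇ aci) on IAST strings. It has none of the exceptions, consonant or visarga sandhi, or rule-ordering machinery of the Aṣṭādhyāyī, and the function name `join_vowels` is hypothetical.

```python
import unicodedata


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# Vowel classes keyed by NFC-normalized IAST.
SAVARNA = {nfc(k): nfc(v) for k, v in
           {"a": "a", "ā": "a", "i": "i", "ī": "i", "u": "u", "ū": "u", "ṛ": "ṛ"}.items()}
LENGTHENED = {nfc(k): nfc(v) for k, v in {"a": "ā", "i": "ī", "u": "ū", "ṛ": "ṝ"}.items()}
GUNA = {nfc(k): nfc(v) for k, v in {"i": "e", "ī": "e", "u": "o", "ū": "o", "ṛ": "ar"}.items()}
VRDDHI = {"e": "ai", "ai": "ai", "o": "au", "au": "au"}
YAN = {nfc(k): v for k, v in {"i": "y", "ī": "y", "u": "v", "ū": "v", "ṛ": "r"}.items()}
# Diphthongs first, so that "ai"/"au" are matched before "a".
INITIAL_VOWELS = ["ai", "au"] + [nfc(v) for v in ["a", "ā", "i", "ī", "u", "ū", "ṛ", "e", "o"]]


def join_vowels(left: str, right: str) -> str:
    """Join two IAST words across a vowel + vowel boundary (toy sandhi)."""
    left, right = nfc(left), nfc(right)
    final = left[-1]
    initial = next((v for v in INITIAL_VOWELS if right.startswith(v)), None)
    if final not in SAVARNA or initial is None:
        raise ValueError("this sketch only handles final a/ā/i/ī/u/ū/ṛ + initial vowel")
    stem, rest = left[:-1], right[len(initial):]
    if initial in SAVARNA and SAVARNA[initial] == SAVARNA[final]:
        return stem + LENGTHENED[SAVARNA[final]] + rest   # 6.1.101 akaḥ savarṇe dīrghaḥ
    if SAVARNA[final] == "a" and initial in GUNA:
        return stem + GUNA[initial] + rest                  # 6.1.87 ād guṇaḥ
    if SAVARNA[final] == "a" and initial in VRDDHI:
        return stem + VRDDHI[initial] + rest                # 6.1.88 vṛddhir eci
    return stem + YAN[final] + initial + rest               # 6.1.77 iko yaṇ aci


for l, r in [("deva", "ālaya"), ("deva", "indra"), ("sadā", "eva"), ("iti", "ādi")]:
    print(l, "+", r, "->", join_vowels(l, r))
# deva + ālaya -> devālaya
# deva + indra -> devendra
# sadā + eva -> sadaiva
# iti + ādi -> ityādi
```

The rule order in the function mirrors the precedence needed for these four cases only; a real implementation would follow the Aṣṭādhyāyī's own conflict-resolution principles.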

These concepts need to be brought, as inspiration, into the specific paradigm of human-machine communication, for non-refined spoken-language systems in voice mode (prākṛta-bhāṣā). This is the challenge before traditional schools and current researchers looking at Pāṇinian grammar for machine-language interface development.

The very basic questions that arise here would be:
- What is uniquely human, or distinctly human as opposed to machine and matter processes?
- How is the functionality of the human mind, a bio-neural memory unit, different from the magnetic or electric-field memory unit in a silicon chip?
- What is language? What is common to all human languages, by communication and by voice units?
- How can universal communication be seamlessly extracted from valid multilingual expressions that follow different grammar rule bases?
- What are the primary voice atoms to be positioned in the standards? How are the secondary ones formed? How many are there, and in what sequence and order? This covers the sequential order for sorting; the rules of voice-atom combination (varṇa-mālā with svara-vyañjana vibhāga, guṇita and saṃyukta akṣara, Sandhi rules, Svara (accent) rules, voice-modulation rules); and word and word-modifier building from primary voice atoms (pada, prakṛti and prātipadika, dhātu, pratyaya listing; meaning association for clusters of base units; the communication modifiers and markers called pratyayas; rules for marking relation and construction order, kāraka–vibhakti–anvaya; and the closing of the boundary of one sentence unit).
- How can the multi-layered communication paradigm, which humans adopt so flexibly, be integrated into a human-machine interaction model?

In short: What is Saṃskṛtam? How is it unique among the languages of the world, as a contender for special attention as the best candidate for a computer programming language? The answers to these questions rest in the proper conceptualization of the traditions and technicalities documented in Pāṇini. Clarity on technical words like the following is critical: Saṃskṛtam, Chandas, bhāṣā, Varṇa, Sandhi, vibhakti, Saṃhitā, Samāsa, Kāraka, Dhātu, Prātipadika, Pratyaya, Anvaya–samanvaya krama, Ākāṅkṣā, pada–padārtha vibhāga, avyaya, upasarga, nipāta. The current definitions of these technical words need to be freed from Greco-Latin grammar modeling and redefined in their traditional technical manner.
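The question of ordering the primary voice atoms has a classic Pāṇinian answer in the Māheśvara (Shiva) sūtras, where a class of sounds is named by a pratyāhāra: a starting sound plus a following it-marker. The sketch below encodes the fourteen sūtras in IAST (consonants written without the inherent a) and expands a pratyāhāra by collecting sounds from the first occurrence of the starting sound up to the sūtra closed by the given marker. For a marker that closes more than one sūtra (ṇ), this sketch simply takes the first one reached, which is not always the traditional reading. The name `expand` is hypothetical.

```python
# Māheśvara (Shiva) sūtras in IAST: (sounds, it-marker closing the sūtra).
SHIVA_SUTRAS = [
    (["a", "i", "u"], "ṇ"),
    (["ṛ", "ḷ"], "k"),
    (["e", "o"], "ṅ"),
    (["ai", "au"], "c"),
    (["h", "y", "v", "r"], "ṭ"),
    (["l"], "ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "m"),
    (["jh", "bh"], "ñ"),
    (["gh", "ḍh", "dh"], "ṣ"),
    (["j", "b", "g", "ḍ", "d"], "ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "v"),
    (["k", "p"], "y"),
    (["ś", "ṣ", "s"], "r"),
    (["h"], "l"),
]


def expand(start: str, marker: str) -> list[str]:
    """Expand the pratyāhāra start+marker into the sounds it names."""
    collected: list[str] = []
    for sounds, it in SHIVA_SUTRAS:
        for sound in sounds:
            if collected or sound == start:  # begin at the first occurrence of start
                collected.append(sound)
        if collected and it == marker:       # stop at the sūtra closed by the marker
            return collected
    raise ValueError(f"no pratyāhāra {start}{marker}")


print(expand("a", "c"))  # ac, all vowels: ['a', 'i', 'u', 'ṛ', 'ḷ', 'e', 'o', 'ai', 'au']
print(expand("y", "ṇ"))  # yaṇ, semivowels: ['y', 'v', 'r', 'l']
```

Because h is listed twice in the sūtras, `expand("h", "l")` (hal, all consonants) returns 34 entries covering 33 distinct consonants.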

The next step for Saṃskṛtam and computer studies is to contemplate and build a scaled-down, customized model of the human-machine interface, workable for natural spoken languages that are less refined than Saṃskṛtam, the reference standard⁹ for human voiced communication.

The next step is porting the above communication model from language-centric understanding and work to technology standards. In the current state of the art, computing devices with English-language interfaces, and language-transparent interfaces using symbols, pointing devices, touch screens, video and visuals, have set a standard of expectation and convenience. The new model needs to compete on this point, to be useful not only to the minority society which has inherited Saṃskṛtam but also to a global society and generation who know nothing about Sanskrit and who conveniently use a different script. The existing technologies enjoy a head start of many technology man-years of dedicated work, supported by working installations and funding. This involves a study of how current machine-interface programs are designed and programmed, and how their work can be improved without the threat of obsolescence. Work needs to be done to integrate and build devices that deliver useful implementations with comfort, cost advantage, and ease of maintenance and service, for balanced wealth and welfare. It is here that Saṃskṛtam-based machines need to be planned, designed, developed, tested and deployed in a cultured way, with saṃskṛti, and not for destruction!

2. Modeling of Saṃskṛtam for computer studies

2.1 In historic perspective, the native study of Saṃskṛtam is built on a voice-primary storage model based on human memorization and grammar-rule-compliant usage. Text and manuscripts were only a secondary support. The historic evolution of scripts provided local conveniences to supplement memory, and efforts were made to represent the spoken sound as accurately as possible in whatever medium was used, be it handwriting, typography, typewriter, metal-cast type or modern soft fonts. The shift is now clear: sound, linked to the ear, has yielded primacy to visual shape, the script, in technology media¹⁰. Scripting by hand on specific writing media has yielded to the ‘keying in’ process. Achieving the visual display of a character graphic is no longer the art and skill of the hand, the calligrapher's pride!
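The voice-primary view can be illustrated by treating a word as a sequence of voice units (phonemes) and deriving a script only at the output stage. The sketch below assumes IAST phoneme tokens as input and renders them into Devanāgarī using the Unicode letters, vowel signs and virāma. Its Kannada variant rests on a stated assumption: the parallel, ISCII-derived layout of the Unicode Devanāgarī and Kannada blocks, which holds for the letters and signs produced here but not for every code point (the danda, for example, has no Kannada counterpart). The names `to_devanagari` and `devanagari_to_kannada` are hypothetical.

```python
# IAST phoneme -> Devanāgarī consonant letter.
CONSONANTS = {
    "k": "\u0915", "kh": "\u0916", "g": "\u0917", "gh": "\u0918", "ṅ": "\u0919",
    "c": "\u091A", "ch": "\u091B", "j": "\u091C", "jh": "\u091D", "ñ": "\u091E",
    "ṭ": "\u091F", "ṭh": "\u0920", "ḍ": "\u0921", "ḍh": "\u0922", "ṇ": "\u0923",
    "t": "\u0924", "th": "\u0925", "d": "\u0926", "dh": "\u0927", "n": "\u0928",
    "p": "\u092A", "ph": "\u092B", "b": "\u092C", "bh": "\u092D", "m": "\u092E",
    "y": "\u092F", "r": "\u0930", "l": "\u0932", "v": "\u0935",
    "ś": "\u0936", "ṣ": "\u0937", "s": "\u0938", "h": "\u0939",
}
# IAST phoneme -> (independent vowel letter, dependent vowel sign); "a" is inherent.
VOWELS = {
    "a": ("\u0905", ""), "ā": ("\u0906", "\u093E"),
    "i": ("\u0907", "\u093F"), "ī": ("\u0908", "\u0940"),
    "u": ("\u0909", "\u0941"), "ū": ("\u090A", "\u0942"),
    "ṛ": ("\u090B", "\u0943"), "e": ("\u090F", "\u0947"),
    "ai": ("\u0910", "\u0948"), "o": ("\u0913", "\u094B"),
    "au": ("\u0914", "\u094C"),
}
MODIFIERS = {"ṃ": "\u0902", "ḥ": "\u0903"}  # anusvāra, visarga
VIRAMA = "\u094D"


def to_devanagari(phonemes: list[str]) -> str:
    """Render a list of IAST phoneme tokens as Devanāgarī text."""
    out = []
    for i, p in enumerate(phonemes):
        prev = phonemes[i - 1] if i > 0 else None
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p in CONSONANTS:
            out.append(CONSONANTS[p])
            if nxt not in VOWELS:  # cluster or word-final consonant: suppress inherent a
                out.append(VIRAMA)
        elif p in VOWELS:
            independent, sign = VOWELS[p]
            out.append(sign if prev in CONSONANTS else independent)
        elif p in MODIFIERS:
            out.append(MODIFIERS[p])
        else:
            raise ValueError(f"unknown phoneme: {p!r}")
    return "".join(out)


def devanagari_to_kannada(text: str) -> str:
    """Shift U+0900-U+097F to the parallel Kannada block (valid for the subset above)."""
    return "".join(chr(ord(ch) + 0x380) if 0x0900 <= ord(ch) <= 0x097F else ch
                   for ch in text)


word = ["s", "a", "ṃ", "s", "k", "ṛ", "t", "a", "m"]  # saṃskṛtam
print(to_devanagari(word))                          # संस्कृतम्
print(devanagari_to_kannada(to_devanagari(word)))   # ಸಂಸ್ಕೃತಮ್
```

The same phoneme list drives both outputs; only the final rendering step depends on a script.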

2.2 The significant reference point here is the effort of the International Phonetic Association (IPA)¹¹ to devise a notation system that represents the sounds of non-Roman scripts with special symbols, to facilitate printing. With increasing support for the print publication of Saṃskṛtam resources in Roman script, in the West and under colonial rule, the IPA diacritic notation system gained greater academic and commercial recognition. In India, the Devanāgarī script had already set its stamp as the standard for scripting Saṃskṛtam. Devanāgarī printing was also fairly well established for other major Indian languages such as Hindi.

2.3 Against this backdrop, Sanskrit script representation in computers followed two streams: the IPA diacritic notation system for Roman-character-based scripting conventions, and the native Devanāgarī character-display system for native publications. (I am not digressing into the variants of Nāgarī scripts, Sanskrit texts documented in regional scripts, Tibetan-script or Gurmukhi-based Nāgarī models, the language-specific conventions for reading Nāgarī script, et al.) This deliberation did not make any significant impact on publication in other Indic languages, owing to the commonality of their phonetic units. To notice something significant from this perspective, one has to wait for the newspaper revolution in India and for digital typing. The only challenging issue, Vedic accents, was a special case, handled ingeniously through the special symbols¹² available in print media and through typographic skill.

2.4 Institutions in the US, such as the Summer Institute of Linguistics¹³, have long studied Sanskrit as one of the world's languages. This study also guided technology initiatives to consider accommodating Devanāgarī Sanskrit script in computers as a ‘display’.

2.5 Indian technology initiatives resulted¹⁴ in the final announcement of the ISCII standard, IS 13194, in 1991. This covered Devanāgarī Sanskrit for computers, including some coverage of Vedic accents (http://www.tdil.mit.gov.in/standards.htm). Before this, other technologies, from Mac and from proprietary vendors, addressed the display of Devanāgarī script in computers with certain constraints. With the new Vedic Unicode standards (however imperfect) issued by the Unicode Consortium, and the Government of India initiative/directive to substitute ISCII with Unicode, the situation for Sanskrit in the digital space has become relatively better.

2.6 With advances in computer hardware and operating systems, a new surge of interest began in representing non-Roman scripts in computers. Global language markets, organized by script, opened up, seeking proper representation of national languages in computers. Non-Roman script computing was the new initiative¹⁵ at the Summer Institute of Linguistics. New standards for world languages were deliberated in technology circles. Technologies that could accommodate the writing systems of all the world's languages were also pushed by religious institutions¹⁶ for promotional work. All this led to the emergence of a new standard, UNICODE, with the following mission statement:


The Unicode Consortium home page (http://www.unicode.org/) states: “The Unicode Consortium enables people around the world to use computers in any language. Our members develop the Unicode Standard, Unicode Locales (CLDR), and other standards. These specifications form the foundation for software internationalization in all major operating systems, search engines, applications, and the Web.” In short, the focus of UNICODE is to specify the representation of text in modern software products and standards. From this perspective, Sanskrit is seen as a scripted language that needs an appropriate representation of its script-text in the digital domain. The UNICODE standard for Devanāgarī Sanskrit is documented on these standards pages: http://www.unicode.org/charts/PDF/U0900.pdf ; http://www.unicode.org/~emuller/southasia/vedic/#introduction ; http://unicode.org/~emuller/southasia/vedic/07397-vaidika-evidence.pdf . Unicode has also come to sit at the foundational roots of the W3C's organization and approach. These statements make the situation clear:

“The W3C was founded to develop common protocols to lead the evolution of the World Wide Web. The path W3C follows to making text on the Web truly global is Unicode. Unicode is fundamental to the work of the W3C; it is a component of W3C Specifications, from the early days of HTML, to the growing XML Family of specifications and beyond.” – SIR TIM BERNERS-LEE, KBE (Web Inventor and Director of the World Wide Web Consortium (W3C))

“Over the past two decades, Unicode has become one of the most important global standards in digital typography. Unicode 5.0, with its greatly increased range, will be of tremendous benefit to software developers involved with text processing, including font designers, application developers, web browser developers, and operating system manufacturers. Computer users around the world, including scholars, librarians, and scientists, as well as general users, will likewise benefit from broad adoption of the Unicode Standard, which has become an essential component of world literacy in the digital age.” – CHARLES BIGELOW, Cary Professor of Graphic Arts, Rochester Institute of Technology
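To make the script-centric model concrete, here is a minimal Python sketch that uses the standard unicodedata module to list the code points behind संस्कृतम् (Saṃskṛtam). The helper name describe and the choice of word are illustrative assumptions, not part of the Unicode documents cited above. The output shows that Unicode encodes visual script units (letters, the anusvāra sign, the virāma sign) rather than sounds: the vowel a heard after the first स and after त has no code point of its own, while the two virāma code points are not pronounced at all.

```python
import unicodedata

# संस्कृतम् (Saṃskṛtam), spelled with escapes so every code point is visible.
WORD = "\u0938\u0902\u0938\u094D\u0915\u0943\u0924\u092E\u094D"


def describe(text):
    """Return (code point, Unicode character name) pairs for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


if __name__ == "__main__":
    for code_point, name in describe(WORD):
        print(code_point, name)
    # Every code point of the word lies in the Devanagari block U+0900..U+097F.
    print(all(0x0900 <= ord(ch) <= 0x097F for ch in WORD))
```

The listing contains letters (SA, KA, TA, MA), signs (ANUSVARA, VIRAMA) and one dependent vowel sign (VOCALIC R), which is exactly the glyph-oriented inventory the paper contrasts with a voice-oriented one.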

2.7 With this approach, the focus being the ‘representation of the visual script form of Devanāgarī Sanskrit on the digital page’, several transliteration systems, notations and conventions were also developed. Some of them are noted here. With so many conventions in use, the user needs knowledge of multiple conventions to read a given publication properly:

There are several methods of transliteration from Devanāgarī into Roman script. The most widely used transliteration method is IAST, but there are other options. The major transliteration methods for Devanāgarī are the following:

ISO 15919 - A standard transliteration convention was codified in the ISO 15919 standard of 2001. It uses diacritics to map the much larger set of Brāhmīc graphemes to the Latin script. The Devanāgarī-specific portion is nearly identical to the academic standard for Sanskrit, IAST.

IAST - The International Alphabet of Sanskrit Transliteration (IAST) is the academic standard for the romanization of Sanskrit. IAST is the de facto standard used in printed publications such as books and magazines, and with the wider availability of Unicode fonts, it is also increasingly used for electronic texts. It is based on a standard established by the Congress of Orientalists at Athens in 1912. The National Library at Kolkata romanization, intended for the romanization of all Indic scripts, is an extension of IAST.

Harvard-Kyoto - Compared to IAST, Harvard-Kyoto looks much simpler. It does not contain the diacritic marks that IAST contains, which makes typing in Harvard-Kyoto much easier than in IAST. However, Harvard-Kyoto uses capital letters, which can be difficult to read in the middle of words.

ITRANS - ITRANS is a lossless transliteration scheme of Devanāgarī into ASCII that is widely used on Usenet. It is an extension of the Harvard-Kyoto scheme. In ITRANS, the word Devanāgarī is written as "Devanaagarii". ITRANS is associated with an application of the same name that enables typesetting in Indic scripts: the user types Roman letters, and the ITRANS pre-processor converts them into Devanāgarī (or other Indic scripts). The latest version of ITRANS is version 5.30, released in July 2001.

ALA-LC Romanization - ALA-LC romanization is a transliteration scheme approved by the Library of Congress and the American Library Association, and widely used in North American libraries. Its transliteration tables are based on languages, so there is a table for Hindi, one for Sanskrit and Prakrit, etc. (Source: http://en.wikipedia.org/wiki/Devanāgarī)

Several software packages follow these trends and provide application tools for users, such as the Baraha software at www.baraha.com. The ‘script centricity’ approach is very clear in all these developments. The same trend is now being carried upwards into the web scenario with W3C standards (www.w3c.org).
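The schemes above differ mainly in how they write the letters that carry diacritics in IAST. The following minimal Python sketch converts IAST into Harvard-Kyoto or ITRANS for a small, assumed subset of letters; the table, the function name convert and the scheme keys "hk" and "itrans" are illustrative and are not taken from any of the tools mentioned. It also shows why the capitals in Harvard-Kyoto matter: in that scheme D stands for ḍ, so the sketch lower-cases its input first.

```python
import unicodedata

# Illustrative subset only: IAST letter -> (Harvard-Kyoto, ITRANS).
# Letters not listed pass through unchanged. For plain letters such as
# k, t, m, e, o that is correct in all three schemes; rarer letters
# such as ḷ or ṝ are simply not covered by this sketch.
_TABLE = {
    "ā": ("A", "aa"), "ī": ("I", "ii"), "ū": ("U", "uu"),
    "ṛ": ("R", "RRi"), "ṃ": ("M", "M"), "ḥ": ("H", "H"),
    "ṅ": ("G", "~N"), "ñ": ("J", "~n"),
    "ṭ": ("T", "T"), "ḍ": ("D", "D"), "ṇ": ("N", "N"),
    "ś": ("z", "sh"), "ṣ": ("S", "Sh"),
    "c": ("c", "ch"),  # IAST and Harvard-Kyoto c is written ch in ITRANS
}
# Normalise keys to NFC so each IAST letter is a single code point.
TABLE = {unicodedata.normalize("NFC", k): v for k, v in _TABLE.items()}
SCHEMES = {"hk": 0, "itrans": 1}


def convert(iast, scheme):
    """Convert IAST text to Harvard-Kyoto ("hk") or ITRANS ("itrans").

    The input is lower-cased first because both target schemes are
    case-sensitive (in Harvard-Kyoto, "D" means ḍ, not d).
    """
    column = SCHEMES[scheme]
    text = unicodedata.normalize("NFC", iast).lower()
    return "".join(TABLE[ch][column] if ch in TABLE else ch for ch in text)


if __name__ == "__main__":
    print(convert("Devanāgarī", "itrans"))
    print(convert("saṃskṛtam", "hk"))
```

For example, convert("Devanāgarī", "itrans") returns "devanaagarii", the ITRANS spelling quoted above apart from the capital, and convert("saṃskṛtam", "hk") returns "saMskRtam". The same word thus needs a different table for each publication convention, which is the burden on the reader that section 2.7 describes.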
The current advancement in this area is the Semantic Web, VoiceXML and the like. The W3C's specification for VoiceXML 2.0 is the authoritative specification for VoiceXML. For further information on the W3C Speech Interface Framework and related specifications, see the W3C Voice Browser Activity; W3C Members can access the latest specifications under development by the Voice Browser Working Group, and further tutorials and many other useful pointers can be found at the VoiceXML Forum's website. Deliberations are ongoing about adding further sections on speech grammars and speech synthesis, as well as commentaries on the W3C's work on multimodal and other topics. Against this background, Java-based software applications, Devanāgarī text editors and special fonts have also come to the fore. The latest development in this area is a special group initiative of the American Library of Congress17 to explore file names in non-Roman scripts (covering Indic scripts, including the Devanāgarī script). So far, no transliteration system has evolved that suitably matches the true needs of Samskrutham. The International Phonetic Association published a revised chart in 2005 (http://internationalphoneticassociation.org/). This chart notation rests at the foundation of UNICODE and W3C modeling of the semantic web and voice interfacing. This may be a good point to start rethinking an appropriate Roman-character transliteration chart for the Sanskrit phonemes listed in the works of Vedic phonetics, the Prātiśākhya and Śikṣā texts.

2.8 A separate dimension, the application of technologies to Sanskrit studies, also deserves mention here: high-resolution scanning and media-based storage for preserving Sanskrit heritage documents in digital format, and the use of multimedia18 interactive technologies for Sanskrit teaching and study. A vision of what the content access format of a Sanskrit document in a futuristic digital library should be was presented19 at the Digital Library conference.

2.9 While this was going on, there was another line of development related to Sanskrit and computers. Its driving motto centered on building a programming language inspired by Pāṇinian grammar. The trigger was the article by Dr. Briggs, the NASA researcher. The triggering concepts were: knowledge representation for computers using natural-language grammars, building an Indian / Vedic operating system, exploring the idea that <Sanskrit is the mother of all European languages>, and concepts of neural networks, Artificial Intelligence et al. These studies were drawn to the logic of encoding, decoding and communication in a Saṃskṛtam sentence according to the schools of Indian linguistic theory and philosophy. The interest was in the ‘logic needed for linguistic expression and analysis’, rather than in ‘language and computers’.

2.10 Computational linguistics is one area where Sanskrit is being closely investigated. Computational linguistics is an interdisciplinary field dealing with the statistical and/or rule-based modeling of natural language from a computational perspective. This modeling is not limited to any particular field of linguistics. Traditionally, computational linguistics was performed by computer scientists who had specialized in the application of computers to the processing of natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, logicians, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others. Computational Linguistics R&D at the Special Centre for Sanskrit Studies, J.N.U., started in 2002 under the supervision of Dr. Girish Nath Jha. The special interest group carries out R&D in several areas of language technology for Sanskrit and other Indian languages. Its current focus is on developing Sanskrit analysis tools for building a Sanskrit-Hindi Translator (SaHiT). Tools and resources developed so far (http://sanskrit.jnu.ac.in/index.jsp) include lexical resources, language analyzers, Sanskrit multimedia and e-learning material (which runs directly from the web), and language generators such as a Sandhi generator, a Subanta generator and a Tinanta generator. C-DAC, another government agency, has also done significant work on Sanskrit word processing, software tools and related content for public use.
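Since sandhi generation is among the tools listed above, a toy example may help show what rule-based modeling means here. The Python sketch below is not the J.N.U. Sandhi Generator; it assumes IAST input and implements only two vowel-sandhi rules of the Aṣṭādhyāyī: akaḥ savarṇe dīrghaḥ (6.1.101), where homogeneous simple vowels merge into one long vowel, and ād guṇaḥ (6.1.87), where a or ā followed by i, u or ṛ gives e, o or ar. The function and table names are hypothetical.

```python
import unicodedata


def nfc(s):
    """Normalise to NFC so IAST letters such as ā are single code points."""
    return unicodedata.normalize("NFC", s)


# Short and long vowels grouped as homogeneous (savarṇa) pairs.
SAVARNA = {nfc(k): nfc(v) for k, v in [
    ("a", "a"), ("ā", "a"), ("i", "i"), ("ī", "i"),
    ("u", "u"), ("ū", "u"), ("ṛ", "ṛ"), ("ṝ", "ṛ"),
]}
LONG = {nfc(k): nfc(v) for k, v in [("a", "ā"), ("i", "ī"), ("u", "ū"), ("ṛ", "ṝ")]}
GUNA = {nfc(k): nfc(v) for k, v in [("i", "e"), ("u", "o"), ("ṛ", "ar")]}


def join_vowels(first, second):
    """Join two IAST words at a vowel-vowel boundary using only two rules.

    - akaḥ savarṇe dīrghaḥ (6.1.101): homogeneous simple vowels -> one long vowel
    - ād guṇaḥ (6.1.87): a/ā followed by i/ī, u/ū, ṛ/ṝ -> e, o, ar
    Any other boundary (for example i + a, which needs yaṇ sandhi) is
    left as plain concatenation.
    """
    first, second = nfc(first), nfc(second)
    # Diphthongs ai/au need other rules (e.g. vṛddhi), so they are not handled.
    if first.endswith(("ai", "au")) or second.startswith(("ai", "au")):
        return first + second
    last, initial = first[-1], second[0]
    if last in SAVARNA and initial in SAVARNA:
        a, b = SAVARNA[last], SAVARNA[initial]
        if a == b:
            return first[:-1] + LONG[a] + second[1:]
        if a == "a":
            return first[:-1] + GUNA[b] + second[1:]
    return first + second


if __name__ == "__main__":
    for pair in [("deva", "ālaya"), ("muni", "indra"), ("deva", "indra"), ("mahā", "utsava")]:
        print(" + ".join(pair), "->", join_vowels(*pair))
```

With these two rules, deva + ālaya gives devālaya, muni + indra gives munīndra and mahā + utsava gives mahotsava. A real generator has to order and condition many more rules than this, which is where Pāṇini's rule-ordering principles come in.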

2.11 While all the above work focuses on Sanskrit for computers and keeps the voice part separate20, there have been technology advancements aimed at spoken natural language and voice interactivity. Multilingual text reading and multilingual OCR in relation to Sanskrit are the items for deliberation here. This investigation calls for a proper understanding of multiple aspects of Saṃskṛtam, specifically going beyond the script model of analysis. At present, the language- and character-related interfacing standards for computing focus only on script; a ‘voice standard’ needs to be investigated in the general domain, in the specific domains of languages, and more so in the Saṃskṛtam traditions. Script–voice integration based on true script–voice equivalence needs to be developed. Towards this goal, a proposal called Universal Voice Code21 has been initiated, and a consortium model is under deliberation.
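The gap between script units and voice units can be shown in a few lines. The Python sketch below is a hypothetical illustration covering only a handful of Devanāgarī letters and signs; the table and function names are assumptions. It walks through the code points of a word and emits voice units in IAST: it supplies the inherent a that the script leaves unwritten and drops the virāma, which is written but not pronounced. A real script–voice mapping would need the full inventory, plus handling of conjuncts, svara and accent marks.

```python
# Hypothetical subset tables, not a complete Devanagari inventory.
CONSONANTS = {
    "\u0915": "k", "\u0917": "g", "\u0924": "t", "\u0926": "d", "\u0928": "n",
    "\u092E": "m", "\u0930": "r", "\u0935": "v", "\u0938": "s",
}
VOWEL_SIGNS = {
    "\u093E": "\u0101", "\u093F": "i", "\u0940": "\u012B", "\u0941": "u",
    "\u0942": "\u016B", "\u0943": "\u1E5B", "\u0947": "e", "\u094B": "o",
}
INDEPENDENT_VOWELS = {"\u0905": "a", "\u0906": "\u0101", "\u0907": "i", "\u0909": "u"}
VIRAMA = "\u094D"
ANUSVARA = "\u0902"


def to_phonemes(text):
    """Turn Devanagari script units into a list of voice units written in IAST."""
    out = []
    inherent_a = False  # True while the last consonant still carries its unwritten 'a'
    for ch in text:
        if ch in CONSONANTS:
            if inherent_a:
                out.append("a")  # the previous consonant was followed by its inherent 'a'
            out.append(CONSONANTS[ch])
            inherent_a = True
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])  # a vowel sign replaces the inherent 'a'
            inherent_a = False
        elif ch == VIRAMA:
            inherent_a = False  # the virama is written but silent; it only removes the 'a'
        elif ch == ANUSVARA or ch in INDEPENDENT_VOWELS:
            if inherent_a:
                out.append("a")
                inherent_a = False
            out.append("\u1E43" if ch == ANUSVARA else INDEPENDENT_VOWELS[ch])
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this sketch's subset")
    if inherent_a:
        out.append("a")  # a word-final consonant without virama keeps its 'a'
    return out


WORD = "\u0938\u0902\u0938\u094D\u0915\u0943\u0924\u092E\u094D"  # संस्कृतम्

if __name__ == "__main__":
    print(to_phonemes(WORD))
```

For संस्कृतम् the nine code points yield the nine voice units s a ṃ s k ṛ t a m. The counts happen to match, but two of the code points are silent virāmas and two of the sounds have no code point, which is the script–voice mismatch this section points to.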

3. ‘Saṃskṛtam for Computers’ – Unicode to Universal Voice code

This is the theme followed in the arguments seeking a new digital standard that moves from UNICODE to UVC (Universal Voice Code). Here the limitations of the current model in addressing voice solutions, and the complexities faced by a Sanskrit-for-computers model anchored to script units, characters and glyph rendering for shape display, are revisited. The lead question is this: What should be the primary voice units placed in the code pages for Saṃskṛtam in voice interaction mode? The hypothesis is that the total set is provided by Pāṇini and by the Prātiśākhyas addressing the Vedas. This calls for an analysis of the current models of ‘voice atoms’ and ‘voice molecules’ drawn from world languages, namely the phonetic standards listed in the IPA and the phoneme sets adopted in industry-provided solutions for English and similar languages. This dimension of study will raise the bar from visual script centricity to a voice anchor. It is here that the ‘Varṇa’ standard of the Pāṇinian Śiva Sūtras and the rules of Varṇa-saṃyoga, Saṃhitā, Svara and Sandhi become relevant for consideration as a standard.

3.1 One initial action point here can be redrafting the IPA phonetic alphabet chart (2005) from the perspective of the Saṃskṛtam Varṇamālā.

3.2 A second review point would be to examine the phoneme sets used in industry-provided ‘text to speech / speech to text’ solutions, and to analyze each phonemic unit according to the Saṃskṛtam Varṇa-saṃyoga rules.

3.3 A third point may be to add one more tag field to each UNICODE language-script unit, holding its phonemic value for each specific language, and to redraft22 the table as the UNIVOICE Code.

3.4 Voice units (atoms, molecules and further hybrid builds) that find a new or modifier entry in the IPA23 also need to be taken up for analysis.

3.5 The existing rule books of Śikṣā and the Prātiśākhyas need to be meticulously analyzed for the potential phonemic library that they indicate as needed for Vedic recitation. Although more complex and exotic issues such as speaker recognition and music transcription may not be addressable right away, the UVC model proposal contains solution pointers for them, addressing all the fine points related to voice security.
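The ‘Varṇa’ standard of the Śiva Sūtras, mentioned at the start of this section, is itself a compact and machine-friendly encoding: the sounds are listed in fourteen groups, each closed by a marker letter (it), and a pratyāhāra names a whole set by its first sound plus a marker (ādir antyena sahetā, 1.1.71). A minimal Python sketch of expanding a pratyāhāra follows. The sūtras are given in IAST with consonants written without the a that is added only for recitation; the names SHIVA_SUTRAS and pratyahara and the data layout are assumptions for illustration.

```python
# The fourteen Śiva Sūtras in IAST: (sounds, closing marker letter).
# Consonants are written without the 'a' added for recitation.
SHIVA_SUTRAS = [
    (["a", "i", "u"], "ṇ"),
    (["ṛ", "ḷ"], "k"),
    (["e", "o"], "ṅ"),
    (["ai", "au"], "c"),
    (["h", "y", "v", "r"], "ṭ"),
    (["l"], "ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "m"),
    (["jh", "bh"], "ñ"),
    (["gh", "ḍh", "dh"], "ṣ"),
    (["j", "b", "g", "ḍ", "d"], "ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "v"),
    (["k", "p"], "y"),
    (["ś", "ṣ", "s"], "r"),
    (["h"], "l"),
]


def pratyahara(first_sound, marker):
    """Return every sound from first_sound up to the sūtra closed by marker.

    Where a marker occurs twice (ṇ closes sūtras 1 and 6), the first one
    after first_sound is used.
    """
    result, collecting = [], False
    for sounds, it in SHIVA_SUTRAS:
        for sound in sounds:
            if not collecting and sound == first_sound:
                collecting = True
            if collecting and sound not in result:  # 'h' occurs twice; keep it once
                result.append(sound)
        if collecting and it == marker:
            return result
    raise ValueError(f"no pratyāhāra from {first_sound!r} to marker {marker!r}")


if __name__ == "__main__":
    print("ac :", pratyahara("a", "c"))
    print("ik :", pratyahara("i", "k"))
    print("hal:", len(pratyahara("h", "l")), "consonants")
```

Here pratyahara("a", "c") gives the nine vowels of ac, pratyahara("i", "k") gives i u ṛ ḷ (ik), and pratyahara("h", "l") gives the 33 consonants of hal. Resolving the doubled marker ṇ to the nearest occurrence matches the common reading but not every usage in the Aṣṭādhyāyī, so this is a sketch rather than a complete model.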

In this way, it is proposed to build the digital ‘voice world’ from the ‘voice atoms and molecules’ derived from Saṃskṛtam resources. The primary and secondary entity-entries under the Universal Voice Code project have the potential to address all aspects of voice programming, natural speech systems and the like. Once the foundational repertoire of human voice blocks is available in a digital standard format, the next step is mapping speech signal units to the patterns of UVC entities. These are the steps the IPA worked through to gain a firm foothold in the world of digital standards. The next step is building the prime vocabulary base, technically called the group of ‘Pada-Prakruti’. Clusters of voice units, when sequenced, yield matrices of n-length dimensions. These need to be mapped to dictionary and meaning associations for valid word-base recognition, technically called pratipadika-dhatu identification. The clusters of pratyayas also need to be technically marked.
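Point 3.3 above proposes one more tag field per Unicode script unit, holding its phonemic value per language. A minimal Python sketch of such a record follows. Everything in it is hypothetical: the class VoiceTaggedUnit, the table UVC_TABLE, the function phonemes_for, the language tags "sa" (Sanskrit) and "hi" (Hindi), and the broad IPA values, which are rough approximations chosen only to show that one script unit can carry different phonemic values in different languages (for example ष, usually described as [ʂ] in Sanskrit but commonly pronounced [ʃ] in Hindi).

```python
from dataclasses import dataclass, field


@dataclass
class VoiceTaggedUnit:
    """Hypothetical UVC entry: a Unicode script unit plus per-language phonemic tags."""
    code_point: int
    script_form: str
    phonemic: dict = field(default_factory=dict)  # language tag -> broad IPA value

    def __post_init__(self):
        # Keep the numeric code point and the stored character consistent.
        if ord(self.script_form) != self.code_point:
            raise ValueError(f"{self.script_form!r} is not U+{self.code_point:04X}")


# Hypothetical three-entry table; the IPA values are broad approximations.
UVC_TABLE = {
    unit.code_point: unit
    for unit in [
        VoiceTaggedUnit(0x0915, "\u0915", {"sa": "k", "hi": "k"}),              # क
        VoiceTaggedUnit(0x090B, "\u090B", {"sa": "r\u0329", "hi": "r\u026A"}),  # ऋ
        VoiceTaggedUnit(0x0937, "\u0937", {"sa": "\u0282", "hi": "\u0283"}),    # ष
    ]
}


def phonemes_for(text, lang):
    """Return the phonemic tag of each code point in text for the language lang."""
    result = []
    for ch in text:
        unit = UVC_TABLE.get(ord(ch))
        if unit is None or lang not in unit.phonemic:
            raise KeyError(f"no phonemic tag for U+{ord(ch):04X} in {lang!r}")
        result.append(unit.phonemic[lang])
    return result


if __name__ == "__main__":
    print(phonemes_for("\u0937", "sa"), phonemes_for("\u0937", "hi"))
```

Keying the table by code point keeps it aligned with the existing Unicode charts, while the per-language dictionary is where a voice value, rather than a glyph, would live.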

The next step is using the Pāṇinian rules to construct final words which satisfy the criterion ‘Sup-tiṅantam padam’: the technically perfected output word from the user, which is fit to enter the command sentence. Once a cluster of such words is available, capturing the several shades of intended communication as noun, pronoun, verb, adjective and the like, the word-relation mapping needs to be drawn by the kāraka-vibhakti rules. This is akin to a prima facie foundation for the rules of a programming language and for an operational Samskrutham SQL in speech mode. This goal sets a far higher bar than the current models of programming, database handling and search, which are centered on script-visual character display. If there is one contender among world languages that seems to hold the promise of a computational road map to achieve this, it is Pāṇinian Samskrutham. Until the basic integration suggested by UVC is achieved and incorporated at the standards level, plans to progress towards the true goal of a human–machine interface in multilingual, multimodal mode will be like walking on the ground with the desire to reach the moon. The mechanical model of manufacturing a machine by assembling parts may not be appropriate for the human scenario; assembling human body parts will not guarantee the infusion of life into the assembly! Rendering integrated glyphs and attaching joined or generated sound clips with a final <play voice> command will not satisfy the Turing Test.

4. Promoting Saṃskṛtam-Language technology research

4.1 Reviewing what has been said so far, we have seen how the Sanskrit and computers paradigm of studies has evolved on script centricity to its current level, and the directions in which it is headed. We have also identified the current areas and directions of research. We have also noted the importance of recognizing the new paradigm of Saṃskṛtam and Computers, and the potential it holds for future developments. The upgradation of the research paradigm from Sanskrit to Saṃskṛtam holds the following promises:

- Upgrading the study of Sanskrit as a historic / classical language, on par with other languages of the world, to the higher level of Saṃskṛtam guided by Pāṇinian grammar, as a set of rules detailing speech activity as a ‘Human Expression-Engagement Descriptor with a Truth Value equivalence standard, where an abstract idea is transformed into a tangible voiced expression’.

- Integrating the script-centric and voice-centric approaches in a language-appropriate way for delivering text-to-speech and speech-to-text solutions.

- Overcoming the current limitations of computer programming languages anchored to standards built around Roman alphabet characters, script modeling of language interfaces, transliteration schemas linking world languages to Roman alphabet characters, and IPA alphabets of non-phonetic nature.

5. Future potential – Samskrutham – Future SQL for Voice Programming

Today’s trend in technology development and global markets is towards merging multiple models of human communication and interaction. The increasing demand for natural-language-based, preferably voice-mode, interfacing for data entry and search has drawn more keen minds to investigate the human paradigms that can be analytically studied and ported to the machine domain. It is here that the Saṃskṛtam paradigm holds potential. Articulations of the following kind from leading research institutions, though not using the words ‘Saṃskṛtam / India / Pāṇini’, elaborate the very values for which Saṃskṛtam has stood. This is what the MIT Spoken Language Systems group places as its vision: “Our goal is both simple and ambitious. Create Technology that makes it possible for everyone in the world to interact with computers via natural spoken language. Conversational interfaces will enable us to converse with machines in much the same way that we communicate with one another and will play a fundamental role in facilitating our move towards an information based society.” (http://groups.csail.mit.edu/sls/about/ - MIT Computer Science24 and Artificial Intelligence Laboratory.)

A different articulation of this can be seen in the research theme titled Spatial and Temporal Reasoning for Ambient Intelligence Systems (http://www.cosy.informatik.uni-bremen.de/events/cosit09-ami/ ): A wide-range of application domains within the fields of ambient intelligence and ubiquitous computing environments require the ability to represent and reason about dynamic spatial phenomena. Real world ambient intelligence systems that monitor and interact with an environment populated by humans and other artefacts require a formal means for representing and reasoning with spatio-temporal, event and action based phenomena that are grounded to real aspects of the environment being modelled. A fundamental requirement within such application domains is the representation of dynamic knowledge pertaining to the spatial aspects of the environment within which an agent, system or robot is functional. At a very basic level, this translates to the need to explicitly represent and reason about dynamic spatial configurations or scenes and desirably, integrated reasoning about space, actions and change.
With these modelling primitives, primarily the ability to perform predictive and explanatory analyses on the basis of available sensory data is crucial toward serving a useful intelligent function within such environments.

‘Qualitative conceptualizations of space and tools/techniques for efficiently reasoning with them being well-established, there is now a clear felt need within the community to utilize the tools and formalisms that have been constructed in the recent years in novel application scenarios. The emerging fields of ambient intelligence and ubiquitous computing will benefit immensely from the vast body of representation and reasoning tools that have been developed in Artificial Intelligence in general, and the sub-field of Spatial and Temporal Reasoning in specific. There have already been proposals to explicitly utilize qualitative spatial calculi pertaining to different spatial domains for modeling the spatial aspect of an ambient environment (e.g., smart homes and offices) and also to utilize a formal basis for representing and reasoning about space, change and occurrences within such environments.’ – (http://www.cosy.informatik.uni-bremen.de/events/cosit09-ami/ )

As a part of this, the speaker Dr. Dr. Norbert Streitz, Senior Scientist and Strategic Advisor, Smart Future Initiative (previously Fraunhofer-IPSI), Germany, states in his talk titled ‘Designing Information, Communication, and Experiences in Ubiquitous Hybrid Worlds’: "It seems like a paradox but it will soon become reality: The rate at which computers disappear will be matched by the rate at which information technology will increasingly permeate our environment and our lives". This statement by Streitz & Nixon illustrates that new challenges for designing the interaction of humans with computers embedded in everyday objects will arise. While disappearance is a major aspect, "smart" artefacts are also characterized by sensors collecting data about the environment, the devices and humans acting in this context in order to provide ambient intelligence-based support. The resulting issues are discussed based on the distinction between "system-oriented, importunate smartness", implying more or less automatic behavior of smart environments, and "people-oriented, empowering smartness", where the empowering function is in the foreground. The latter approach can be summarized as "smart spaces make people smarter" which is achieved by keeping "the human in the loop" and empowering people to be in control, making informed decisions and taking actions. Whatever type of smartness will be employed, representations of people, content and contexts play a central role. Last but not least, privacy issues in sensor-based smart environments are being discussed ranging from being a legal and moral right to becoming a commodity and privilege. The approaches and concepts will be illustrated with examples taken from different research projects ranging from smart rooms over cooperative buildings to hybrid cities.

In a second talk, titled ‘Spatial and Temporal Modeling for AmI Systems: Industrial Applications’, the speaker Michael Pirker, Corporate Technology, SIEMENS AG, Munich, Germany, says: The talk presents two ongoing research projects at Siemens Corporate Research and Technology in the domain of Ambient Assisted Living (AAL) and Public Surveillance. One key feature in both domains is the ability to detect and identify specific behavioral patterns of persons. While AAL applications mainly focus on providing assistance functionality to the user, Public Surveillance applications aim at detecting (and possibly preventing) potentially dangerous situations. To this end adequate models (e.g. of human behavior or processes) have to be constructed, taking into account spatial context and temporal dependencies. These models can be evaluated using standard approaches such as DL-reasoning, whereas alternative methods (e.g. graph-based spatial reasoning, or abductive reasoning) may turn out to be more flexible and performant.

At the time of preparing this document, there has been news around the cyber world about nation-sponsored cyber attacks25 intended to infiltrate and disrupt the digital data networks of target nations. Cryptography, virus guards, firewalls and secure encryption are connected issues of critical interest for debate here. Saṃskṛtam grammar and certain allied disciplines documented in Saṃskṛtam (e.g. prahelika, samasya, gooḍha lipi) contain deliberations on linguistic encryption and security. A wide range of texts in sacred disciplines use multiple layers of encryption (tantra, rahasya). An exploration of these subjects may help develop new models and solutions to address the above threats.

In November 2009, the Government of India issued a notification regarding UNICODE standardization details at http://egovstandards.gov.in/index_html . To overcome the inherent script-character-centric limitation of the ‘Unicode’ approach and build a native-language model of human–machine interface that benefits the official and other languages of India in speech mode, Samskrutham provides the working paradigm model.
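The ‘Samskrutham SQL in speech mode’ proposed in this paper rests on the kāraka-vibhakti mapping described at the end of section 3. As a minimal sketch of why case-marked relations suit query building, the Python code below turns words that are assumed to be already analyzed into (stem, vibhakti) pairs into a role-to-stem frame, much as a query names its fields. The table holds only the common default correspondences for an active-voice (kartari) sentence; the names VIBHAKTI_TO_KARAKA and build_frame are hypothetical, and the morphological analysis that would produce the pairs is not shown.

```python
# Default vibhakti -> kāraka correspondences in an active-voice (kartari) sentence.
# Real usage has many exceptions; ṣaṣṭhī (6th case) expresses a relation
# (sambandha) rather than a kāraka, so it is deliberately left unmapped.
VIBHAKTI_TO_KARAKA = {
    1: "kartā",       # prathamā: agent
    2: "karma",       # dvitīyā: object
    3: "karaṇa",      # tṛtīyā: instrument
    4: "sampradāna",  # caturthī: recipient
    5: "apādāna",     # pañcamī: source, point of separation
    7: "adhikaraṇa",  # saptamī: locus
}


def build_frame(verb_root, words):
    """Build a query-like frame {role: stem} from pre-analysed (stem, vibhakti) pairs.

    Word order is ignored: each word's role comes from its case ending,
    not from its position in the sentence.
    """
    frame = {"kriyā": verb_root}
    for stem, vibhakti in words:
        role = VIBHAKTI_TO_KARAKA.get(vibhakti)
        if role is None:
            raise ValueError(f"vibhakti {vibhakti} is not mapped to a kāraka in this sketch")
        if role in frame:
            raise ValueError(f"kāraka {role!r} is filled twice")
        frame[role] = stem
    return frame


if __name__ == "__main__":
    # devadattaḥ nagare pustakaṃ paṭhati: "Devadatta reads a book in the city."
    words = [("devadatta", 1), ("nagara", 7), ("pustaka", 2)]
    print(build_frame("paṭh", words))
    print(build_frame("paṭh", list(reversed(words))) == build_frame("paṭh", words))  # True
```

Because roles come from case endings rather than position, the frame for devadattaḥ nagare pustakaṃ paṭhati is the same whatever the word order, which is the property a voice-driven query language would want.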

6. Sanskrit and Computers – Conclusion

In essence, this is the key debate that has been documented technically, in several models, in Saṃskṛtam works related to darśana, yoga and mantra śāstra in specific contexts. Understanding these technicalities needs an upgradation in approach from Sanskrit to Saṃskṛtam. It is here that the dialogue of traditional schools and scholars with modern scientists needs to take place in a bigger frame than the one in which Sanskrit studies have been conducted so far. The upgradation from ‘Sanskrit and Computers’ to ‘Saṃskṛtam and Computers’ will pave the way for bringing relevant aspects of the traditional schools to the current needs of computational research. This is a task that needs willing and open participation from scholars of tradition and scientists, to learn from and share with each other's disciplines for welfare and progress.

In short, this is an opportunity to empower all world languages to function as computer programming languages and to develop nation- and culture-appropriate solutions. The shift in paradigm is clear: from the Sanskrit of the colonial model to the Samskrutham of Pāṇini. The move is from visual script-character-centric computing to an integrated voice-text-language model, which can offer better and more natural human–machine interfacing. The dominance of English over other world languages, riding on technology through a computing paradigm of programming languages and standards built on Roman alphabet characters and English-like language structure, can be arrested and put in its right place only by shifting from a ‘non-phonetic, non-true-script, character-centric’ base to an anchor of ‘true phonetic, true script’ human-standard Speech Process Communication Standards. This is the move from Unicode to Universal Voice Code. It can be delivered uniquely by the Pāṇinian grammar model for next-generation programming language and human–machine interfacing standards. This is Samskrutham26 for Computers.

1 An enhanced monograph version of this article is planned by the author for a later date. The concepts presented here have been presented in different flavors by the author since 1995 to several national and international scholarly audiences. The progress in thought has evolved into the current format of Samskrutham-based deliberations on voice standards, called the Universal Voice Code. 2 Pāṇini sūtras 4-2-16, 4-4-3 and 4-4-134. 3 The word Saṃskṛtam is used in Pāṇinian grammar as a part and parcel of the aphorisms. It does not refer to any specific historic or mystic language. Prior to Pāṇini, there could have been a unique name or many names for this language, which spans an undefined, open-ended period looking backwards from the Buddha to Valmiki and beyond to Vedic horizons. In Bharata's Natya Shastra, we do see language nomenclatures recommended for characters from different social strata: ‘speaking resorting to Saṃskṛtam / following the non-Saṃskṛtam model called Prakrutham’. Even in the Artha Shastra we have references to dialects. Over a historic span of more than 3000 years, some 1600 dialects under around 22 main headings are traceable today with Saṃskṛtam at the root. I am not touching upon the Greek, Latin, other South Asian and Far Eastern languages, which also look upon Saṃskṛtam as their ‘elder mother / elder sister’. Indian society known to the Buddha spoke a refined language currently called ‘Saṃskṛtam’, as well as derived languages called ‘Prakrits’ and Saṃskṛtam-influenced languages called ‘Tamils’, which carried several regional flavors. In asserting Saṃskṛtam as the name of the language regulated by Pāṇinian grammar, we depend, as of date, upon the convention and authority of tradition. Indian society has never disputed ‘Saṃskṛtam’ as the unique common name for the language of such a long historic period and wide societal landscape, be it the Vedic resources, the Ramayana, the Mahabharata, the Puranas, or the language of science, metaphysics, literature, prayer and philosophy. Other names associated with Saṃskṛtam are Brāhmī, Bharati, Bhashaa, Girvani and Devavani. Though we have an ancient expression ‘Saṃskṛtam naama daivee vaak, anvaakhyaataa maharshibhih’, probably prior to Valmiki, the erudite lexicographer Amarasimha does not list ‘Saṃskṛtam’ as a synonym of or in the same group as ‘Brāhmī, geervani’; the preferred word is ‘Bhaashaa’. In current parlance, this unique technical distinction of Saṃskṛtam–Sanskrit :: Bhaashaa–Chandas–Prakrit seems to have been seriously vitiated by the loose translation of all these technical words as ‘language’, which implies a built-in degenerative dynamic arising from the historical and social frames of the user society, as in the case of European languages. 4 Here the word ‘True’ refers to the firm structure of the sentence, which will not undergo any deterioration across time and space, context or speaker bias, script or voice. ‘True’ in the other sense of ‘the statement made is true and fair in content to the best of my knowledge’ is not to be brought in here.
Maybe the translation of Satya as 'Eternal, unchanging Truth' gives a more resonant translation; but for introductory purposes, a brief statement is left as such. 5 'muni-traya' = Pāṇini Sūtrakāra, Kātyāyana Vārtikakāra, Patañjali Bhāṣyakāra. 6 The changes in a given word from 'na' to 'ṇa' are rule-regulated. Whether the ending is 'a', 'ha' or visarga; whether there should be a sandhi or not; how the given sentence / word unit needs to be processed in arriving at the final form: these are all rule-regulated. These changes are not defects or incapabilities, nor changes accumulated over a period of slack, slang usage! Applying to Samskrutam the phonemic change rules of natural languages derived in the following contexts is a hermeneutic error. The models are: the user-context society of Indian dialects, which are derivatives of already distanced Prakrits; the classical Prakrits of India; medieval European languages; modern language users who keep grammar compliance to the minimum; and the lower strata of society and tribals who are not educated in the nuances of Samskrutam grammar needed to derive the firm, accurate final form. 7 In the traditional school, a word like <prachodayaat> occurring in the Gayatri mantra is split as Pra (upasarga) + Dhatu chud, with the appropriate tense formation. The form is not split as <prachod>. In the Monier-Williams dictionary, however, the word is listed under a <prachod> base leading to the final form. Many such words with upasarga and verb are shown in the MW dictionary in a way that differs from the classical format. 8 This is the key thought at the root of a futuristic digital standards projection, called UNIVERSAL VOICE CODE, initiated by the present author. The idea is the integration of three parameters of communication, Voice (Varṇa), Scripted character (Lipi) and Language–Grammar Code (Bhashaa), into one single standard, defining the Human–Machine Interface on the model of Human–Human communicative expression and engagement. 9 This deliberation needs to be made for integration in the human-machine frame of thought. Enough caution needs to be exercised here not to sway to the extreme of mysticism at one end (represented by translations and over-reading of the technical terms of one domain inaccurately in another domain; for example, guna as a technical definition in grammar is not the same as guna in the Bhagavad-Gita or Āyurveda!) and to emotional over-projections at the other extreme (represented by literary imaginations). 10 Technology has pushed several dimensions of social and cultural change in several nations. With the proliferation of Roman script adoption in official work influencing every aspect of life, national and native languages are officially being substituted by Roman alphabet characters. Even conservative countries like Japan, Germany and China have buckled to this pressure of English popularity riding the vehicle of technology. Romanized literacy in native, national and cultural languages has become the order of the day. Every country seems keen on creating cultural and historic records of its native national languages in Roman script, however imperfect the transliteration schema may be. This is a detrimental slide away from the true phonetic heritage of several world languages; in this case, of Samskrutam, especially the Vedic traditions.
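As a concrete illustration of the loss described in note 10, the following Python sketch models a hypothetical romanization that simply drops diacritics, using only the standard `unicodedata` module. Distinct Saṃskṛtam sounds such as na/ṇa, ta/ṭa, sa/śa/ṣa and a/ā then collapse into the same Roman letters. The function names are illustrative only and do not correspond to any published transliteration scheme.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(text: str) -> str:
    """Hypothetical lossy romanization: decompose to NFD, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collisions(words: list[str]) -> dict[str, list[str]]:
    """Group inputs that become identical once diacritics are dropped."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word in words:
        groups[strip_diacritics(word)].append(word)
    # Keep only stripped forms shared by two or more distinct inputs.
    return {plain: ws for plain, ws in groups.items() if len(set(ws)) > 1}


if __name__ == "__main__":
    print(strip_diacritics("Saṃskṛtam"))  # Samskrtam
    print(collisions(["na", "ṇa", "ta", "ṭa", "sa", "śa", "ṣa", "a", "ā"]))
```

Every pair that collides here is a pair that a diacritic-free record can no longer tell apart, which is the phonetic information note 10 is concerned about.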

11 The International Phonetic Association (IPA) is the major, as well as the oldest, representative organization for phoneticians. It was established in 1886 in Paris. The aim of the IPA is to promote the scientific study of phonetics and the various practical applications of that science. In furtherance of this aim, the IPA provides the academic community world-wide with a notational standard for the phonetic representation of all languages: the International Phonetic Alphabet (also IPA). The latest version of the IPA Alphabet was published in 2005. http://www.langsci.ucl.ac.uk/ipa/index.html

12 The introduction to the print publication of the Vedas by Satvalekar from Pune carries an extensive note on what kind of symbols were adopted and what they stand for in the text. There are works like Svaramanjari which give a pictographic description of special symbol markers for special sounds. There are also handwriting ingenuities found across Indian manuscripts, pointing to the effort for a true representation of phonemic and accent variations. 13 In 1934 when SIL was formed, linguists estimated that there were about 1,000 unwritten languages in the world. As language researchers continued their investigation, many more languages were documented. Now it is known that there are nearly 7,000 languages spoken today. The conclusions of this ongoing research have been published in an SIL reference work called the Ethnologue: Languages of the World. A new edition of this catalog of languages is published every four years. The sixteenth edition, published in 2009, lists 6,909 languages. In its 75-year history, SIL has worked with over 2,550 languages. Currently there are about 2,000 SIL language development programs in progress. The SIL Bibliography contains over 35,000 references to books, journal articles, book chapters, dissertations and other academic papers about languages and cultures authored or edited by SIL International staff or published by SIL. In addition to a body of literature in many lesser-known languages, numerous portions of Scripture have been translated.

14 http://www.sil.org/sil/history.htm - Since the 70s, different committees of the Department of Electronics and the Department of Official Language have been evolving different codes and keyboards which could cater to all the Indian scripts due to their common phonetic structure. Earlier efforts could not keep the ASCII code intact. The BIS standard IS 13194:1991 conforms to the earlier standard IS 10402:1982, "8-bit coded character set for information interchange". It is intended for use in all computer and communication media which allow usage of 7- or 8-bit characters. In an 8-bit environment, the lower 128 characters are the same as defined in IS 10315:1982, "7-bit coded character set for information interchange", also known as the ASCII character set. The top 128 characters cater to all ten Indian scripts based on the ancient Brāhmī script. In a 7-bit environment, the control code SI can be used for invocation of the ISCII code set, and the control code SO can be used for reselection of the ASCII code set. 15 http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=Welcome 16 http://www.sbl-site.org/aboutus.aspx - The Society of Biblical Literature is the oldest and largest international scholarly membership organization in the field of biblical studies. Founded in 1880, the Society has grown to over 8,500 international members including teachers, students, religious leaders and individuals from all walks of life who share a mutual interest in the critical investigation of the Bible. 17 http://connect.ala.org/node/88651 The issue related to: The ALCTS Non-English Access Working Group on Romanization was established by the ALCTS Non-English Access Steering Committee to implement Recommendation 10 of the report of the ALCTS Task Force on Non-English Access: 10. Examine the use of romanized data in bibliographic and authority records.
Explore the following issues (including costs and benefits): (1) Alternative models (Model A and Model B) for multiscript records are specified in the MARC 21 formats. The continuing use of 880 fields (that is, Model A records) has been questioned, but some libraries may need to continue to use Model A records. What issues does using both Model A and Model B cause for LC, utilities, and vendors? (2) Requirements for access using non-Roman scripts (in general terms -- defining requirements for specific scripts falls under Recommendation 2) (3) Requirements for access using romanization 18 In 1995, a Multimedia Interactive CD-ROM of the Bhagavad-Gita was developed by the present author and released for international markets. http://www.bhagavadgita.com/ 19 At the ICADL 2001 conference, the author presented an E-book which highlighted the consolidated application of several technologies and pointed out its usefulness for the National Manuscript Mission, Oriental libraries and the like. ICADL 2001, devoted to the theme Digital libraries: dynamic landscapes for knowledge creation, access and management, was held during 10-12 December 2001 at Hotel Le Meridien, Bangalore, India. <URL: http://www.icadl2001.org>. It was organised by the University of Mysore in collaboration with Indian Institute of Information Technology, Bangalore; NISSAT, Government of India; Department of IT, Government of Karnataka; UNESCO; and Council of Scientific and Industrial Research, India. Sarada Ranganathan Endowment for Library Sciences, Bangalore; National Centre for Science Information (NCSI), Indian Institute of Science, Bangalore; Information and Library Network (INFLIBNET), Ahmedabad; Rajiv Gandhi University of Health Sciences, Bangalore; Deccan Herald, Bangalore; ISI Thomson Scientific, USA; Sun Microsystems, USA; The Hindu, Chennai; Universal Print Systems, Bangalore; and Informatics India, Bangalore also extended their valuable cooperation by way of financial help, sponsorship, etc. to organise the Conference in a grand way. The Conference, with 558 delegates hailing from 18 countries, turned out to be a truly memorable event, as the delegates included some of the top brass of the profession, such as Prof Hsinchun Chen of the University of Arizona, Prof Edward Fox (JCDL 2001 Chair), Prof Gary Marchionini (JCDL 2002 Chair), Prof Hsueh-hua Chen (ICADL 1999 Chair), and Prof Choi (ICADL 2000 Chair). The delegates, encompassing LIS professionals, the technology group, content and knowledge managers, the e-publishing and aggregator group, archivists, and service providers, represented almost all sectors of the digital library community. 20 More on these deliberations is presented by the author in the following paper: Paper presented at the AVIOS Speech Developers Conference and Speechtek, Spring Expo, March 31-April 3, 2003, The Fairmont, San Jose (www.avios.com, www.speechtek.com). The paper can be seen at the URL http://egovstandards.gov.in/documents/egscontent.2006-12-26.4807912437. A similar ideation was presented in 2003 and is published at the URL http://www.tug.org/TUGboat/Articles/tb24-3/sastry.pdf under the title: Enhanced Font Features for Future Multi-lingual Digital Typography with Sound-Script-Language Attribute Integration. 21 The first of the workshop deliberations took place in Oct 2009 at Bangalore in association with CDAC. The white paper on this topic can be accessed at www.mysanskrit.com 22 This is a special exercise, governed by IP rights law, that has been initiated by the author under the auspices of International Sanskrit Research Academy ® - Language Technology Project / UVC project.
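The 8-bit and 7-bit ISCII arrangements summarized in note 14 can be illustrated with a short Python sketch. It makes no claim about individual ISCII character assignments. It only (a) separates 8-bit data into ASCII bytes (below 0x80) and upper-half ISCII bytes, and (b) expands a 7-bit stream using shift control codes, taking the SI/SO roles exactly as note 14 states them and assuming the common ISO 2022-style convention that 7-bit codes 0x21 to 0x7E stand for 0xA1 to 0xFE. The function names are hypothetical, and the control-code roles should be checked against the IS 13194:1991 text before relying on them.

```python
# Shift control codes (values from the ASCII table). Their roles follow
# note 14 as written: SI invokes the ISCII set, SO reselects ASCII.
SO = 0x0E
SI = 0x0F


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split 8-bit data into runs of ASCII (< 0x80) and ISCII upper-half bytes."""
    runs: list[tuple[str, bytes]] = []
    for b in data:
        kind = "ascii" if b < 0x80 else "iscii"
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + bytes([b]))  # extend current run
        else:
            runs.append((kind, bytes([b])))               # start a new run
    return runs


def seven_bit_to_eight_bit(data: bytes) -> bytes:
    """Expand a 7-bit shifted stream into 8-bit ISCII data (sketch)."""
    out = bytearray()
    in_iscii = False  # assume the stream starts in ASCII
    for b in data:
        if b == SI:
            in_iscii = True       # switch to the ISCII set
        elif b == SO:
            in_iscii = False      # back to ASCII
        elif in_iscii and 0x21 <= b <= 0x7E:
            out.append(b | 0x80)  # assumed mapping into 0xA1..0xFE
        else:
            out.append(b)         # ASCII mode, or space/control in ISCII mode
    return bytes(out)
```

For example, `seven_bit_to_eight_bit(b"A\x0f\x24\x0eB")` yields `b"A\xa4B"`, and `split_runs` on that result gives one ASCII run, one ISCII run and another ASCII run. This shows why the 8-bit form "keeps the ASCII code intact": any byte below 0x80 still means exactly what it means in ASCII.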

23 Approval of New IPA Sound: The Labiodental Flap- http://www.langsci.ucl.ac.uk/ipa/news.html

24 ABOUT SLS - What We Do - As computers increasingly permeate our daily lives, our demand for online information is skyrocketing. Growing numbers of us turn to the Internet to catch up on the latest news, sports, and weather, obtain stock quotes, reserve airline flights, conduct research, or check out what's playing at local theaters. Unfortunately, navigating through vast amounts of data to obtain useful information can require a time-consuming series of keyboard entries and mouse clicks, and technical savvy. But there is a more efficient, more flexible tool available for human-computer interaction, something that even the most technically challenged of us could use anywhere, any time: spoken language. In order to make it possible for humans to speak to computers a conversational interface is needed. A conversational interface enables humans to converse with machines (in much the same way we communicate with one another) in order to create, access, and manage information and to solve problems. It is what Hollywood and every "vision of the future" tells us that we must have. Since 1989, getting computers to communicate the way people do -- by speaking and listening -- has been the objective of the Spoken Language Systems (SLS) Group at MIT's Computer Science and Artificial Intelligence Laboratory. How does it work? - Imagine talking to a computer to find a needle-in-the-haystack job listing, or showtimes of a movie premiere at the closest theater. Today, obtaining such information online requires a programmed transaction between the user, who clicks through a pre-determined sequence of options and views results, and the computer, which retrieves user-selected data. With spoken language systems, however, user and machine can engage in a spontaneous, interactive conversation, incrementally arriving at the desired information in far fewer steps.
A case in point is the following excerpt from a conversation between a user and JUPITER, an SLS-based weather forecast system: User: Yes, I would like the weather forecast for London, England, please. JUPITER: In London in England Wednesday, partly cloudy skies with periods of sunshine. High 82 and low 63. Is there something else? User: What is that in degrees Celsius, please? JUPITER: In London in England Wednesday, high 28 Celsius and low 17 Celsius. What else? SLS researchers make this kind of dialogue look easy by empowering the computer to perform five main functions in real time: speech recognition -- converting the user's speech to a text sentence of distinct words; language understanding -- breaking down the recognized sentence grammatically, and systematically representing its meaning; information retrieval -- obtaining targeted data, based on that meaning representation, from the appropriate online source; language generation -- building a text sentence that presents the retrieved data in the user's preferred language; and speech synthesis -- converting that text sentence into computer-generated speech. Throughout the conversation, the computer also remembers previous exchanges. In this example, JUPITER can respond to "What is that in degrees Celsius, please?" because the user has just asked about weather conditions in London. Otherwise, the system would request the user to clarify the question. Many speech-based interfaces can be considered conversational, and they may be differentiated by the degree to which the system maintains an active role in the conversation, or the complexity of the potential dialogue. At one extreme are system-initiative, or "directed-dialogue" transactions where the computer takes complete control of the interaction by requiring that the user answer a set of prescribed questions, much like the touch-tone implementation of interactive voice response (IVR) systems. In the case of air travel planning, for example, a directed-dialogue system could ask the user to "Please say just the departure city." Since the user's options are severely restricted, successful completion of such transactions is easier to attain, and indeed some successful demonstrations and commercial deployment of such systems have been made. At the other extreme are user-initiative systems in which the user has complete freedom in what they say to the system, (e.g., "I want to visit my grandmother") while the system remains relatively passive, asking only for clarification when necessary. In this case, the user may feel uncertain as to what capabilities exist, and may, as a consequence, stray quite far from the domain of competence of the system, leading to great frustration because nothing is understood. Lying between these two extremes are systems that incorporate a "mixed-initiative", goal-oriented dialogue, in which both the user and the computer participate actively to solve a problem interactively using a conversational paradigm. It is this latter mode of interaction that is the primary focus of our research.
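The five functions listed above, together with the memory of previous exchanges that lets JUPITER answer "What is that in degrees Celsius, please?", can be illustrated with a toy, text-only Python sketch. Speech recognition and speech synthesis are left out (text in, text out). Understanding, retrieval and generation are reduced to keyword matching over a hypothetical one-entry forecast table built from the London figures in the excerpt. All names are illustrative; nothing here reflects the actual GALAXY or JUPITER implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical one-entry "online source"; the figures are the London ones
# quoted in the JUPITER excerpt (degrees Fahrenheit).
FORECASTS = {"london": {"day": "Wednesday", "high_f": 82, "low_f": 63}}


def f_to_c(fahrenheit: float) -> int:
    """Convert degrees Fahrenheit to whole degrees Celsius."""
    return round((fahrenheit - 32) * 5 / 9)


@dataclass
class DialogueContext:
    """Memory of previous exchanges: here, only the last city discussed."""
    last_city: Optional[str] = None


def understand(sentence: str) -> dict:
    """Toy language understanding: map a recognized sentence to a meaning frame."""
    text = sentence.lower()
    if "celsius" in text:
        return {"intent": "convert"}
    for city in FORECASTS:
        if city in text:
            return {"intent": "forecast", "city": city}
    return {"intent": "unknown"}


def retrieve(city: str) -> dict:
    """Toy information retrieval from the hypothetical forecast table."""
    return FORECASTS[city]


def respond(sentence: str, ctx: DialogueContext) -> str:
    """Understanding, retrieval and generation for one turn (text in, text out)."""
    frame = understand(sentence)
    if frame["intent"] == "forecast":
        ctx.last_city = frame["city"]
        fc = retrieve(frame["city"])
        return f"In {frame['city'].title()} {fc['day']}, high {fc['high_f']} and low {fc['low_f']}."
    if frame["intent"] == "convert":
        if ctx.last_city is None:
            # No earlier exchange to resolve "that": ask for clarification.
            return "Which city do you mean?"
        fc = retrieve(ctx.last_city)
        return (f"In {ctx.last_city.title()} {fc['day']}, "
                f"high {f_to_c(fc['high_f'])} Celsius and low {f_to_c(fc['low_f'])} Celsius.")
    return "Sorry, could you rephrase that?"
```

Asking the Celsius question first produces the clarification request. Asking for the London forecast and then the Celsius question produces "high 28 Celsius and low 17 Celsius", matching the excerpt, since (82 - 32) × 5/9 ≈ 27.8 and (63 - 32) × 5/9 ≈ 17.2. The `DialogueContext` object is the sketch's stand-in for "the computer also remembers previous exchanges".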
In 1994, the SLS Group developed a conversational architecture called GALAXY that incorporates the necessary human language technologies (i.e., speech understanding and generation, discourse and dialogue) to enable advanced research in mixed-initiative interaction. Since then, the open source architecture has been adopted by many researchers around the world as a framework for conducting their research on advanced spoken dialogue systems. Here at MIT, we have developed many prototype conversational systems, many of which are deployed on toll-free telephone numbers, that enable users to access information about weather forecasts (JUPITER), airline scheduling (PEGASUS) and flight planning (MERCURY), Cambridge city locations (VOYAGER), and selected Web-based information (WebGALAXY). Raising the Level of Human to Computer Conversation: Although tremendous progress has been made over the last decade in developing advanced conversational spoken language technology, much additional progress must be achieved before conversational interfaces approach the level of naturalness of human-human conversations. Today SLS researchers are refining core human language technologies and are incorporating speech with other kinds of natural input modalities such as pen and gesture. They are working to upgrade the efficiency and naturalness of application-specific conversations, improve new word detection/learning capability during speech recognition, and increase the portability of core technologies and develop new applications. As the SLS Group continues to address these issues, it brings us closer to the day when anyone, anywhere, any time, can interact easily with computers. Further Reading: V. Zue and J. Glass, "Conversational Interfaces: Advances and Challenges" Proceedings of the IEEE, Special Issue on Spoken Language Processing, Vol. 88, August 2000. (PDF) J. Glass and S.
Seneff, "Flexible and Personalizable Mixed-Initiative Dialogue Systems," presented at HLT-NAACL 2003 Workshop on Research Directions in Dialogue Processing, Edmonton, Canada, May 2003. (PDF)

V. Zue, et al., "JUPITER: A Telephone-Based Conversational Interface for Weather Information," IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, January 2000. (PDF) 25 Google implicates China of cyber war - Google has just published a statement with thinly veiled disdain for the Chinese government. While it's not said directly -- perhaps for fear of serious retaliation -- the wording definitely implies that the Chinese government or its agencies hacked Google's infrastructure, performed surveillance and stole its intellectual property. Google goes on to say that the primary focus of the attack was its Gmail service. But it gets murkier: it was a targeted attack on the email accounts of Chinese human rights activists. And to add insult to injury: the accounts of U.S.-, Europe- and China-based users who are advocates of human rights in China have been routinely accessed by third parties. In other words, someone (the Chinese security agency?) has phished for account details or installed backdoor/trojan malware on these advocates' computers. http://www.downloadsquad.com/2010/01/12/google-implicates-the-chinese-government-of-cyber-warfare-consi/; Hackers from China have targeted computers in the Prime Minister's Office (PMO) - http://indiatoday.intoday.in/site/Story/79215/India/Chinese+hackers+target+PMO.html The scale and sophistication of the cyber attacks on Google Inc. and other large U.S. corporations by hackers in China is raising national security concerns that the Asian superpower is escalating its industrial espionage efforts on the Internet. While the U.S. focus has been primarily on protecting military and state secrets from cyber spying, a new battle is being waged in which corporate computers and the valuable intellectual property they hold have become as much a target of foreign governments as those run by the Pentagon and the CIA.
"This is a watershed moment in the cyber war," James Mulvenon, director of the Center for Intelligence Research and Analysis at Defense Group Inc., a national-security firm, said Thursday. "Before, the Chinese were going after defense targets to modernize the country's military machine. But these intrusions strike at the heart of the American innovation community." http://www.latimes.com/business/la-fi-google-china15-2010jan15,0,7251525.story 26 Distinguished figures such as Chaturvedi ABVP Swamy of the Sri Ramanuja Mission Trust, Chennai, also speak about and envision similar ideas.