NEW BALANCED GRAMMARS USING MULTISET,

VALENCE AND TREE BASED CONTROLS

BY

SALBIAH BINTI ASHAARI

A thesis submitted in fulfilment of the requirement for the

degree of Master of Computer Science

Kulliyyah of Information and Communication Technology

International Islamic University Malaysia

MAY 2018

ABSTRACT

Grammars with regulated rewriting (regulated or controlled grammars for short) have
been an active research area in formal language theory. They have been applied in a
wide variety of scientific disciplines, ranging from linguistics through DNA computing
to informatics and, more recently, big data analytics. However, none of the existing
grammars is fully satisfactory, since each of them is either complicated, not
computationally complete or has too many unsolvable decision problems. Thus, further
variants of controlled grammars can be investigated through different approaches to
address these issues. The main aim of this thesis is to introduce a new variant of
controlled grammars called Balanced Grammars (BG), which use four simple control
mechanisms: multiset, valence, weight and tree structure. We establish Multiset
Controlled Grammars (MCG), which are based on terminal multisets. Another type is a
modified version of tree controlled grammars, consisting of three types of grammars:
Tree Multiset Controlled Grammars (TMCG), which use multisets; Tree Valence
Controlled Grammars (TVCG), which apply valences; and Tree Regularly Controlled
Grammars (TRCG), which implement regular sets. We also establish Balanced Two-steps
Controlled Grammars (BTCG), which use weight pairs. The computational power and
closure properties of each type of grammar are studied. The results show that MCG are
more powerful than the corresponding Chomsky grammars: multiset controlled regular
grammars (𝑚𝑅𝐸𝐺) can generate non-regular languages, multiset controlled linear
grammars (𝑚𝐿𝐼𝑁) can generate non-linear languages, and multiset controlled
context-free grammars (𝑚𝐶𝐹) can generate non-context-free languages. In addition, a
simplification process for multiset controlled context-free grammars is studied,
resulting in a Chomsky normal form. Using this normal form, a membership algorithm
based on the Cocke-Younger-Kasami (CYK) algorithm, which can be used for parsing, is
designed. As for TMCG, TVCG, TRCG and BTCG, we demonstrate that all of these
grammars in context-free form can generate non-context-free languages. We also prove
that all the introduced grammars are at least as powerful as additive valence grammars
and at most as powerful as matrix grammars. Finally, in terms of closure properties,
most of them are closed under union, Kleene star, homomorphism and mirror image.

ABSTRACT IN ARABIC

Grammars with regulated rewriting (regulated or controlled grammars for short) have
been, and remain, an active research area in the field of formal language theory. They
have been applied in diverse scientific disciplines, extending from linguistics through
DNA computing (molecular biological computing) to informatics, and have recently
reached big data analytics. Nevertheless, none of these grammars is yet complete,
because some of them are complicated or not computationally complete and some
involve many unsolvable decision problems. Consequently, more variants of regulated
grammars can be investigated through different approaches to address these issues. The
main aim of this thesis is to present a new variant of regulated grammars, called
Balanced Grammars (BG), which use simple variables such as multisets, valence, weight
and tree as control mechanisms. We established Multiset Controlled Grammars (MCG),
which are based on terminal multisets. The other type is a modified version of tree
controlled grammars comprising three types of grammars: (1) Tree Multiset Controlled
Grammars (TMCG), which use multisets; (2) Tree Valence Controlled Grammars (TVCG),
which apply valence; and (3) Tree Regularly Controlled Grammars (TRCG), which
implement regular sets. We also established Balanced Two-steps Controlled Grammars
(BTCG), which use pairs of weights. The computational power and closure properties of
each type of these grammars were studied. Based on the results obtained, it was proved
that MCG are more powerful than Chomsky grammars, since multiset controlled regular
grammars (𝑚𝑅𝐸𝐺) can generate non-regular languages and multiset controlled linear
grammars (𝑚𝐿𝐼𝑁) can generate non-linear languages. In addition, multiset controlled
context-free grammars (𝑚𝐶𝐹) can generate non-context-free languages. Furthermore, a
simplification process for multiset controlled context-free grammars was studied,
resulting in the Chomsky normal form. Using this normal form, a membership algorithm
based on the Cocke-Younger-Kasami (CYK) algorithm was designed, which can be used in
parsing. It was also demonstrated and proved that all of these grammars (TMCG, TVCG,
TRCG, BTCG) in context-free form can generate non-context-free languages. Besides
that, we proved that all the introduced grammars in context-free form have at least the
power of additive valence grammars and at most the power of matrix grammars. Finally,
in terms of closure properties, most of them are closed under union, Kleene star,
homomorphism and mirror image.

APPROVAL PAGE

I certify that I have supervised and read this study and that, in my opinion, it conforms

to acceptable standards of scholarly presentation and is fully adequate, in scope and

quality, as a thesis for the degree of Master of Computer Science.

……………………………………

Sherzod Turaev

Supervisor

I certify that I have read this study and that in my opinion it conforms to acceptable

standards of scholarly presentation and is fully adequate, in scope and quality, as a thesis

for the degree of Master of Computer Science.

……………………………………

Ali A. Alwan

Internal Examiner

……………………………………

Ravie Chandran Munlyandi

External Examiner

This thesis was submitted to the Department of Computer Science and is accepted as a

fulfilment of the requirement for the degree of Master of Computer Science.

……………………………………

Head, Department of Computer

Science

This thesis was submitted to the Kulliyyah of Information and Communication Technology

and is accepted as a fulfilment of the requirement for the degree of Master of Computer

Science.

……………………………………

Abdul Wahab Abdul Rahman

Dean, Kulliyyah of Information

and Communication Technology

DECLARATION

I hereby declare that this thesis is the result of my own investigations, except where

otherwise stated. I also declare that it has not been previously or concurrently submitted

as a whole for any other degrees at IIUM or other institutions.

Salbiah binti Ashaari

Signature…………………....………. Date …….……………….

COPYRIGHT

INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

DECLARATION OF COPYRIGHT AND AFFIRMATION OF

FAIR USE OF UNPUBLISHED RESEARCH

NEW BALANCED GRAMMARS USING MULTISET, VALENCE

AND TREE BASED CONTROLS

I declare that the copyright of this thesis is jointly owned by the student

and IIUM.

Copyright © 2018 Salbiah binti Ashaari and International Islamic University Malaysia. All rights

reserved.

No part of this unpublished research may be reproduced, stored in a retrieval system,

or transmitted, in any form or by any means, electronic, mechanical, photocopying,

recording or otherwise without prior written permission of the copyright holder except

as provided below:

1. Any material contained in or derived from this unpublished research may

be used by others in their writing with due acknowledgement.

2. IIUM or its library will have the right to make and transmit copies (print

or electronic) for institutional and academic purposes.

3. The IIUM library will have the right to make, store in a retrieval system

and supply copies of this unpublished research if requested by other

universities and research libraries.

By signing this form, I acknowledge that I have read and understand the IIUM

Intellectual Property Right and Commercialization policy.

Affirmed by Salbiah binti Ashaari

……..…………………….. ………………………..

Signature Date

ACKNOWLEDGEMENTS

Bismillahirrahmanirrahim.

All praises to Allah, the most Almighty and the most Gracious, on whom we
ultimately rely for guidance and sustenance. May the peace and blessings of Allah
be upon His Prophet Muhammad S.A.W.

This thesis has taken its current shape thanks to the guidance, encouragement and
assistance of numerous people. I would therefore like to extend my wholehearted
thanks to all who contributed, directly or indirectly, to helping me complete my
journey as a master's student successfully.

I would like to convey a deep sense of gratitude to my parents and family for
always believing in me and giving me the opportunity to pursue my dreams. Your
unconditional love, material and spiritual support, encouragement and prayers have
been a great help to me in withstanding the difficulties in my life. Heartfelt thanks
also go to all my friends and colleagues, who have made my master's journey extremely
enjoyable, memorable and fruitful.

Furthermore, I express my deepest and sincere gratitude to my supervisor, Dr.
Sherzod Turaev, for introducing me to this field of study and for accepting me as his
master's student with inexhaustible patience, understanding and wisdom. I really
appreciate the insightful suggestions, the many good ideas and the research freedom
that I have enjoyed under his supervision. I am also heartily grateful to him for giving
me financial support under his grant FRGS13-0066-0307 from the Ministry of Education,
Malaysia, during my first semester of study. This work would not have been completed
without his intelligence, astute criticism, thoughtful guidance, constant encouragement
and financial support. Thank you for your mentorship and guidance throughout all these
years.

My warmest gratitude also goes to Dr. Abdurahim Okhunov, who gave me financial
support from his grant FRGS13-074-0315 from the Ministry of Education, Malaysia, from
my second semester of study onwards. Besides, I sincerely thank the examiners of my
proposal defence, seminar and viva, Dr. M.M Hafizur Rahman, Dr. Amelia Ritahani, Dr.
Mohd Izzuddin Mohd Tamrin, Dr. Sofianiza Abdul Malik, Dr. Ali A. Alwan, Ravie Chandran
Munlyandi and Dr. Abd Rahman Ahlan, for their willingness to spend some of their
valuable time reviewing and evaluating my thesis, as well as for their critical and
valuable comments and suggestions to improve my work. Last but not least, my sincerest
thanks go to all members of the Kulliyyah of Information and Communication Technology,
International Islamic University Malaysia, for their kindness and generous hospitality
during my study there.

Thank you all for everything. May all of your help be rewarded with goodness by
Almighty Allah. Wa’salam.

TABLE OF CONTENTS

Abstract
Abstract in Arabic
Approval Page
Declaration
Copyright
Acknowledgements
List of Tables
List of Figures
List of Abbreviations
List of Symbols

CHAPTER ONE: INTRODUCTION
    1.1 Background of the Study
    1.2 Statement of the Problem
    1.3 Research Objectives
    1.4 Research Questions
    1.5 Significance of the Study
    1.6 Scope of the Study
    1.7 Research Methodology
    1.8 Organisation of the Thesis

CHAPTER TWO: LITERATURE REVIEW
    2.1 Introduction
    2.2 Formal Languages and Chomsky Grammars
    2.3 Grammars with Regulated Rewriting
        2.3.1 Multiset Grammars
        2.3.2 Tree Controlled Grammars
        2.3.3 Valence Grammars
    2.4 Summary

CHAPTER THREE: PRELIMINARIES
    3.1 Introduction
    3.2 General Notations
    3.3 Alphabet, String, Language and Multiset
    3.4 Operations on Languages
    3.5 Grammars and Chomsky Hierarchy
    3.6 Derivation Tree
    3.7 Grammars with Regulated Rewriting

CHAPTER FOUR: MULTISET CONTROLLED GRAMMARS
    4.1 Introduction
    4.2 Definitions and Examples
    4.3 Generative Powers
    4.4 Normal Form
    4.5 Closure Properties
    4.6 Application: Parsing
        4.6.1 Parsing Based on an Extended CYK Algorithm
    4.7 Summary

CHAPTER FIVE: OTHER BALANCED GRAMMARS
    5.1 Introduction
    5.2 Tree Multiset Controlled Grammars
        5.2.1 Definitions and Example
        5.2.2 Generative Power
        5.2.3 Closure Properties
    5.3 Tree Valence Controlled Grammars
        5.3.1 Definitions and Example
        5.3.2 Generative Power
        5.3.3 Closure Properties
    5.4 Tree Regularly Controlled Grammars
        5.4.1 Definitions and Example
        5.4.2 Generative Power
        5.4.3 Closure Properties
    5.5 Balanced Two-Steps Controlled Grammars
        5.5.1 Definitions and Example
        5.5.2 Generative Power
        5.5.3 Closure Properties
    5.6 Summary

CHAPTER SIX: CONCLUSION AND FUTURE WORK

REFERENCES

LIST OF PUBLICATIONS

LIST OF TABLES

Table 3.1 Closure Properties Owned by Chomsky Grammars

LIST OF FIGURES

Figure 2.1 Set Inclusion of Grammars Described by Chomsky Hierarchy

Figure 4.1 CYK Table

Figure 4.2 Final CYK Result Table

Figure 4.3 Final Counter Multiset Table

Figure 4.4 The Derivation Tree for Example 4.6.1

Figure 4.5 The Hierarchy of Families of Language Generated by Multiset Controlled Grammar

Figure 5.1 Example of a Tree Derivation for a string $a^3b^3c^3$ (TM)

Figure 5.2 Example of a Tree Derivation for a string $a^3b^3c^3$ (TV)

Figure 5.3 Example of a Tree Derivation for a string $a^8$

LIST OF ABBREVIATIONS

CYK Cocke-Younger-Kasami

RE Recursively enumerable languages

CS Context-sensitive languages

CF Context-free languages

REG Regular languages

LIN Linear languages

MAT The family of languages of matrix grammars (without erasing rules)

MATλ The family of languages of matrix grammars (with erasing rules)

TC The family of languages of tree controlled grammars (without erasing rules)

TCλ The family of languages of tree controlled grammars (with erasing rules)

aVAL The family of languages of additive valence grammars (without erasing rules)

aVALλ The family of languages of additive valence grammars (with erasing rules)

mREG Multiset controlled regular languages (without erasing rules)

mREGλ Multiset controlled regular languages (with erasing rules)

mLIN Multiset controlled linear languages (without erasing rules)

mLINλ Multiset controlled linear languages (with erasing rules)

mCF Multiset controlled context-free languages (without erasing rules)

mCFλ Multiset controlled context-free languages (with erasing rules)

TM The family of languages of tree multiset controlled grammars (without erasing rules)

TMλ The family of languages of tree multiset controlled grammars (with erasing rules)

TV The family of languages of tree valence controlled grammars (without erasing rules)

TVλ The family of languages of tree valence controlled grammars (with erasing rules)

TR The family of languages of tree regularly controlled grammars (without erasing rules)

TRλ The family of languages of tree regularly controlled grammars (with erasing rules)

TS The family of languages of balanced two-steps controlled grammars (without erasing rules)

TSλ The family of languages of balanced two-steps controlled grammars (with erasing rules)

LIST OF SYMBOLS

∈ Membership of an element to a set

∉ Negation of ∈

⊆ Inclusion

⊂ Proper inclusion

∪ Union

∩ Intersection

× Cross product

− Difference

|A| Cardinality of set A

2^A Power set of A

∅ Empty set

{A} Element in a set

ℤ Set of integers

ℕ Set of natural numbers

ℝ Set of real numbers

ℚ Set of rational numbers

Σ Alphabet

Σ∗ The set of all finite strings over Σ

Σ+ The set of all non-empty finite strings over Σ

λ Empty string

|w| Length of string w

w^R Mirror image of string w

μ Multiset

μ(a) Number of occurrences of a in μ

A^⊕ Set of all multisets over set A

G Grammar

S Start symbol

P Set of production rules

N Set of nonterminal symbols

T Set of terminal symbols

CHAPTER ONE

INTRODUCTION

1.1 BACKGROUND OF THE STUDY

Soon after the emergence of modern computers, people realized that all forms and
types of data (information), such as names, numbers, pictures, sounds, videos and
waves, can be regarded as structures of symbols (i.e., strings or words). A collection
of such strings is called a language, and this notion is the root of formal language
theory, a branch of knowledge that studies formal grammars and languages (Jiang et
al., 2010). Formally, formal language theory is the study of sets of abstract words over
a finite alphabet of symbols. The models used to represent languages can arguably be
divided into two fundamental groups: generative models and recognition models.
Generative models, better known as grammars, are used to generate words, while
recognition models, usually called automata, are used to accept words. In this thesis,
we focus only on the formalism of grammars, and in particular on grammars with
regulated rewriting.

A grammar can be simply defined as a set of formation productions (also known as
rules) for words in a formal language, where the productions describe the valid ways
in which words can be formed from the language's alphabet in accordance with the
language's syntax (Chomsky, 1956). In other words, a grammar consists of a finite set
of rules over finite sets of variables and characters, and the language it generates
depends on which rules are applied. Thus, it can be seen as a formal statement of the
structure or pattern of the language to be described. In general, grammars are used to
generate and analyze the strings of a language. Grammars are significant because they
give a formal definition of the syntax of a language, which enables reasoning about
language elements; they can form the kernel of a parsing algorithm; and they can be
used as a tool for syntax specification (Jiang et al., 2010). In terms of application,
grammars have also been found to be very useful in computing alignments and solving
approximate string matching problems arising in areas such as plagiarism detection,
mirror pages and biomarkers (Siederdissen, Hofacker & Stadler, 2015). Besides, they
can be employed in comparing sequences of nucleic acids (DNA or RNA) or chains of
amino acids (proteins) for diagnosing certain diseases or for verification purposes
(Chiang, 2012).
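To make the notion of a grammar as a finite set of rules concrete, the following is a minimal Python sketch. The toy grammar (S → aSb | ab), the names `RULES` and `derive`, and the convention that nonterminals are the keys of the dictionary are assumptions made for illustration only; they are not taken from the thesis.

```python
from typing import Dict, List

# Hypothetical toy grammar for { a^n b^n | n >= 1 }: S -> aSb | ab.
# Nonterminals are the dictionary keys; every other character is a terminal.
RULES: Dict[str, List[str]] = {"S": ["aSb", "ab"]}


def derive(choices: List[int], start: str = "S") -> List[str]:
    """Return the sentential forms obtained by rewriting the leftmost
    nonterminal with the i-th alternative of its rule, for each i in choices."""
    form = start
    steps = [form]
    for i in choices:
        # Position of the leftmost nonterminal in the current sentential form.
        pos = next(k for k, ch in enumerate(form) if ch in RULES)
        form = form[:pos] + RULES[form[pos]][i] + form[pos + 1:]
        steps.append(form)
    return steps


if __name__ == "__main__":
    # S => aSb => aaSbb => aaabbb
    print(" => ".join(derive([0, 0, 1])))
```

Which string is generated depends only on the sequence of rule choices, which is the sense in which the generated language "depends on which rules are applied".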

Basically, grammars can be classified into two fundamental categories: context-free
grammars and non-context-free grammars. Of the two, context-free grammars are the
most developed and best examined class of grammars in the Chomsky hierarchy, owing
to their simplicity and intuitively appealing formalism. Context-free grammars have
been widely used in many applications, most commonly in the compilation and
specification of programming languages (Bel-Enguix, Jimenez-Lopez & Martin-Vide,
2008; Martin-Vide, Mitrana & Paun, 2004). More specifically, they have been used in an
authentication protocol that uses one-time authentication information to generate
one-time passwords (Singh, Dagon & Dos Santos, 2004), for generating equation
structures that predict the peak ground acceleration of an earthquake event by
describing the dependencies in a given set of data (Bosman & Gruner, 2013), and for
enhancing log file analysis by describing intrusion patterns and acceptable log files
(Markic & Stankovski, 2013).

Generally, a context-free grammar is a set of rules that allow one to replace a
variable by a string of terminals and variables, where each string in the language has
a derivation tree and a corresponding leftmost derivation. Context-free grammars have
broad applicability and, at the same time, many favourable computational properties
and complexity results (Bel-Enguix, Jimenez-Lopez & Martin-Vide, 2008; Sipser, 2013).
However, it is well known that the world is not totally “context-free”: many
circumstances give rise to non-context-free languages with basic features such as
reduplication (e.g., $\{ww \mid w \in T^*\}$), multiple agreement (e.g.,
$\{a^n b^n c^n d^n \mid n \ge 1\}$) and crossed agreement (e.g.,
$\{a^n b^m c^n d^m \mid n, m \ge 1\}$) (Dassow & Paun, 1989; Dassow, Paun &
Salomaa, 1997).
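The three example languages can be made tangible with simple membership tests. The sketch below is illustrative only: the function names are hypothetical, and the tests decide membership directly by counting rather than by any grammar formalism discussed in this thesis.

```python
import re


def is_reduplication(s: str) -> bool:
    """True if s = ww for some string w (the empty string counts, with w empty)."""
    half, rem = divmod(len(s), 2)
    return rem == 0 and s[:half] == s[half:]


def is_multiple_agreement(s: str) -> bool:
    """True if s = a^n b^n c^n d^n for some n >= 1."""
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    # All four blocks must have the same length.
    return bool(m) and len({len(g) for g in m.groups()}) == 1


def is_crossed_agreement(s: str) -> bool:
    """True if s = a^n b^m c^n d^m for some n, m >= 1."""
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    # The a-block must match the c-block and the b-block must match the d-block.
    return bool(m) and len(m[1]) == len(m[3]) and len(m[2]) == len(m[4])
```

In each case the condition links the lengths or contents of blocks in a way that a single context-free derivation cannot enforce, which is why such languages serve as standard examples of non-context-free behaviour.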

Thus, we need to go beyond context-free grammars. One solution is to consider
context-sensitive grammars, which are more powerful (Dassow & Paun, 1989).
Nevertheless, in spite of their great power, they have serious drawbacks in practical
use: many decision problems for them are either undecidable or have only
exponential-time algorithms (Dassow & Paun, 1989). For context-sensitive grammars,
the emptiness and finiteness problems are undecidable. Furthermore, it is hard or
impossible to describe the derivations of context-sensitive grammars by a graph or tree
structure, which is an essential tool for analyzing the structure of a problem (Dassow
& Paun, 1989). These are the reasons why many researchers look for intermediate
grammars between context-free and context-sensitive grammars, called grammars with
regulated rewriting (also known as regulated or controlled grammars), which combine
the beauty and simplicity of context-free grammars with the power of
context-sensitive grammars.

A regulated grammar is a grammar equipped with an additional mechanism that
restricts the application of certain rules in order to rule out certain derivations.
Consequently, the set of strings generated by a grammar with regulated rewriting is a
subset of the set of strings generated by the same grammar without regulation
(Dassow & Paun, 1989; Meduna & Soukup, 2017). The core idea behind regulated
rewriting is to choose a simple model with high computational power. There is a broad
variety of interesting regulated grammars, each of which uses a different mode of
operation to impose restrictions. For example, in matrix grammars, only certain
previously specified sequences of rules may be applied (Abraham, 1965); in regularly
controlled grammars, the string of rules corresponding to a derivation must belong to
a previously specified set of strings (Ginsburg & Spanier, 1968); in ordered grammars,
a rule cannot be used while certain other rules are still applicable (Fris, 1968); in
programmed grammars, the choice of one rule determines the rules that may be applied
next (Rosenkrantz, 1969); and many more.
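As an illustration of how a control mechanism restricts derivations, the following Python sketch simulates a small matrix grammar. The grammar itself (matrices `START`, `GROW` and `STOP`) is a hypothetical textbook-style example, not one taken from this thesis, and the helper names are assumptions. Every rule is context-free, but the rules of a matrix must be applied together and in order.

```python
from typing import List, Tuple

Rule = Tuple[str, str]   # (left-hand nonterminal, right-hand side)
Matrix = List[Rule]      # rules that must be applied together, in order

# Hypothetical matrix grammar for { a^n b^n c^n | n >= 1 }.
START: Matrix = [("S", "ABC")]
GROW: Matrix = [("A", "aA"), ("B", "bB"), ("C", "cC")]
STOP: Matrix = [("A", "a"), ("B", "b"), ("C", "c")]


def apply_matrix(form: str, matrix: Matrix) -> str:
    """Apply every rule of the matrix in sequence, each to the leftmost
    occurrence of its left-hand side; fail if some rule is not applicable."""
    for lhs, rhs in matrix:
        if lhs not in form:
            raise ValueError(f"matrix not applicable: no {lhs!r} in {form!r}")
        form = form.replace(lhs, rhs, 1)
    return form


def generate(n: int) -> str:
    """Derive a^n b^n c^n by using GROW (n - 1) times and then STOP."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = apply_matrix("S", START)
    for _ in range(n - 1):
        form = apply_matrix(form, GROW)
    return apply_matrix(form, STOP)
```

Without the matrix control, the same context-free rules could also produce strings such as `aabccc`, in which the three blocks have different lengths. The control keeps only the subset in which the counts agree.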

1.2 STATEMENT OF THE PROBLEM

Regulated grammars are one of the various types of grammars introduced throughout
the history of formal language theory with the main goal of extending the power of
grammars without significantly increasing the complexity of the model. Regulation is
one of the most effective ways to refine the Chomsky hierarchy with a large variety of
language classes that lie within it (Meduna & Zemek, 2014). The common way to
implement a regulated grammar is to combine two simpler models: a grammar, whose
productions are used to generate sentences, and an additional control model, which is
used to restrict the derivation process.

In the monographs by Meduna and Soukup (2017), Meduna and Zemek (2014) and Dassow
and Paun (1989), we can find a large number of regulated grammars that preserve the
context-free nature of their rules, such as tree controlled grammars, matrix grammars,
valence grammars, programmed grammars, probabilistic grammars, state grammars,
random-context grammars, ordered grammars and many more. All of these grammars
have yielded many remarkable results within formal language theory. They differ from
one another in their restrictions, which are based either on various kinds of context
conditions or on the way rules are used during the generation of a language. However,
under certain circumstances, they are too complicated, not computationally complete,
or belong to a class of grammars with too many unsolvable decision problems, which
lessens their practical interest (Dassow & Paun, 1989). Moreover, this issue has been
investigated for a long time, and there is still no method that has been proven to be
the best at solving it. In addition, the swift growth of present-day technology,
industry and other fields has given rise to more and more new and intricate problems,
which call for new suitable tools to address them.

Several studies on such grammars have already been carried out: multiset grammars
use multisets to restrict the application of the grammar's productions (Kudlek, Martin
& Paun, 2001); tree controlled grammars impose restrictions on the derivation trees of
grammars using regular languages (Culik & Maurer, 1977); and valence grammars assign
to each production an integer from a given monoid (Paun, 1980). Nevertheless, there are
still some interesting topics in this direction for further study. For instance, no
research has been done on using multisets over terminal symbols based on an operation
called “counter”, in which every production of the grammar is assigned, as a control
mechanism, a multiset value that depends on the number of terminal symbols occurring
on the right-hand side of that production. Furthermore, it is not yet known how
powerful grammars become if valences or multisets are combined with the tree
structure to control derivations, as in tree controlled grammars, or if regular sets of
the grammar's productions are used as the control rather than a check on the
derivation tree.
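The “counter” idea described above can be sketched informally as follows. This is only an illustration based on the informal description in this section: the function names are hypothetical, the formal definition is the one given later in the thesis, and how the accumulated multiset is tested against a control condition is deliberately left out.

```python
from collections import Counter
from typing import Iterable, Set


def counter(rhs: str, terminals: Set[str]) -> Counter:
    """Multiset of the terminal symbols occurring on a production's right-hand side."""
    return Counter(ch for ch in rhs if ch in terminals)


def derivation_multiset(rhs_list: Iterable[str], terminals: Set[str]) -> Counter:
    """Sum of the counter multisets of all productions used in a derivation."""
    total: Counter = Counter()
    for rhs in rhs_list:
        total += counter(rhs, terminals)
    return total


if __name__ == "__main__":
    T = {"a", "b"}
    # Right-hand sides of the productions used in S => aSb => aaSbb => aaabbb.
    print(derivation_multiset(["aSb", "aSb", "ab"], T))
```

A control mechanism of this kind would then accept or reject a derivation by inspecting the accumulated multiset, for example by comparing the multiplicities of different terminals.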

Balanced grammars were introduced by Berstel and Boasson (2002) with the intent of overcoming the ambiguity issue. They are related to Dyck languages in that they generate well-formed words over a set of parentheses (a bracketed structure). The right-hand sides of the productions for each nonterminal form a regular set. In short, they are a generalization of parenthesis grammars in two directions. A balanced language is then characterized through a syntactic congruence property (Berstel & Boasson, 2002; Brabrand, Giegerich & Moller, 2010). However, no study of this topic has been done from an arithmetical aspect.
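As an informal illustration of what "well-formed over a parenthesis set" means, the following minimal sketch checks whether a word is a Dyck word using the standard stack technique. It is not the construction of Berstel and Boasson; the function name, the default parenthesis pairs and the choice to reject symbols outside the parenthesis set are assumptions made for this example.

```python
def is_well_formed(word, pairs=None):
    """Check whether `word` is a well-formed (Dyck) word over the given pairs.

    `pairs` maps each opening parenthesis to its closing partner.
    Symbols outside the parenthesis set are rejected (an assumption here).
    """
    if pairs is None:
        pairs = {"(": ")", "[": "]"}
    closers = set(pairs.values())
    stack = []  # closing symbols still expected, innermost last
    for symbol in word:
        if symbol in pairs:
            stack.append(pairs[symbol])
        elif symbol in closers:
            # A closer must match the most recently opened parenthesis.
            if not stack or stack.pop() != symbol:
                return False
        else:
            return False
    # Every opened parenthesis must have been closed.
    return not stack


print(is_well_formed("([])()"))  # True
print(is_well_formed("(]"))      # False
```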

1.3 RESEARCH OBJECTIVES

This research aims to accomplish the following objectives:

1. To define different variants of balanced grammars such as multiset

controlled grammars, balanced two-steps controlled grammars, tree

multiset controlled grammars, tree valence controlled grammars and tree

regularly controlled grammars.

2. To study the computational power of balanced grammars.

3. To investigate the closure properties of balanced grammars.

A formal definition is precise in that it states the necessary and sufficient conditions for a thing to be a member of a particular set, resolving any uncertainties and distinguishing a given term from any other term. Moreover, the computational power of regulated models provides information about the family of languages defined by the models, while their closure properties determine whether several types of simple languages can be merged into a complex language and vice versa.

1.4 RESEARCH QUESTIONS

In order to elaborate the objectives of the research, we set the following research

questions:

1. What are the balance features of the strings generated by context-free

grammars?

2. What types of balanced grammars can be defined?

3. What are the computational capacities of balanced grammars?

4. What kinds of closure properties do balanced grammars possess?

1.5 SIGNIFICANCE OF THE STUDY

Regulated grammars, which efficiently implement parsing and other generative operations, are theoretical frameworks for data-science-based computing tools and algorithms. They can be used as a basis for information processing technologies. In addition, the results of this research appear in the form of new theorems, which will extend the body of knowledge concerning the theory of formal languages.

1.6 SCOPE OF THE STUDY

The material on regulated grammars is so vast that it is not feasible to cover it completely in one study. In this thesis, we focus on introducing a new model of regulated grammars, called balanced grammars, using five new modes of operation. We give formal definitions with examples and restrict our attention to their generative power and properties. The focus does not lie in demonstrating real-world applications of the introduced grammars, as this is beyond the scope of this thesis.

1.7 RESEARCH METHODOLOGY

This research applied a constructive theoretical approach intended to provide new theories based on mathematical and formal methods. Therefore, the results mainly appear in the form of mathematical statements such as propositions, lemmas and theorems. The research was conducted in five phases as follows:

Phase 1. Literature review

We conducted a systematic literature review, together with a comparative analysis of previous related studies on formal language theory, Chomsky grammars and grammars with regulated rewriting, concentrating on multiset grammars, tree controlled grammars and valence grammars. This review helped in introducing and studying several new variants of balanced grammars.

Phase 2. Preliminaries

We provided the necessary basic notations, terminologies and definitions related to formal language theory, multisets and derivation trees, which are used to establish the results throughout the study.

Phase 3. Introduction of balanced controlled grammars

We defined five different variants of balanced grammars by adopting multisets, valences, weights and tree structure as the control mechanisms in the grammar counterparts. In addition, we constructed a few examples, in a comparative approach, to explore their nature.

Phase 4. Study of computational power

We studied the computational power of balanced grammars in comparison with general Chomsky grammars as well as with other well-known controlled grammars such as tree controlled grammars, matrix grammars and valence grammars.

Phase 5. Investigation of closure properties

We examined the closure properties of balanced grammars under operations such as union, Kleene star, complement, concatenation, substitution, mirror image, homomorphism, permutation and so forth, using the available techniques and methods applied in proving closure properties of Chomsky languages and grammars.

1.8 ORGANISATION OF THE THESIS

This section outlines the structure of the rest of the thesis. Its purpose is to portray the flow of the thesis, the main content of its chapters and how they are connected to each other. Chapter 2 provides an overview of previous related works on formal language theory and grammars with regulated rewriting, especially concerning multiset grammars, tree controlled grammars and valence grammars. Then, Chapter 3 recalls some well-known basic notations, terminologies, facts, concepts and results related to formal language theory, derivation trees, grammars with and without regulated rewriting, as well as operations on languages, which will be used in the subsequent investigations.

Afterwards, Chapter 4 introduces a new variant of balanced grammars known as multiset controlled grammars and studies its computational power, normal form, closure properties and applications. Next, Chapter 5 introduces four new variants of balanced grammars called tree multiset controlled grammars, tree valence controlled grammars, tree regularly controlled grammars and balanced two-steps controlled grammars, and investigates their computational power as well as their closure properties. Lastly, Chapter 6 gives a summary of all the material discussed in the previous chapters, together with possible directions for future research raised in this thesis.
