NEW BALANCED GRAMMARS USING MULTISET,
VALENCE AND TREE BASED CONTROLS
BY
SALBIAH BINTI ASHAARI
A thesis submitted in fulfilment of the requirement for the
degree of Master of Computer Science
Kulliyyah of Information and Communication Technology
International Islamic University Malaysia
MAY 2018
ABSTRACT
Grammars with regulated rewriting (regulated or controlled grammars for short) have
been an active research area in formal language theory. They have been
applied in a great variety of scientific disciplines, ranging from linguistics through DNA
computing to informatics and, more recently, big data analytics. However, none
of the existing grammars is fully satisfactory, since each of them is either
complicated, not computationally complete, or burdened with too many unsolvable decision
problems. Thus, more variants of controlled grammars can be investigated
through different approaches to address these issues. The main aim of this thesis is to
introduce a new variant of controlled grammars, called Balanced Grammars (BG), using
four simple control mechanisms: multisets, valences, weights and tree structures.
We have established Multiset Controlled Grammars (MCG), which are
based on terminal multisets. Another type is a modified version of tree controlled
grammars consisting of three kinds of grammars: Tree Multiset Controlled
Grammars (TMCG), which use multisets; Tree Valence Controlled Grammars (TVCG), which
apply valences; and Tree Regularly Controlled Grammars (TRCG), which implement
regular sets. We also establish Balanced Two-steps Controlled Grammars (BTCG), which
use weight pairs. The computational power and closure properties of each type of
grammar were studied. The results show that MCG are more
powerful than the corresponding Chomsky grammars: multiset controlled regular grammars (mREG)
can generate non-regular languages, multiset controlled linear grammars (mLIN) can
generate non-linear languages, and multiset controlled context-free grammars
(mCF) can generate non-context-free languages. In addition, the simplification of
multiset controlled context-free grammars was studied and resulted in a
Chomsky normal form. Using this normal form, a membership algorithm based on the
Cocke-Younger-Kasami (CYK) algorithm, which can be used for parsing, was designed.
As for TMCG, TVCG, TRCG and BTCG, we demonstrated that all of these grammars
in context-free form can generate non-context-free languages. Besides, we also
proved that all introduced grammars are at least as powerful as additive valence
grammars and at most as powerful as matrix grammars. Finally, in terms of closure
properties, most of them are closed under union, Kleene star, homomorphism and
mirror image.
ABSTRACT IN ARABIC (translated)
Grammars with regulated rewriting (regulated or controlled grammars for short) have
been, and still are, an active research area in formal language theory. They have been
applied in a variety of scientific disciplines, ranging from linguistics through DNA
computing (molecular biological computing) to informatics, and have recently reached
big data analysis. Nevertheless, all of these grammars remain incomplete, because some
are complicated and not computationally complete, while others have too many
unsolvable decision problems. Consequently, more variants of regulated grammars can be
investigated through different approaches to address these issues. The main aim of this
thesis is to introduce a new variant of regulated grammars, called Balanced Grammars
(BG), using simple variables such as multisets, valence, weight and trees as control
mechanisms. We established Multiset Controlled Grammars (MCG), which are based on
terminal multisets. The other type is a modified version of tree controlled grammars
comprising three kinds of grammars: (1) Tree Multiset Controlled Grammars (TMCG),
which use multisets; (2) Tree Valence Controlled Grammars (TVCG), which apply
valence; and (3) Tree Regularly Controlled Grammars (TRCG), which implement regular
sets. We also established Balanced Two-steps Controlled Grammars (BTCG), which use
pairs of weights. The computational power and closure properties of each type of these
grammars were studied. Based on the results obtained, it was proven that MCG are more
powerful than Chomsky grammars, since multiset controlled regular grammars (mREG)
can generate non-regular languages and multiset controlled linear grammars (mLIN) can
generate non-linear languages. In addition, multiset controlled context-free grammars
(mCF) can generate non-context-free languages. Furthermore, the simplification of
multiset controlled context-free grammars was studied, resulting in a Chomsky normal
form. Using this normal form, a membership algorithm based on the
Cocke-Younger-Kasami (CYK) algorithm, which can be used for parsing, was designed.
It was also demonstrated and proven that each of these grammars (TMCG, TVCG, TRCG,
BTCG), in context-free form, can generate non-context-free languages. Besides, we
proved that all introduced grammars in context-free form are at least as powerful as
additive valence grammars and at most as powerful as matrix grammars. Finally, in
terms of closure properties, most of them are closed under union, Kleene star,
homomorphism and mirror image.
APPROVAL PAGE
I certify that I have supervised and read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a thesis for the degree of Master of Computer Science.
……………………………………
Sherzod Turaev
Supervisor
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a thesis
for the degree of Master of Computer Science.
……………………………………
Ali A. Alwan
Internal Examiner
……………………………………
Ravie Chandran Munlyandi
External Examiner
This thesis was submitted to the Department of Computer Science and is accepted as a
fulfilment of the requirement for the degree of Master of Computer Science.
……………………………………
Head, Department of Computer
Science
This thesis was submitted to the Kulliyyah of Information Communication Technology
and is accepted as a fulfilment of the requirement for the degree of Master of Computer
Science.
……………………………………
Abdul Wahab Abdul Rahman
Dean, Kulliyyah of Information
Communication Technology
DECLARATION
I hereby declare that this thesis is the result of my own investigations, except where
otherwise stated. I also declare that it has not been previously or concurrently submitted
as a whole for any other degrees at IIUM or other institutions.
Salbiah binti Ashaari
Signature…………………....………. Date …….……………….
COPYRIGHT
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
DECLARATION OF COPYRIGHT AND AFFIRMATION OF
FAIR USE OF UNPUBLISHED RESEARCH
NEW BALANCED GRAMMARS USING MULTISET, VALENCE
AND TREE BASED CONTROL
I declare that the copyright of this thesis is jointly owned by the student
and IIUM.
Copyright © 2018 Salbiah binti Ashaari and International Islamic University Malaysia. All rights
reserved.
No part of this unpublished research may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise without prior written permission of the copyright holder except
as provided below:
1. Any material contained in or derived from this unpublished research may
be used by others in their writing with due acknowledgement.
2. IIUM or its library will have the right to make and transmit copies (print
or electronic) for institutional and academic purposes.
3. The IIUM library will have the right to make, store in a retrieval system
and supply copies of this unpublished research if requested by other
universities and research libraries.
By signing this form, I acknowledge that I have read and understood the IIUM
Intellectual Property Right and Commercialization policy.
Affirmed by Salbiah binti Ashaari
……..…………………….. ………………………..
Signature Date
ACKNOWLEDGEMENTS
Bismillahirrahmanirrahim.
All praises to Allah, the most Almighty and the most Gracious, on whom
we ultimately rely for guidance and sustenance. May Peace and Blessings of Allah
be upon His Prophet Muhammad S.A.W.
This thesis has taken its current shape thanks to the guidance, encouragement and
assistance of numerous people. Therefore, I would like to extend my
wholehearted thanks to all who contributed, directly or indirectly, to
helping me complete my journey as a master's student successfully.
I would like to convey a deep sense of gratitude to my parents and family for
always believing in me and giving me the opportunity to pursue my dreams. Your
unconditional love, material and spiritual support, encouragement and prayers have been
a great help to me in withstanding the difficulties in my life. Heartfelt thanks also go to all
my friends and colleagues, who have made my master's journey extremely enjoyable,
memorable and fruitful.
Furthermore, I express my deepest and sincere gratitude to my supervisor, Dr.
Sherzod Turaev, for introducing me to this field of study and for accepting me as his
Master's student with inexhaustible patience, understanding and wisdom. I really
appreciate the insightful suggestions, numerous good ideas and research freedom that
I have enjoyed under his supervision. I am also heartily grateful to him
for giving me the chance to receive financial support under his grant FRGS13-
0066-0307 from the Ministry of Education, Malaysia, during my first semester of study.
This work would not have been completed without his intelligence, astute criticism,
thoughtful guidance, constant encouragement and financial support. Thank you for
your mentorship and guidance throughout all these years.
My warmest gratitude also goes to Dr. Abdurahim Okhunov, who gave
me the opportunity to receive financial support from his grant FRGS13-074-0315
from the Ministry of Education, Malaysia, from my second semester of study onwards.
Besides, I cordially thank the examiners of my proposal defense, seminar and viva,
Dr. M.M Hafizur Rahman, Dr. Amelia Ritahani, Dr. Mohd Izzuddin Mohd Tamrin, Dr.
Sofianiza Abdul Malik, Dr. Ali A. Alwan, Ravie Chandran Munlyandi and Dr. Abd
Rahman Ahlan, for their willingness to spend some of their valuable time reviewing and
evaluating my thesis, as well as for their critical and valuable comments and
suggestions to improve my work. Last but not least, my sincerest thanks go to all members
of the Kulliyyah of Information and Communication Technology, International Islamic
University Malaysia, for their kindness and open-handed hospitality during my
study there.
Thank you all for everything. May Almighty Allah reward all of your help with
goodness. Wa’salam.
TABLE OF CONTENTS
Abstract
Abstract in Arabic
Approval Page
Declaration
Copyright
Acknowledgements
List of Tables
List of Figures
List of Abbreviations
List of Symbols
CHAPTER ONE: INTRODUCTION
    1.1 Background of the Study
    1.2 Statement of the Problem
    1.3 Research Objectives
    1.4 Research Questions
    1.5 Significance of the Study
    1.6 Scope of the Study
    1.7 Research Methodology
    1.8 Organisation of the Thesis
CHAPTER TWO: LITERATURE REVIEW
    2.1 Introduction
    2.2 Formal Languages and Chomsky Grammars
    2.3 Grammars with Regulated Rewriting
        2.3.1 Multiset Grammars
        2.3.2 Tree Controlled Grammars
        2.3.3 Valence Grammars
    2.4 Summary
CHAPTER THREE: PRELIMINARIES
    3.1 Introduction
    3.2 General Notations
    3.3 Alphabet, String, Language and Multiset
    3.4 Operations on Languages
    3.5 Grammars and Chomsky Hierarchy
    3.6 Derivation Tree
    3.7 Grammars with Regulated Rewriting
CHAPTER FOUR: MULTISET CONTROLLED GRAMMARS
    4.1 Introduction
    4.2 Definitions and Examples
    4.3 Generative Powers
    4.4 Normal Form
    4.5 Closure Properties
    4.6 Application: Parsing
        4.6.1 Parsing Based on an Extended CYK Algorithm
    4.7 Summary
CHAPTER FIVE: OTHER BALANCED GRAMMARS
    5.1 Introduction
    5.2 Tree Multiset Controlled Grammars
        5.2.1 Definitions and Example
        5.2.2 Generative Power
        5.2.3 Closure Properties
    5.3 Tree Valence Controlled Grammars
        5.3.1 Definitions and Example
        5.3.2 Generative Power
        5.3.3 Closure Properties
    5.4 Tree Regularly Controlled Grammars
        5.4.1 Definitions and Example
        5.4.2 Generative Power
        5.4.3 Closure Properties
    5.5 Balanced Two-Steps Controlled Grammars
        5.5.1 Definitions and Example
        5.5.2 Generative Power
        5.5.3 Closure Properties
    5.6 Summary
CHAPTER SIX: CONCLUSION AND FUTURE WORK
REFERENCES
LIST OF PUBLICATIONS
LIST OF TABLES
Table 3.1 Closure Properties Owned by Chomsky Grammars
LIST OF FIGURES
Figure 2.1 Set Inclusion of Grammars Described by Chomsky Hierarchy
Figure 4.1 CYK Table
Figure 4.2 Final CYK Result Table
Figure 4.3 Final Counter Multiset Table
Figure 4.4 The Derivation Tree for Example 4.6.1
Figure 4.5 The Hierarchy of Families of Languages Generated by Multiset Controlled Grammars
Figure 5.1 Example of a Derivation Tree for the String $a^3b^3c^3$ (TM)
Figure 5.2 Example of a Derivation Tree for the String $a^3b^3c^3$ (TV)
Figure 5.3 Example of a Derivation Tree for the String $a^8$
LIST OF ABBREVIATIONS
CYK Cocke-Younger-Kasami
RE Recursively enumerable languages
CS Context-sensitive languages
CF Context-free languages
REG Regular languages
LIN Linear languages
MAT The family of languages of matrix grammars (without erasing rules)
MATλ The family of languages of matrix grammars (with erasing rules)
TC The family of languages of tree controlled grammars (without erasing rules)
TCλ The family of languages of tree controlled grammars (with erasing rules)
aVAL The family of languages of additive valence grammars (without erasing rules)
aVALλ The family of languages of additive valence grammars (with erasing rules)
mREG Multiset controlled regular languages (without erasing rules)
mREGλ Multiset controlled regular languages (with erasing rules)
mLIN Multiset controlled linear languages (without erasing rules)
mLINλ Multiset controlled linear languages (with erasing rules)
mCF Multiset controlled context-free languages (without erasing rules)
mCFλ Multiset controlled context-free languages (with erasing rules)
TM The family of languages of tree multiset controlled grammars (without erasing rules)
TMλ The family of languages of tree multiset controlled grammars (with erasing rules)
TV The family of languages of tree valence controlled grammars (without erasing rules)
TVλ The family of languages of tree valence controlled grammars (with erasing rules)
TR The family of languages of tree regularly controlled grammars (without erasing rules)
TRλ The family of languages of tree regularly controlled grammars (with erasing rules)
TS The family of languages of balanced two-steps controlled grammars (without erasing rules)
TSλ The family of languages of balanced two-steps controlled grammars (with erasing rules)
LIST OF SYMBOLS
∈ Membership of an element to a set
∉ Negation of ∈
⊆ Inclusion
⊂ Proper inclusion
∪ Union
∩ Intersection
× Cross product
− Difference
|A| Cardinality of set A
2^A Power set of A
∅ Empty set
{A} Element in a set
ℤ Set of integers
ℕ Set of natural numbers
ℝ Set of real numbers
ℚ Set of rational numbers
Σ Alphabet
Σ∗ The set of all finite strings over Σ
Σ+ The set of all non-empty finite strings over Σ
λ Empty string
|w| Length of string w
w^R Mirror image of string w
μ Multiset
μ(a) Number of occurrences of a in μ
A⊕ Set of all multisets over set A
G Grammar
S Start symbol
P Set of production rules
N Set of nonterminal symbols
T Set of terminal symbols
CHAPTER ONE
INTRODUCTION
1.1 BACKGROUND OF THE STUDY
Soon after the emergence of modern computers, people realized that all
forms and types of data (information), such as names, numbers, pictures, sounds, videos
and waves, can be regarded as structures of symbols (i.e., strings or words). A
collection of such strings is called a language, and it is the starting point of formal
language theory, the branch of knowledge that studies formal grammars and languages
(Jiang et al., 2010). Formally, formal language theory is the study of sets of
abstract words over a finite alphabet of symbols. Its language models can arguably be
divided into two fundamental groups: generative models and
recognition models. Generative models, better known as grammars,
are used to generate words, while recognition models, usually called
automata, are used to accept words. In this thesis, we focus only on
grammars, and in particular on grammars with regulated rewriting.
A grammar can be simply defined as a set of productions (also called rules)
for forming words in a formal language, where the productions describe the valid ways in
which words can be formed from the language’s alphabet according to the language’s syntax
(Chomsky, 1956). In other words, a grammar consists of a finite set of rules over finite
sets of variables and characters, and the language it generates depends on which
rules are applied. Thus, it can be seen as a formal statement of the
structure or pattern of the language being described. In general, grammars are used to
generate and analyze the strings of a language. Grammars are significant because
they give a formal definition of the syntax of a language, which enables
reasoning about language elements; they can form the core of a parsing
algorithm; and they can be used as a tool for syntax specification (Jiang et al.,
2010). In terms of applications, grammars have also been found to be very useful in
computing alignments and solving approximate string matching problems in areas such as
plagiarism detection, mirror pages and biomarkers (Siederdissen, Hofacker & Stadler, 2015).
Besides, they can be employed to compare sequences of nucleic acids (DNA or
RNA) or chains of amino acids (proteins) for diagnosing certain diseases or for verification
purposes (Chiang, 2012).
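To make the notion of a grammar generating a language concrete, the following is a minimal Python sketch. The toy grammar S → aSb | ab and the names `RULES` and `generate` are illustrative assumptions and are not taken from the thesis; the sketch simply enumerates, by leftmost derivation, all terminal strings up to a given length.

```python
from collections import deque

# Hypothetical toy grammar (not from the thesis): S -> aSb | ab.
# Keys are nonterminals; every other symbol is treated as a terminal.
RULES = {"S": ["aSb", "ab"]}


def generate(start="S", max_len=6):
    """Return all terminal strings of length <= max_len derivable from start.

    Sentential forms are explored breadth-first, always rewriting the
    leftmost nonterminal. Pruning by length is safe here because the toy
    grammar has no erasing rules, so forms never get shorter.
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if the form is terminal.
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in RULES[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=len)


print(generate())
```

Which rules are applied, and how often, determines the generated string: applying S → aSb twice and then S → ab yields aaabbb.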
Basically, grammars can be classified into two fundamental categories:
context-free grammars and non-context-free grammars. Of the two, context-free
grammars are the most developed and best examined grammar class in the Chomsky
hierarchy, owing to their simplicity and intuitively appealing formalism.
In fact, context-free grammars have been widely used in many applications, most
commonly in the specification and compilation of programming languages
(Bel-Enguix, Jimenez-Lopez & Martin-Vide, 2008; Martin-Vide, Mitrana & Paun, 2004). More
specifically, they have been used in an authentication protocol that uses one-time
authentication information to generate one-time passwords (Singh, Dagon & Dos
Santos, 2004), to generate equation structures that predict the peak ground
acceleration of an earthquake by describing the dependencies in a given set of
data (Bosman & Gruner, 2013), and to enhance log file analysis by describing
intrusion patterns and acceptable log files (Markic & Stankovski, 2013).
Generally, a context-free grammar is a set of rules that allow one to substitute a
variable by a string of terminals and variables, where each string in the language has a
derivation tree and a corresponding leftmost derivation. Context-free grammars have broad
applicability and, at the same time, many good computational and complexity
properties (Bel-Enguix, Jimenez-Lopez & Martin-Vide, 2008; Sipser, 2013). However,
it is well known that the world is not totally “context-free”: many
circumstances give rise to non-context-free languages with
basic features such as reduplication (e.g., $\{ww \mid w \in T^*\}$), multiple agreement (e.g.,
$\{a^n b^n c^n d^n \mid n \geq 1\}$) and crossed agreement (e.g., $\{a^n b^m c^n d^m \mid n, m \geq 1\}$) (Dassow
& Paun, 1989; Dassow, Paun & Salomaa, 1997).
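The three non-context-free features above can be illustrated by direct membership tests. The following Python sketch is only an illustration under the assumption that words are given as strings over the letters a, b, c, d; the function names are hypothetical and do not come from the thesis.

```python
import re

# Four non-empty blocks of a's, b's, c's and d's, in that order.
BLOCKS = re.compile(r"(a+)(b+)(c+)(d+)")


def is_reduplication(w):
    """Membership in {ww | w in T*}: the two halves must coincide."""
    half, remainder = divmod(len(w), 2)
    return remainder == 0 and w[:half] == w[half:]


def is_multiple_agreement(w):
    """Membership in {a^n b^n c^n d^n | n >= 1}: all four blocks equally long."""
    m = BLOCKS.fullmatch(w)
    return bool(m) and len({len(block) for block in m.groups()}) == 1


def is_crossed_agreement(w):
    """Membership in {a^n b^m c^n d^m | n, m >= 1}: a's match c's, b's match d's."""
    m = BLOCKS.fullmatch(w)
    return bool(m) and len(m[1]) == len(m[3]) and len(m[2]) == len(m[4])
```

Each test needs to compare counts or substrings that are arbitrarily far apart, which is exactly the kind of dependency a context-free grammar cannot enforce.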
Thus, we need to go beyond context-free grammars. One solution
is to consider context-sensitive grammars, which are more powerful (Dassow &
Paun, 1989). Nevertheless, in spite of their great power, they have some serious
drawbacks in practical use: many of their decision problems
are either undecidable or solvable only by exponential-time
algorithms (Dassow & Paun, 1989). For context-sensitive grammars, the emptiness and
finiteness problems are undecidable. Furthermore, it is hard or impossible to describe
the derivations of context-sensitive grammars by a graph or tree structure, which is an
essential tool for analyzing the structure of a problem (Dassow & Paun, 1989). These
are the reasons why many researchers look for grammars intermediate between
context-free and context-sensitive grammars, called grammars with regulated rewriting
(also known as regulated or controlled grammars), which combine the
simplicity of context-free grammars with the power of context-sensitive
grammars.
A regulated grammar is a grammar with an additional
mechanism that restricts the application of certain rules in order to exclude
certain derivations. Consequently, the set of strings generated by a grammar
with regulated rewriting is a subset of the set of strings generated by the same grammar
without regulation (Dassow & Paun, 1989; Meduna & Soukup, 2017). The
core idea behind regulated rewriting is to choose a simple model with
high computational power. There is a broad variety of interesting regulated grammars,
each of which uses a different mode of operation to impose its restrictions. For
example, in matrix grammars, rules can be applied only in previously
specified sequences (Abraham, 1965); in regularly controlled grammars, the sequence of rules
corresponding to a derivation must belong to a previously specified set of strings
(Ginsburg & Spanier, 1968); in ordered grammars, a rule cannot be used while some
rule greater than it in a given partial order is still applicable (Fris, 1968); in programmed
grammars, the choice of one rule determines which rules may be applied next
(Rosenkrantz, 1969); and there are many more.
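As an illustration of how a control mechanism shrinks the generated language, the sketch below simulates a small matrix grammar in Python. The grammar (matrices [S → ABC], [A → aA, B → bB, C → cC] and [A → a, B → b, C → c]) is a standard textbook-style example chosen for illustration, not one taken from the thesis, and the helper names are hypothetical. With the matrices applied as units, only strings of the form a^n b^n c^n are produced; with the same rules applied freely, the counts no longer have to agree.

```python
# Hypothetical matrix grammar; uppercase letters are nonterminals,
# lowercase letters are terminals. Each matrix is a sequence of rules
# that must be applied together, in order.
MATRICES = [
    [("S", "ABC")],
    [("A", "aA"), ("B", "bB"), ("C", "cC")],
    [("A", "a"), ("B", "b"), ("C", "c")],
]


def apply_rule(form, lhs, rhs):
    """Rewrite the leftmost occurrence of lhs, or return None if absent."""
    i = form.find(lhs)
    return None if i < 0 else form[:i] + rhs + form[i + 1:]


def apply_unit(form, unit):
    """Apply every rule of a unit in order; fail if any rule is inapplicable."""
    for lhs, rhs in unit:
        form = apply_rule(form, lhs, rhs)
        if form is None:
            return None
    return form


def language(units, max_len):
    """All terminal strings of length <= max_len derivable from S."""
    done, seen, frontier = set(), {"S"}, ["S"]
    while frontier:
        form = frontier.pop()
        if form.islower():  # no nonterminals left
            done.add(form)
            continue
        for unit in units:
            new_form = apply_unit(form, unit)
            if new_form is not None and len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                frontier.append(new_form)
    return done


# Regulated: whole matrices are the units of rewriting.
controlled = language(MATRICES, 9)
# Unregulated: every rule may be applied on its own.
free = language([[rule] for matrix in MATRICES for rule in matrix], 9)
```

Here `controlled` is a proper subset of `free`: for instance, the string aabc is derivable only when the rules are applied without the matrix control.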
1.2 STATEMENT OF THE PROBLEM
Regulated grammars are one of the various types of grammars introduced throughout the
history of formal language theory, with the main goal of extending the power of grammars
without significantly increasing the complexity of the model. They are one of the most
effective ways to refine the Chomsky hierarchy with a wide variety of
language classes that lie within it (Meduna and Zemek, 2014). The common
way to construct a regulated grammar is to combine two simpler models:
a grammar, which generates a sentence by its productions, and an additional control
model, which restricts the derivation process.
In the monographs by Meduna and Soukup (2017), Meduna and Zemek
(2014) and Dassow and Paun (1989), we can find a large number of types of
regulated grammars that preserve the context-free nature of their rules, such as tree controlled
grammars, matrix grammars, valence grammars, programmed grammars, probabilistic
grammars, state grammars, random-context grammars, ordered grammars and many
more. All of these grammars have led to many remarkable results within formal
language theory. They differ from each other in their restrictions, which are
based either on context conditions or on the use of rules during the process of
generating the languages. However, under certain circumstances, they are too
complicated, not computationally complete, or belong to a group of grammars with
too many unsolvable decision problems, which has lessened their practical interest
(Dassow & Paun, 1989). Moreover, this issue has been investigated for a long period of
time, and there is still no definite method that has been proven to be the best at solving it. In
addition, the swift growth of present-day technology, industry and other fields has
given rise to more and more new and intricate problems, which call for new
suitable tools to address them.
Some studies on such grammars have been done: multiset grammars apply
rules to multisets in order to restrict the use of the grammars' productions (Kudlek, Martin &
Paun, 2001); tree controlled grammars impose restrictions on the derivation trees of
grammars using regular languages (Culik & Maurer, 1977); and valence grammars
assign to each production an element of a given monoid, such as an integer (Paun, 1980). Nevertheless, there are
still some captivating topics in this direction left for future study. For instance,
no research has been done on using multisets over terminal symbols based on an
operation called “counter”, in which every production of the grammar is assigned a
multiset value that depends on the number of terminal symbols occurring on the
right-hand side of that production, as a control mechanism. Moreover, it is not yet
known how powerful grammars become if we combine valences or multisets with the tree
structure to control derivations, as in tree controlled grammars, or if we
use regular sets of productions of the grammars rather than checking the levels of the
derivation tree.
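The “counter” idea described above can be sketched informally in Python using `collections.Counter` as a multiset. This is only a rough illustration under stated assumptions: the terminal alphabet, the example derivation and the function names are hypothetical, and the precise definition of the control condition used in the thesis is given in Chapter Four, not here.

```python
from collections import Counter

# Hypothetical terminal alphabet; every other symbol is a nonterminal.
TERMINALS = {"a", "b", "c"}


def terminal_multiset(rhs):
    """Multiset of the terminal symbols on the right-hand side of a production."""
    return Counter(sym for sym in rhs if sym in TERMINALS)


def derivation_multiset(rules_used):
    """Sum of the terminal multisets of all productions used in a derivation."""
    total = Counter()
    for _lhs, rhs in rules_used:
        total += terminal_multiset(rhs)
    return total


# Productions used in S => aSc => aaBcc => aabBcc => aabbcc (illustrative only).
derivation = [("S", "aSc"), ("S", "aBc"), ("B", "bB"), ("B", "b")]

counts = derivation_multiset(derivation)
# One possible (assumed) control condition: every terminal occurs equally often.
balanced = len(set(counts.values())) == 1
```

In this sketch the accumulated multiset records two occurrences each of a, b and c, so the assumed balance condition holds for the derived string aabbcc.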
Balanced grammars were introduced, from a structural point of view, by Berstel and Boasson
(2002) with the intent of overcoming the ambiguity issue. They are related to Dyck
languages in that they generate well-formed words over a set of parentheses
(bracketed structures). The right-hand sides of the productions for each
nonterminal form a regular set. In short, they are a generalization of parenthesis
grammars in two directions. A balanced language is then characterized through a
syntactic congruence property (Berstel & Boasson, 2002; Brabrand, Giegerich &
Moller, 2010). However, no study has yet approached this topic from an arithmetical
aspect.
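For readers unfamiliar with Dyck languages, well-formedness of a bracketed word can be checked with a stack. The sketch below is a generic Python illustration; the choice of two bracket pairs and the name `is_dyck` are assumptions for the example and are not part of the cited work.

```python
# Closing bracket -> matching opening bracket (two pairs, chosen for illustration).
PAIRS = {")": "(", "]": "["}
OPENERS = set(PAIRS.values())


def is_dyck(word):
    """Return True if word is a well-formed bracket sequence over ( ) [ ]."""
    stack = []
    for ch in word:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closing bracket must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False  # symbol outside the parenthesis alphabet
    return not stack  # every opener must have been closed
```

This structural notion of balance concerns the nesting of brackets, whereas the arithmetical notion pursued in this thesis concerns numerical conditions such as symbol counts.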
1.3 RESEARCH OBJECTIVES
This research aims to accomplish the following objectives:
1. To define different variants of balanced grammars such as multiset
controlled grammars, balanced two-steps controlled grammars, tree
multiset controlled grammars, tree valence controlled grammars and tree
regularly controlled grammars.
2. To study the computational power of balanced grammars.
3. To investigate the closure properties of balanced grammars.
A formal definition is precise: it states the necessary and sufficient
conditions for a thing to be a member of a particular set, resolving any uncertainty
and distinguishing a term from any other term. Moreover, the power of
regulated models provides information about the family of languages defined by the
models, while their closure properties determine whether several simple
languages can be combined into a complex language and vice versa.
1.4 RESEARCH QUESTIONS
In order to elaborate the objectives of the research, we set the following research
questions:
1. What are the balance features of the strings generated by context-free
grammars?
2. What types of balanced grammars can be defined?
3. What are the computational capacities of balanced grammars?
4. What kinds of closure properties do balanced grammars possess?
1.5 SIGNIFICANCE OF THE STUDY
Regulated grammars, which efficiently implement parsing and other generative
operations, are theoretical frameworks for data science based computing tools and
algorithms. They can be used as a base for information processing technologies. In
addition, the results of this research appear in the form of new theorems, which
will extend the body of knowledge concerning the theory of formal languages.
1.6 SCOPE OF THE STUDY
The material on regulated grammars is so extensive that it is not feasible
to cover it completely in one study. In this thesis, we focus on introducing a
new model of regulated grammars called balanced grammars using five new modes of
operation. We give formal definitions with examples and restrict our attention to
their generative power and closure properties. We do not demonstrate real-world
applications of the introduced grammars, as this is beyond the scope of this thesis.
1.7 RESEARCH METHODOLOGY
This research applies a constructive theoretical approach that aims to provide
new theories based on mathematical and formal methods. Therefore, the results
mainly appear in the form of mathematical statements such as propositions, lemmas and
theorems. The research was conducted in five phases as follows:
Phase 1. Literature review
We conducted a systematic literature review together with a comparative analysis of
previous related studies on formal language theory, Chomsky grammars and
grammars with regulated rewriting, concentrating on multiset grammars, tree controlled
grammars and valence grammars, to support the introduction and study of several
new variants of balanced grammars.
Phase 2. Preliminaries
We provided the necessary basic notations, terminologies and definitions related to
formal language theory, multisets and derivation trees, which are used to obtain
the results established throughout the study.
Phase 3. Introduction of balanced controlled grammars
We defined five different concepts of balanced grammars by adapting multiset, valence,
weight and tree structure as the control mechanisms in the corresponding grammars. In
addition, we constructed a few examples and compared them to explore their
nature.
Phase 4. Study of computational power
We studied the computational power of balanced grammars in comparison with general
Chomsky grammars as well as with other well-known controlled grammars such as tree
controlled grammars, matrix grammars and valence grammars.
Phase 5. Investigation of closure properties
We examined the closure properties of balanced grammars under operations such as
union, Kleene star, complement, concatenation, substitution, mirror image,
homomorphism and permutation, using the available techniques and methods
applied in proving closure properties of Chomsky languages and grammars.
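To make some of the listed operations concrete, the sketch below applies concatenation, mirror image and a bounded approximation of the Kleene star to small finite languages represented as Python sets. It only illustrates the operations themselves; it does not prove any closure property, and the function names are assumptions chosen for this example.

```python
def concatenation(l1, l2):
    """All words uv with u in l1 and v in l2."""
    return {u + v for u in l1 for v in l2}

def mirror_image(language):
    """Reverse every word of the language."""
    return {w[::-1] for w in language}

def star_up_to(language, n):
    """Finite approximation of Kleene star: at most n concatenated factors."""
    result = {""}
    layer = {""}
    for _ in range(n):
        layer = concatenation(layer, language)
        result |= layer
    return result

print(sorted(concatenation({"a", "ab"}, {"b"})))
print(sorted(mirror_image({"ab", "abc"})))
print(sorted(star_up_to({"a"}, 2)))
```

A family of languages is closed under an operation when applying the operation to members of the family always yields another member; the code shows only what each operation computes.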
1.8 ORGANISATION OF THE THESIS
This section outlines the structure of the rest of the thesis, describing
the flow of the thesis, the main content of each chapter and how the chapters are
connected to each other. Chapter 2 provides an overview of previous related
works on formal language theory and grammars with regulated rewriting, especially
concerning multiset grammars, tree controlled grammars and valence grammars. Then,
Chapter 3 recalls some well-known basic notations, terminologies, facts, concepts and
results related to formal language theory, derivation trees, grammars with and
without regulated rewriting as well as operations on languages, which are used in
the subsequent investigations.
Afterwards, Chapter 4 introduces a new variant of balanced grammars known as
multiset controlled grammars and studies its computational power, normal form,
closure properties and applications. Next, Chapter 5 introduces four new variants of
balanced grammars called tree multiset controlled grammars, tree valence controlled
grammars, tree regularly controlled grammars and balanced two-steps controlled
grammars, and investigates their computational power as well as their closure
properties. Lastly, Chapter 6 summarizes the material discussed in the
previous chapters and outlines possible directions for future research raised in
this thesis.