Mathematics as a
Game of Types
(Thesis Format: Monograph)
by
Jackson W. Marques de Carvalho
Graduate Program in
Computer Science
A thesis submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Faculty of Graduate Studies The University of Western Ontario
London, Ontario, Canada
© Jackson W. Marques de Carvalho 2005
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Library and Archives Canada
Published Heritage Branch
395 Wellington Street Ottawa ON K1A 0N4 Canada
Bibliotheque et Archives Canada
Direction du Patrimoine de l'edition
395, rue Wellington Ottawa ON K1A 0N4 Canada
Your file Votre reference ISBN: 0-494-12080-0 Our file Notre reference ISBN: 0-494-12080-0
NOTICE: The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or noncommercial purposes, in microform, paper, electronic and/or any other formats.
AVIS: L'auteur a accorde une licence non exclusive permettant a la Bibliotheque et Archives Canada de reproduire, publier, archiver, sauvegarder, conserver, transmettre au public par telecommunication ou par l'Internet, preter, distribuer et vendre des theses partout dans le monde, a des fins commerciales ou autres, sur support microforme, papier, electronique et/ou autres formats.
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
L'auteur conserve la propriete du droit d'auteur et des droits moraux qui protege cette these. Ni la these ni des extraits substantiels de celle-ci ne doivent etre imprimes ou autrement reproduits sans son autorisation.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.
Conformement a la loi canadienne sur la protection de la vie privee, quelques formulaires secondaires ont ete enleves de cette these.
Bien que ces formulaires aient inclus dans la pagination, il n'y aura aucun contenu manquant.
THE UNIVERSITY OF WESTERN ONTARIO FACULTY OF GRADUATE STUDIES
CERTIFICATE OF EXAMINATION
Supervisor
Dr. Helmut Jürgensen
Supervisory Committee
Examiners
Dr. Stephen Watt
Dr. Kamran Sedig
Dr. David Spencer
Dr. Gerhard Weber
The thesis by
Jackson Carvalho
entitled:
Mathematics as a Game of Types
is accepted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Date: April 8, 2005
Richard Kane, Chair of the Thesis Examination Board
Abstract
This thesis presents a grammar-based approach to the specification of mathematical notation. The method introduced is based on a meta-structure that uses attributed context-free grammars for capturing the meaning of mathematical concepts. This structure supports the creation of multi-purpose documents and allows the specification of mathematical notation in a dynamical way. In the context of this thesis, multi-purpose documents are documents that may be rendered or used in different ways, some of which might not be known at the time the document is created. By dynamical it is understood that the meaning associated with syntax is allowed to be modified.
The proposal described in this thesis is based on an authoring model which addresses the user needs as a fundamental requirement. This characteristic is structured around a scope mechanism that allows the mapping between semantics and syntax to be modified at any time during authoring. This process supports the dynamic characteristics of the meaning-to-syntax binding necessary during the authoring of mathematical concepts. The multi-purpose property is supported by a semantics-based mark-up that allows the mathematical concepts to be processed according to the specific requirements of applications. Modular grammar fragments characterized by a one-to-one mapping between mathematical concept and grammar representation provide adequate support for the definition of the various scopes. An incremental update process is defined as a way to modify the grammar fragments as needed to support the changes proposed during the authoring process.
Keywords: mathematics, types, user-oriented, interfaces, metasystem, grammars, rendering, notation, authoring, multimodal
Acknowledgments
I would like to thank my supervisor, Dr. Helmut Jürgensen, who believed in me, for proposing the problem and for his guidance and mentorship. I would also like to thank Maia Hoeberechts for reading the previous version of this thesis and for her suggestions.
I am grateful to my parents, Jose and Janete, for making me understand the importance of education and work. I wish to thank my children Carolina, Marcello and Luiza for always reminding me that life can be fun even during difficult times. My special thanks to my wife Rozane for her support, love and dedication to our children.
This work has been partially supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), by the Universidade Federal do Rio Grande do Norte (UFRN), and by Dr. Helmut Jürgensen.
Table of Contents
Certificate of Examination
Abstract
Acknowledgements
1 Introduction
1.1 The Problem: Capturing Semantics by Means of User-Defined Syntax
1.2 Related Work
1.2.1 Data Model and Data Representation
1.2.2 SGML and XML
1.2.3 XML and RELAX NG
1.2.4 ASTER
1.2.5 OpenMath
1.2.6 MathML
1.2.7 Some Limitations of Both OpenMath and MathML
1.2.8 Compositionality
1.3 Motivation
1.4 A Solution: Dynamical Document Structure
1.5 Approach Taken
1.6 Thesis Overview
2 Basic Notions and Notation
2.1 Basic Definitions
3 A Framework for Interactive Systems
3.1 Basic Notions
3.1.1 Electronic and Paper Documents
3.1.2 Communication, Media and Modalities
3.2 User Interface Basic Components
3.3 An Existing Model
3.3.1 A Structuring Problem
3.4 A Different Structure for Interactive Systems
3.4.1 A New Framework
3.5 Example
3.6 Summary
4 Authoring Environments
4.1 Introduction
4.2 Interaction Objects and Authoring Environments
4.3 Cognitive Distances
4.4 Rendering Information
4.5 Encoding Mathematical Concepts
4.6 Environment Modifications
4.7 Changes in the Interface
4.8 Recommendations
4.9 Summary
5 Mathematical Constructs and their Representation
5.1 Notational Systems as Languages
5.2 Standard Mathematical Notation Characteristics
5.3 Capturing the Semantics of Mathematical Concepts
5.3.1 Mathematics and Document Authoring
5.3.2 CFGs and Data Types
5.3.3 CFG Limitation to Support Authoring Mathematics
5.3.4 Updating CFGs
5.3.4.1 Identical Syntax and Rule Semantics
5.3.4.2 Redundancy, Syntax Equivalence and Normal Forms
5.4 Representing Polynomials
5.5 Representing Subscripts and Superscripts
5.5.1 Overloading Subscripts
5.5.2 Overloading Superscripted Symbols
5.6 Representing Matrices
5.7 Representing Sets of Numbers
5.8 Representing Sums
5.9 Conclusion
6 Modelling Context Dependent Information
6.1 Authoring Mathematics and Multimodality
6.2 A Formal Structure for Document Authoring
6.2.1 Grammars and Dynamic Document Authoring
6.3 Structuring with Grammars
6.3.1 Mathematical Concepts and Grammatical Dependencies
6.4 Grammar Operations and Extensibility
6.5 Structuring with Domains and Directories
6.5.1 Domains, Directories and Symbol Overloading
6.6 Languages as Control Structures
6.6.1 Directory Composition Example
6.6.2 The Control Mechanism
6.7 The Role of Compilers
6.8 Meta-Structure
6.9 Conclusion
7 Examples
7.1 Example 1: Overloading the + and * symbols
7.2 Example 2: Symbols as operators and operands
7.3 Example 3: More meanings for the + symbol
8 The Processing Structure
8.1 Dynamic Authoring and Language Fragments
8.2 Processing Grammar Fragments
8.3 Dynamic Authoring and Document Processors
8.3.1 Example
9 Concluding Remarks
9.1 Discussion
9.2 Authoring with Grammar Fragments
9.3 Future Work
Vita
List of Tables
4.1 Pen/paper authoring environment.
4.2 LaTeX-based authoring environment.
4.3 Document authoring environment characteristics and software design approaches to help achieve them.
5.1 CFG rules for addition of integers 0 and 1.
5.2 Grammar for addition of integers 1 and 2.
5.3 Grammar for concatenation of characters a and b.
5.4 Derivation of word 1 + 2.
5.5 Derivation of word a + b.
5.6 Grammar for operations on integers and characters.
5.7 Derivation of word a + 2.
5.8 CFG fragment for expressing words from G.
5.9 CFG fragment for expressing addition, ellipsis and addition operations.
5.10 CFG fragment for expressing equality operation.
5.11 CFG fragment for subscripts and superscripts.
5.12 CFG representation of the positive and negative parts of a function.
5.13 CFG fragment for matrices.
5.14 CFG fragment for intervals.
5.15 CFG fragment to capture the semantics of intervals.
5.16 Grammar for summation.
5.17 Grammar for summation.
6.1 Components involved in dynamic authoring for multimodality.
6.2 CFG for equality of strings of characters.
6.3 CFG for representation of schemes.
6.4 Grammar fragments illustrating grammar dependencies.
6.5 Basic grammar for addition.
6.6 Operatorless grammar linking expr and term nonterminals.
6.7 Operatorless grammar linking term and num nonterminals.
6.8 Primitive grammar setting nonterminal num to terminal NUMBER.
6.9 Derived grammar for addition.
6.10 Resulting grammar for expressions involving addition.
6.11 Basic grammar for multiplication.
6.12 Operatorless grammar linking term and factor nonterminals.
6.13 Operatorless grammar linking factor and num nonterminals.
6.14 Derived grammar for multiplication.
6.15 Resulting grammar for expressions involving addition and multiplication.
6.16 Grammar to support the use of both the composition and extension operators.
6.17 CFG for the binding control mechanism.
6.18 Production rules for the meta-grammar.
6.19 Attributed grammar to support the capturing of simple summations.
7.1 Default grammars.
7.2 Grammar fragments created by editing.
7.3 Grammars in domain directory G® that have been created by grammar operations.
7.4 Grammars in domain directory that have been created by grammar operations.
7.5 Grammars in domain directory G \ created by editing.
7.6 Grammars in domain directory G \ that have been created by grammar operations.
7.7 Grammar in domain directory G 3 created by editing.
7.8 Grammar in domain directory G 3 created by grammar operations.
7.9 Grammars in domain directory G° created by editing.
7.10 Grammars in domain directory G® created by grammar operations.
7.11 Grammars in domain directory Gj created by editing.
7.12 Grammars in domain directory G° created by editing.
7.13 Grammars in domain directory G? created by grammar operations.
7.14 Grammars in domain directory G® created by grammar operations.
List of Figures
3.1 Gregory Abowd's framework for interactive systems.
3.2 The Proposed Framework for Interactive Systems.
4.1 Framework for document authoring environments.
5.1 Many-to-many relationship between mathematical concepts and their representation.
6.1 Structure to support dynamic authoring and multimodality processing.
6.2 A sketch of the dynamics of the authoring/rendering process.
Chapter 1
Introduction
Rather than require that users change, system designers could adapt their systems to key aspects of the users' work practice [33] ...
Reading and writing mathematics are activities that involve distinct characteristics of the notation used. Reading requires a stable meaning-to-syntax mapping in which concepts can always be identified by an expected syntax. Writing mathematics, on the other hand, demands the possibility of introducing meaning-to-syntax mappings that, according to the author of the document, best identify the information to be communicated. In this thesis, the fact that readers benefit from a standard notation while writers require the flexibility to define new meaning-to-syntax mappings is viewed as a tension between the two activities.
Approaching the specification of mathematical notation for electronic documents by providing a standard will, of course, benefit readers. It also implies that users of computerized systems that support the standard will be forced to adapt to the details of the specific notation in order to manipulate the concepts represented in it. One may argue that learning the notation provided by a system is not a major concern, since adequate human-computer interfaces may be provided to support this activity. This is true when the underlying mathematical notation is stable and fixed, that is, when the relation between syntax and semantics does not change and no new concepts may be added to the set covered by the notation. Notations that are both stable and fixed could undeniably be enforced for users of computer algebra systems, for instance. It is also intuitive that adding adequate Graphical User Interfaces (GUIs) would help minimize the effort required to use any system that initially supports only text-based interfaces for the manipulation of mathematical notation. An example is MathType [70], which uses a GUI to help the user produce the correct TeX syntax.
As new concepts are introduced, encodings are needed to support their manipulation. Consequently, mathematical notation evolves by extending the relationship between concept and syntactical representation. From an author's point of view, the relationship between mathematical concepts and their representation may be expressed in two ways: authors may choose to use an already existing syntax, or they may provide a new syntactical encoding for the concept.
Regardless of whether new or already available notation is used, and whether a GUI or any other type of user interface is used, computerized systems that support mathematical notation need to be based on an authoring model. The fundamental characteristics of such a model are the constraints and facilities the author experiences during the complete process of generating mathematical notation for electronic documents.
Although it is reasonable to enforce a specific mathematical notation for readers, it does not make sense to restrict the authoring process to any standard notation vocabulary and writing style. This does not mean that a standard notation is unnecessary. It just supports the intuitive notion that authors should have the freedom to modify the set of mappings between symbols and meaning provided by a standard. The modifications required during authoring may result either from the author's need to communicate concepts not supported by the standard or from a need to redefine some elements of the set of mappings. Another characteristic of this process is that authors do not usually supply their notational conventions in one specific part of the document. Instead, they introduce notation wherever they feel it is necessary.
In essence, a standard notation for the representation of mathematical concepts is necessary for the communication of information among computer systems. Examples of such notations are the ones proposed by OpenMath [23] and MathML [14]. However, such standards are not desirable for supporting the flexibility required by the authoring process during the creation of documents containing mathematics, because user requirements regarding the notation are determined during the authoring activity. For this scenario a dynamic notation is needed.
In order to handle unforeseen meaning-to-syntax relationships, a notational system must be organized around the possibility of describing the construction of the rules instead of providing the rules themselves. This allows authors to create the notation that, according to them, best fits the purpose of the document. This process characterizes a meta-system, and its instances will consequently be notational systems.
Central to the design of any computer-based application are the user's characteristics and the contexts in which the application will be used. The need for multiple modes of communication and multimedia has been acknowledged by [73, 80, 15], and many others, as a promising approach to improving computer access for visually impaired users. In particular, the development of multimedia documents supported by user interfaces that can be configured to adapt to users with print disabilities has been addressed by [80]. The importance of multimodality and multimedia in supporting the computer-based communication of mathematics has been emphasized by [42].
This research was originally motivated by the goal of making documents accessible to blind people. Fundamental requirements associated with these users' limitations therefore had to be considered. These concerns included the following two possibilities¹:
1. to allow input and output to be performed through the various senses of the human perceptual system, and
2. to optimize the use of each modality in order to adapt to the users' cognitive abilities².
These characteristics required the document representation to be independent of the modality/media used for communication.
¹ These requirements, as well as other characteristics related to the design of multimodal user interfaces, are presented in [91].
² The communication of digital logic diagrams to visually impaired users, for instance, may be improved when a tactile display is used in combination with speech [59].
1.1 The Problem: Capturing Semantics by Means of User-Defined Syntax
I am concerned with the design of computer-based interactive systems for processing both the capturing and the rendering of mathematical concepts. In this thesis I focus on the capturing part of the problem. In order to approach this, I consider the following issues:
1. The notation used for encoding mathematical concepts is not fixed. It may be modified at the document author's discretion. This means the author is free to attach any syntax to any given concept.
2. The meaning of mathematical concepts can be captured by means of a text-based document structure.
3. The structure of any document involving only mathematics is the only provider of meaning to the concepts included in it.
4. The user interface used for computer-assisted document authoring is independent of the structure of the document. It is viewed as a component that communicates with the document structure.
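The first issue can be pictured with a small sketch. The following Python fragment is only an illustration under stated assumptions and is not the structure developed in this thesis: the class name `NotationScope`, its methods and the concept names are hypothetical. It shows a syntax-to-meaning binding that an author may override locally without disturbing the enclosing notation.

```python
class NotationScope:
    """Hypothetical scoped table binding a symbol (syntax) to a concept name."""

    def __init__(self, parent=None):
        self.parent = parent      # enclosing scope, or None for the outermost one
        self.bindings = {}        # symbol -> concept, local to this scope

    def bind(self, symbol, concept):
        # The author attaches a syntax to a concept at any point during authoring.
        self.bindings[symbol] = concept

    def lookup(self, symbol):
        # Search the innermost scope first, then the enclosing ones.
        scope = self
        while scope is not None:
            if symbol in scope.bindings:
                return scope.bindings[symbol]
            scope = scope.parent
        raise KeyError(f"no meaning bound to {symbol!r}")


default = NotationScope()
default.bind("+", "integer-addition")       # assumed default notation

local = NotationScope(parent=default)
local.bind("+", "character-concatenation")  # the author redefines "+" locally

print(default.lookup("+"))
print(local.lookup("+"))
```

In this sketch the outer scope keeps its reading of "+" while the inner scope resolves the same symbol to a different concept, which is one possible way to picture a notation that is "not fixed".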
1.2 Related Work
This section discusses related efforts that have treated the problem of representing the semantics of mathematical concepts. Due to the importance of processing electronic documents that contain mathematics, a new interdisciplinary field, Mathematical Knowledge Management (MKM), has emerged [13, 12, 45]. This field deals with the intersection between mathematics and computer science and aims to develop better ways to articulate, organize, disseminate and provide access to mathematical knowledge. ASTER [86], OpenMath [23] and MathML [14] are important research projects in this field. Before these three approaches are discussed, the notions of data model and data representation are briefly introduced, because I believe they are fundamental concepts for the definition of document specification structures. The strategy proposed by SGML [57] for structuring documents is also introduced. The end of this section addresses the principle of compositionality of meaning [98, 58, 101].
1.2.1 Data Model and Data Representation
According to [43], the concept of a data model in a database relates to the idea of hiding
data storage details by means of data abstractions. The structure provided by the
data abstractions usually includes support for data type definitions, data relationships
and constraints which the data should satisfy. Apart from providing a data structure for representing information, a data model includes operations on that data structure
[90]. These operations are the means by which data are accessed, retrieved and updated.
In addition to a set of operators, an efficient implementation of the data update
concept requires both the identification and control of redundant data. It also involves
the notions of equivalence, functional dependencies and normal forms. An example
of a data model which addresses these issues is the relational data model [32].
A data model is basically a data encoding together with a set of operators which manipulate the data, whereas a data representation does not include the operators. A discussion
of the differences between data model and data representation is provided by
[90]. The importance of the notion of update in data models may be expressed by the
relation between the notions of update and equivalence. As emphasized by [90], an
efficient use of update should involve some mechanism to control redundancy, which
requires the notion of equivalence.
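The distinction between a data model and a data representation can be made concrete with a small sketch. The class name AddressBook, its methods and the chosen equivalence (names equal up to case and spacing) are assumptions made for illustration only; they are not taken from [90] or [32].

```python
class AddressBook:
    """Toy data model: an encoding (a dict) together with its operators."""

    def __init__(self):
        self._emails = {}  # canonical name -> e-mail address

    @staticmethod
    def _canonical(name):
        # Equivalence assumed here: names are equal up to case and spacing.
        return " ".join(name.split()).lower()

    def update(self, name, email):
        # Equivalent names share one entry, so no redundant copy is created.
        self._emails[self._canonical(name)] = email

    def lookup(self, name):
        return self._emails[self._canonical(name)]

    def size(self):
        return len(self._emails)


# A bare data representation of the same information: it has no operators,
# so nothing prevents two equivalent entries from coexisting.
representation = [
    ("Ada Lovelace", "ada@old.example.org"),
    ("ada  lovelace", "ada@example.org"),
]

book = AddressBook()
for name, email in representation:
    book.update(name, email)
```

The list keeps both entries, whereas the update operator, which relies on the notion of equivalence, leaves a single entry holding the most recent address.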
1.2.2 SGML and XML
The Standard Generalized Markup Language (SGML) [57] is a document representation
language which standardizes the application of generic coding and generalized
markup concepts. One of its important characteristics is that it allows documents
to be treated in a way similar to databases [90, 89]. As a meta-language, SGML
defines a standard process for the specification of the syntax of descriptive markup
languages. This characteristic is based on the notion of document representation schemes, or Document Type Definitions (DTDs) in SGML terms.
It is by means of DTDs that SGML provides the necessary constructs to support the
representation of the logical structure of documents. Three fundamental concepts are
involved in this activity: entities, elements and attributes.
As stated in the International Standard ISO 8879 [57], an SGML entity is defined as a
collection of characters that can be referenced as a unit. An entity has no structural
properties. Its application is restricted to the replacement of a string of characters
by an identifier.
Structured documents are composed of a collection of components. These components
are characterized by their context, scope and type. The relationship a component
has with other components is its context. The boundaries determining the beginning
and end of a component define its scope. Document components may contain other
components or just data. Consequently the type of a given component will either
be determined by the data or by the composition of the types of the components
which contribute to its definition. In SGML these components are represented by
elements. An SGML element may contain attributes. The purpose of the attributes is to describe some properties of the element.
SGML provides no operations for updating DTDs. It relies on editing to accomplish
any modification of any of its derived languages. Therefore it represents
descriptions of static data. This characteristic is considered a limitation when
applied to the representation of dynamic data sets. Although entities and the attribute
pair ID/IDREF may be used as a way of eliminating redundant data, they cannot
be applied to control it, since both are controlled by the author of the document [90].
Also, as pointed out by [90], there is no system support to indicate whether the use
of ID/IDREF attributes refers to redundant information.
According to [68], the Extensible Markup Language (XML) [20] is a simplified subset
of SGML that has capabilities for supporting its use over the Internet. Related to this
fact is a relevant distinction between XML and SGML. As pointed out by [68, 62],
XML does not require a DTD to be delivered with its associated document. Instead
it requires documents to be well-formed. This characteristic relates to the proper nesting of the start and end tags used for markup.
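Well-formedness can be checked without any DTD. The following minimal sketch uses the parser of the Python standard library; the helper name is_well_formed and the sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, i.e. its tags nest properly."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested = "<card><name>Ada</name></card>"
crossed = "<card><name>Ada</card></name>"  # end tags in the wrong order
```

The first document is accepted although no DTD accompanies it; the second is rejected because its start and end tags overlap instead of nesting.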
Validity constraints on the content of instances that are not expressible through XML
DTDs are not effectively verified [17]. This is because the leaf nodes of an XML document
usually contain either plain text or nothing. This means rigorous type checking is not supported: checking whether the information provided is a date, a telephone
number or a ZIP code, for instance, is not possible.
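The kind of leaf-level check that a DTD cannot express may be sketched as follows. The element names born and zip, the ISO date format and the US ZIP pattern are assumptions chosen for the example; they do not come from [17].

```python
import re
import xml.etree.ElementTree as ET
from datetime import date


def is_iso_date(text):
    """Accept only a real calendar date written as YYYY-MM-DD."""
    try:
        date.fromisoformat(text.strip())
        return True
    except ValueError:
        return False


def is_us_zip(text):
    """Accept 12345 or 12345-6789."""
    return re.fullmatch(r"\d{5}(-\d{4})?", text.strip()) is not None


# Hypothetical mapping from element name to the type check for its text.
CHECKS = {"born": is_iso_date, "zip": is_us_zip}


def invalid_leaves(xml_text):
    """List, in document order, the checked elements whose text fails."""
    root = ET.fromstring(xml_text)
    failed = []
    for elem in root.iter():
        check = CHECKS.get(elem.tag)
        if check is not None and not check(elem.text or ""):
            failed.append(elem.tag)
    return failed
```

A DTD would accept both fields of `<card><born>2005-02-30</born><zip>N6A 3K7</zip></card>` as character data; the checks above reject both.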
1.2.3 XML and RELAX NG
XML Schema provides an alternative to DTDs. It allows much more rigorous control
and supports data types. In this thesis the RELAX NG [27, 28] schema language is
considered because it has been adopted by OpenMath [23] as the major formalism for its encoding.
According to [90], RELAX NG is a data model since it includes both support for data
encoding and operations on the data. Most operations proposed by RELAX NG are
based on the operations used by DTDs to express data constraints. Some of these
are, for example, choice, optional and zeroOrMore, which correspond to the DTD operators |, ? and * respectively.
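The correspondence between the RELAX NG patterns and the DTD operators can be sketched as a small translation. The function to_dtd is hypothetical, ignores namespaces and covers only the patterns named above plus group; it is not part of any RELAX NG tool.

```python
import xml.etree.ElementTree as ET


def to_dtd(pattern):
    """Render a small, namespace-free RELAX NG pattern as a DTD content model."""
    if pattern.tag == "element":
        return pattern.get("name")  # only the element name is kept
    parts = [to_dtd(child) for child in pattern]
    if pattern.tag == "choice":
        return "(" + " | ".join(parts) + ")"
    if pattern.tag == "group":
        return "(" + ", ".join(parts) + ")"
    if pattern.tag in ("optional", "zeroOrMore"):
        inner = parts[0] if len(parts) == 1 else "(" + ", ".join(parts) + ")"
        return inner + ("?" if pattern.tag == "optional" else "*")
    raise ValueError("unsupported pattern: " + pattern.tag)


PATTERN = ET.fromstring(
    '<group><element name="name"/>'
    '<optional><element name="email"/></optional>'
    '<zeroOrMore><choice><element name="phone"/>'
    '<element name="fax"/></choice></zeroOrMore>'
    '</group>'
)
content_model = to_dtd(PATTERN)
```

For the pattern above the sketch yields the content model `(name, email?, (phone | fax)*)`.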
Among the data operations RELAX NG proposes, the replace definition mechanism
is not supported by XML DTDs. Its implementation involves the ref, include and
define operations. No specific operator is provided for this operation. Its semantics is
given by means of an example [29]. The semantics of this operation is similar to the context-free
grammar extension operation [36] I proposed in 1998. The following example illustrates this operation:
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card"><ref name="cardContent"/></element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name"><text/></element>
    <element name="email"><text/></element>
  </define>
</grammar>
Assuming the above syntax is available as the file addressBook.rng, a define element
containing the syntax to be replaced is placed inside an include element. The syntax
that follows replaces the contents of the card element.
<grammar>
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name"><text/></element>
      <element name="emailAddress"><text/></element>
    </define>
  </include>
</grammar>
As a result, the grammar defined in the file addressBook.rng has the contents
of its card element replaced by the information provided through the include element.
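The effect of the replacement can be simulated in a few lines. This is only a sketch of the behaviour described above, not a RELAX NG processor: it ignores namespaces, handles one level of inclusion, and the function load is a hypothetical stand-in for reading addressBook.rng from disk.

```python
import xml.etree.ElementTree as ET

BASE = """
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card"><ref name="cardContent"/></element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name"><text/></element>
    <element name="email"><text/></element>
  </define>
</grammar>
"""

OVERRIDE = """
<grammar>
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name"><text/></element>
      <element name="emailAddress"><text/></element>
    </define>
  </include>
</grammar>
"""


def load(href):
    # Hypothetical loader: the included file is kept in memory for the sketch.
    return ET.fromstring({"addressBook.rng": BASE}[href])


def effective_defines(including):
    """Collect the definitions in force after one level of include."""
    merged = {}
    for inc in including.findall("include"):
        for d in load(inc.get("href")).findall("define"):
            merged[d.get("name")] = d
        for d in inc.findall("define"):
            merged[d.get("name")] = d  # replaces the included definition
    return merged


defs = effective_defines(ET.fromstring(OVERRIDE))
fields = [e.get("name") for e in defs["cardContent"].findall("element")]
```

After the merge, cardContent consists of name and emailAddress; the email element of the included grammar is no longer present.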
1.2.4 ASTER
Audio System For Technical Readings (ASTER) [86] is an audio previewer for electronic documents
written in the TeX family of markup languages. ASTER's processing
environment maps the logical structure of the TeX-based document into its internal representation, a tree data structure. Therefore browsing through a mathematical
expression corresponds to visiting nodes of the tree. A representation of the document
in audio format is obtained by the application of a set of commands written in a language called AFL, which stands for Audio Formatting Language. One facility
this language provides is variable substitution. This means an AFL rule may replace a portion of an expression by a label. This allows the user to obtain an overview of the expression prior to being exposed to all its details.
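The idea of variable substitution over an expression tree can be sketched as follows. The code is not AFL and does not reproduce ASTER's rules; the tuple encoding, the size threshold and the label names E1, E2, ... are assumptions made for illustration.

```python
def size(node):
    """Number of nodes in an expression tree given as nested tuples."""
    if isinstance(node, str):
        return 1
    return 1 + sum(size(child) for child in node[1:])


def summarize(node, limit, labels):
    """Replace every argument larger than `limit` nodes by a fresh label."""
    if isinstance(node, str):
        return node
    operator, *args = node
    result = [operator]
    for arg in args:
        if size(arg) > limit:
            label = "E%d" % (len(labels) + 1)
            labels[label] = arg  # the detail stays available under the label
            result.append(label)
        else:
            result.append(summarize(arg, limit, labels))
    return tuple(result)


# (a + b * c) / (x - y), written in prefix form
expression = ("/", ("+", "a", ("*", "b", "c")), ("-", "x", "y"))
labels = {}
overview = summarize(expression, 3, labels)
```

The overview reads as a quotient whose numerator is the label E1, and the listener may ask for the contents of E1 afterwards.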
1.2.5 OpenMath
Intended to become a major standard to support the exchange of mathematical information,
OpenMath concentrates on the dissemination of scientific knowledge through
electronic means and on the distributed processing of mathematical information [23].
By specifying the semantic contents of mathematical data, OpenMath aims at
providing interoperability between the diverse systems capable of processing mathematical
information [23].
The main focus of OpenMath is on the unambiguous communication of mathematical
concepts [108]. This is achieved by representing the mathematical
concepts as OpenMath objects. These objects have the property of incorporating
both the semantics and structural information of a mathematical concept. Attributes
may be attached to OpenMath objects, and they can be applied to provide additional
information not related to the semantics of the object, such as typesetting details or
the URI of a given CD, for example.
OpenMath objects are structured as basic, compound and derived. Informally, an
OpenMath object is viewed as a tree [23]. Basic objects are the leaf nodes of the tree. The non-leaf nodes of the tree are made up of its compound objects. This
choice of organization determines the LISP style OpenMath uses for the encoding
of its compound objects. This means OpenMath builds expressions by using prefix
operators. OpenMath basic objects are integers, symbols, variables, floating-point
numbers, character strings, and bytearrays. Derived objects are non-OpenMath objects
that are imported by means of the attribution construct. Compound objects are created by the application, binding, attribution and error constructs.
The fact that OpenMath aims at the communication of mathematics among computing
systems is expressed by the way its objects are encoded. A binary and an
XML form of encoding are defined for its objects. Although the standard states that
the XML encoding is readable and writable by humans, [37, 108] claim the encodings
provided are neither meant to be read by humans nor to be created by editing
procedures where humans directly supply all the necessary syntax. Of the two
standard encodings available, the XML encoding is used to define the meaning of the objects to be transmitted.
Application and binding are OpenMath constructors. An application constructs an
OpenMath object from a sequence of one or more OpenMath objects. The following
XML encoding illustrates the use of the application object to capture the semantics
of the variable $x_{i+1}$ [23].
<OMV name="x">
  <OMA>
    <OMS cdbase="http://www.openmath.org/cd" cd="arith1" name="plus"/>
    <OMV name="i"/> <OMI>1</OMI>
  </OMA>
</OMV>
A binding is composed of three objects: a binder, which is the first, followed by an
optional set of arguments, which are the variables to be bound, followed by a body. The
following example is taken from the arith1 CD [23]; it captures the meaning of
the mathematical expression $\sum_{x=1}^{10} 1/x$ by means of the binding object.
<OMOBJ><OMA>
  <OMS cd="arith1" name="sum"/>
  <OMA><OMS cd="interval1" name="integer_interval"/><OMI> 1 </OMI><OMI> 10 </OMI></OMA>
  <OMBIND>
    <OMS cd="fns1" name="lambda"/>
    <OMBVAR><OMV name="x"/></OMBVAR>
    <OMA><OMS cd="arith1" name="divide"/><OMI> 1 </OMI><OMV name="x"/></OMA>
  </OMBIND>
</OMA></OMOBJ>
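Because the encoding is a tree built with prefix operators, the meaning of the sum example can be recovered by a recursive walk. The sketch below is an assumption-laden illustration, not an OpenMath implementation: it supports only the four symbols used in the example, treats lambda as the only binder, binds a single variable and ignores cdbase.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

SUM_XML = """
<OMOBJ><OMA>
  <OMS cd="arith1" name="sum"/>
  <OMA><OMS cd="interval1" name="integer_interval"/><OMI> 1 </OMI><OMI> 10 </OMI></OMA>
  <OMBIND>
    <OMS cd="fns1" name="lambda"/>
    <OMBVAR><OMV name="x"/></OMBVAR>
    <OMA><OMS cd="arith1" name="divide"/><OMI> 1 </OMI><OMV name="x"/></OMA>
  </OMBIND>
</OMA></OMOBJ>
"""


def evaluate(node, env):
    """Evaluate the small subset of OpenMath objects used in SUM_XML."""
    if node.tag == "OMOBJ":
        return evaluate(node[0], env)
    if node.tag == "OMI":
        return Fraction(int(node.text))
    if node.tag == "OMV":
        return env[node.get("name")]
    if node.tag == "OMBIND":  # assumed: binder is lambda, one bound variable
        _binder, bound, body = node
        name = bound[0].get("name")
        return lambda value: evaluate(body, {**env, name: value})
    if node.tag == "OMA":  # prefix form: the operator comes first
        head, *args = node
        symbol = (head.get("cd"), head.get("name"))
        values = [evaluate(arg, env) for arg in args]
        if symbol == ("arith1", "divide"):
            return values[0] / values[1]
        if symbol == ("interval1", "integer_interval"):
            return range(int(values[0]), int(values[1]) + 1)
        if symbol == ("arith1", "sum"):
            interval, function = values
            return sum((function(Fraction(k)) for k in interval), Fraction(0))
    raise ValueError("unsupported object: " + node.tag)


total = evaluate(ET.fromstring(SUM_XML), {})
```

The walk returns the exact value 7381/2520, the tenth harmonic number.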
An attribution decorates an object with a sequence of one or more pairs composed
of an OpenMath symbol, the attribute, and an associated object, the value of the
attribute. According to [23], attribution may be used either as adornment or as
semantic annotation, depending on the role associated with the attribute. The
standard states that when the attribute has role semantic-attribution the attributed
object is modified by the attribution. For this reason attribution is also considered a
constructor. Although this characteristic is referred to as an important feature, the
attribution examples included in the standard only involve adornment annotations.
The following code illustrates the use of the attribution object to associate non-OpenMath data with an OpenMath object by means of the foreign element.
<OMATTR>
  <OMATP>
    <OMS cd="presentation" name="mathml"/>
    <OMFOREIGN>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi> sin </mi><mfenced><mi> x </mi></mfenced>
      </math>
    </OMFOREIGN>
  </OMATP>
  <OMA>
    <OMS cdbase="http://www.openmath.org/cd" cd="transc1" name="sin"/>
    <OMV name="x"/>
  </OMA>
</OMATTR>
The error object is not considered because it has no direct mathematical meaning. Its use is to report problems related to the communication of OpenMath objects.
The OpenMath structure used for grouping OpenMath objects is a Content Dictionary,
or CD for short. The definition of a CD usually includes other CDs. An exception to this is the META-CD, which contains the definition of the structure of
a CD. CDs may be grouped as a mechanism to define collections or groups, and both CDs and CD groups are XML documents.
The data provided by a CD may be structured according to the type of information
that is addressed. Information included in a CD either
• belongs to the whole CD or
• is about the mathematical concepts represented there.
Represented by the element OMS, an OpenMath symbol is the mechanism the standard
uses to refer to symbols from a Content Dictionary. It is by means of its three
attributes, cd, name and cdbase, that the element OMS determines where the semantics
of a name is defined. A restriction regarding the location at which a symbol may
appear in an OpenMath object is provided by a characteristic called the role of the symbol.
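How the three attributes jointly identify a symbol can be sketched as a small record. The class Symbol is hypothetical; the URI layout (cdbase, a slash, the CD name, a hash sign, the symbol name) follows my reading of the canonical URI convention of the OpenMath standard and should be checked against [23].

```python
from typing import NamedTuple

DEFAULT_CDBASE = "http://www.openmath.org/cd"


class Symbol(NamedTuple):
    """An OpenMath symbol identified by its cd, name and cdbase attributes."""

    cd: str
    name: str
    cdbase: str = DEFAULT_CDBASE

    def uri(self):
        # Assumed layout: <cdbase>/<cd>#<name>
        return f"{self.cdbase}/{self.cd}#{self.name}"


official_sin = Symbol("transc1", "sin")
local_sin = Symbol("transc1", "sin", cdbase="http://example.org/cd")
```

Two symbols that share cd and name but differ in cdbase are distinct, which is how a locally defined sin is kept apart from the one in the official dictionaries.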
Information related to the definition of an OpenMath symbol is organized as mandatory
and optional data. The name and the description of the symbol are mandatory.
Optional information includes examples, formal mathematical properties (FMP),
commented mathematical properties (CMP) and the role.
The optional character of FMPs indicates that there exists no consistent way of
expressing the semantics of mathematical concepts. The definition of the sum object
as provided in the arith1 CD is presented by means of a text description followed by
an example. Even when formal properties are provided, it is difficult to determine the
set of properties that best characterizes a concept.
Although the role is one of the fields of information that define an OpenMath symbol,
its definition is provided as a CD element. It is not clear from the description provided
by the standard why a symbol characteristic is defined in a CD.
OpenMath extensibility is based on the notion of CDs. This means that for each mathematical concept not supported by the standard, a CD must be provided with the
definition of the concept structured according to the OpenMath objects. Although
the latest version of the standard [23] relies on RELAX NG's mechanisms to support the automatic generation of CDs, the definition of the OpenMath objects included in
the CD depends on the same editing tools used for the manipulation of text. For this
reason CDs are static descriptions of data. OpenMath resolves ambiguous definitions by means of the cdbase attribute of OMS.
1.2.6 MathML
The Mathematical Markup Language, or MathML [14], is a World Wide Web Consortium
(W3C) recommendation for describing mathematical notation. MathML is an XML
application which focuses on the provision of mathematics on the World Wide Web.
MathML approaches the markup of mathematical concepts by means of two sets of elements and attributes. It is by means of this property that MathML encodes the
layout as well as the semantics of mathematical expressions. Presentation MathML
and Content MathML are the two languages provided to support this characteristic.
In much the same way as TeX approaches the typesetting of mathematical text, Presentation
MathML determines the control over the display of mathematics. Content MathML is meant to supply more meaning to the description of mathematical concepts.
One restriction of this form of markup is the limited range of mathematical
concepts it covers. This is because Content MathML has been designed to
support the encoding of mathematical concepts that are used from kindergarten to
the end of high school and the first two years of college. Like OpenMath, MathML
is a system-oriented approach. This property has
been emphasized by [79]:
while MathML is human-readable, it is anticipated that, in all but the
simplest cases, authors will use equation editors, conversion programs, and
other specialized software tools to generate MathML.
Content MathML consists of about 120 elements accepting about a dozen attributes.
The representation of concepts not covered by these elements may be obtained by
referring to external definitions. The MathML csymbol element, or content symbol,
is provided to address this limitation. This element is the constructor MathML
has to refer to a symbol whose meaning is not provided by MathML's core
content elements. It is by its two attributes, definitionURL and encoding, that
csymbol determines the characteristics of the external element. The definitionURL attribute specifies the Uniform Resource Identifier (URI) that provides the definition for the new symbol. The encoding attribute determines the syntax of the target
that has been referred to by the definitionURL attribute. The content of a csymbol is either PCDATA or a presentation construct. The following example illustrates the
characteristics of this form of extension:
<csymbol definitionURL="www.example.com/ContDiffFuncs.htm"
         encoding="text">
  <msup>
    <mi> C </mi>
    <mn> 2 </mn>
  </msup>
</csymbol>
The above definition encodes a symbol that semantically represents the space of
twice-differentiable continuous functions and has its syntax encoded as $C^2$.
1.2.7 Some Limitations of Both OpenMath and MathML
1. Mathematical expressions in both OpenMath and MathML are built by using
prefix operators. Therefore the order of entry is counter-intuitive [96], since the
mental model imposed by both approaches requires that the user input notation
from the innermost nested expression outward, instead of from left to right.
2. Although both standards support multimodality of output, they have not been
designed to support multimodality of input. This is because their structure involves complex syntax.
3. Both standards are system-oriented. Consequently their constructs are not easily readable and writable by humans.
1.2.8 Compositionality
Regardless of the notation used to express the meaning of a mathematical concept, one
property which needs to be considered is the semantic structure of the concept. Semantic structure denotes the parts which comprise the concept, their ordering, grouping
and the relations among these parts. One challenge introduced by this characteristic
is to ensure the correctness of a chosen semantic structure for the representation of a mathematical concept.
The principle of compositionality of meaning has been proposed by [98] as a requirement
to be considered in the design of knowledge representation languages. This
concept is covered in detail in a chapter titled Compositionality [58] in the Handbook
of Logic and Language [101]. The key idea of the compositionality principle is that the meaning of a sentence can be composed from the meaning of its parts. In a more
precise form this principle is stated as
The meaning of a compound expression is a function of the meaning of
its parts and the syntactic rule by which they are combined [58].
A language is considered compositional if it satisfies the compositionality principle.
This involves the decision on what the basic semantic and syntactic components are
and how they are combined [58]. Therefore a design that is not compositional
indicates that its parts and/or the syntactic rules which bind them have not been
selected properly. Although achieving compositionality of meaning might seem to be an impossible task, [58] claims that
. . . compositionality becomes possible if semantic considerations influence
the design of the syntactic rules.
The above indicates that one can always find a syntax that allows the assignment of
the intended meaning in a compositional form. This property is supported by Theorem 9.4
in [58], which claims that any possible meaning can be assigned to any possible
language in a compositional form. For languages characterized by a fixed (static) syntax,
compositionality of meaning is a design decision, since it can be achieved by the
choice of a suitable grammar. Theorem 9.3 in [58] supports this characteristic. It
proves that if a language can be generated by any algorithm, it is possible to generate
this language by a compositional grammar. According to [98], OpenMath is compositional and MathML is not.
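The principle can be illustrated with a toy language in which each syntactic rule is paired with exactly one semantic operation. The rule names sum and quotient and the tuple encoding are assumptions made for the sketch; they are not taken from [58] or [98].

```python
from fractions import Fraction

# One semantic operation per syntactic rule (hypothetical rule names).
RULE_MEANING = {
    "sum": lambda left, right: left + right,
    "quotient": lambda left, right: left / right,
}


def meaning(expression):
    """Meaning of an expression, computed only from the meanings of its
    parts and the rule that combines them."""
    if isinstance(expression, int):
        return Fraction(expression)
    rule, *parts = expression
    return RULE_MEANING[rule](*(meaning(part) for part in parts))


one_plus_half = ("sum", 1, ("quotient", 1, 2))
one_plus_two_quarters = ("sum", 1, ("quotient", 2, 4))
```

Replacing the part 1/2 by the part 2/4, which has the same meaning, leaves the meaning of the whole unchanged; this substitution property is what compositionality guarantees.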
1.3 Motivation
The work of this thesis was originally motivated by the necessity of having a TeX-to-Braille translation system [10, 11]. As characterized in [10, 11, 44], both TeX and standard Braille representations emphasize the syntactical structure of the concepts
involved. For this reason a semantics-preserving translation from TeX input to Braille
was not achieved. The recommendations provided by [10, 11] regarding this translation
included the necessity of a semantics-based markup. This has, of course, been
noted by many others in the field [86, 14, 23].
The automatic translation of TeX input into standard Braille output was approached by Arrabito [10]. The impossibility of this translation was reported as a consequence
of the semantic ambiguities of some frequently used mathematical constructs in the
TeX definition, and the lack of meta-rules in the Braille standard to cope with the
macro expansion characteristic of TeX.
The experience reported by ASTER and by Arrabito's experiment provided some
valuable insight into the rendering of mathematics. Since the two approaches were
based on input provided from TeX files, they both had to deal with all the consequences
a text formatter could impose when used as a source for the representation of
mathematical semantics. ASTER assumes all its source inputs are well-written³ LaTeX
documents. This implies that any macro definition, including the ones provided by
the author, must reflect the logical structure of the concepts involved in the definition.
Another interpretation of this requirement is that a restriction is necessary in order
to limit the excess of power provided by TeX to the user.
The fact that LaTeX is characterized by a procedural markup⁴ approach, obtained
by means of macro calls, can be viewed as both an advantage and a constraint.
Macro definitions provide the ability to support the natural instability of
conventional mathematical notation. On the other hand, the same macro definitions pose a major difficulty to the processing environment with respect to their use. If expanded,
the semantic contents they provide are lost. If not defined properly, they may not
carry the needed semantics.
As text formatters, systems based on LaTeX were designed around the necessity of
having a structured data representation. The main motivation for this approach is
that a standardization of representation paves the way for its interchange. By
preserving the way information is represented, the possibility of having to re-process
data whenever a new system is introduced, or as the result of an upgrade in the
current system, is no longer a concern.
3 ASTER's structure is based on the assumption that distinct mathematical concepts that share the same syntactic encoding must be described by distinct macro definitions.
4 Procedural markup consists of commands that determine how text should be formatted [34].
Although document structures based on standardization of representation favor document
portability, they are not adequate for rendering documents in ways that require different human senses for the understanding of information. This has been observed
by both Raman [86] and Arrabito [10] while working on mapping LaTeX into speech
and TeX into Braille respectively. The necessity of having a document structure that would allow mathematical concepts to be communicated regardless of the media used or
the human senses involved motivated the research reported in this thesis. The section
that follows outlines a semantics-based solution to the specification of mathematical concepts.
1.4 A Solution: Dynamical Document Structure
I propose that the meaning of mathematical concepts can be captured in a user-oriented⁵
way by means of an appropriate grammar formalism which satisfies the following criteria. The grammar formalism must
1. model the dynamics of authoring mathematics,
2. describe the structure of the rules by which syntax is created,
3. provide operations on the rules that define syntax and
4. support the definition of syntax by the application of the operations on these
rules.
In my thesis I introduce a text-based document structure (Document Description
Model) which satisfies the above four criteria, and is therefore capable of capturing
the semantics of mathematical concepts in a user-oriented way. The proposed model has the following characteristics:
1. it supports both the extensibility and ambiguity characteristics of
conventional mathematical notation and
2. it allows the author of a document to introduce his/her own syntax for the encoding of the mathematical concepts.
5 In the context of this work user-oriented refers to a design approach focused on the needs of the end user.
I claim the following:
• The meaning of mathematical concepts can be captured by attributed context-free
grammars.
• Extensibility can be supported by operations on the attributed grammars.
• Ambiguity generated by symbol overloading can be resolved by a scope mechanism
where the meaning of concepts is uniquely defined.
1.5 Approach Taken
This thesis presents a grammar-based approach to the specification of mathematical
notation. The method introduced is based on a meta-structure that uses attributed
context-free grammars for capturing the meaning of mathematical concepts. This
structure supports the creation of multi-purpose documents and allows the specification
of mathematical notation in a dynamical way [36]. In the context of this thesis,
the term multi-purpose documents refers to documents that may be rendered or used
in different ways, some of which might not be known at the time the document is
created. By dynamical it is understood that the meaning associated with syntax is
allowed to be modified.
The proposal described in this thesis is based on an authoring model which addresses
the user's needs as a fundamental requirement. This characteristic is structured around a scope mechanism that allows the mapping between meaning and syntax to be
modified at any time during the creation of the document. This process supports
the dynamic characteristics of the meaning-to-syntax binding necessary during the
authoring of mathematical concepts. The multi-purpose property is supported by a
semantics-based capturing mechanism [49, 11, 21] that provides the possibility for
the represented concepts to be processed according to the specific requirements of
applications. Modular grammar fragments characterized by a one-to-one mapping
between mathematical concept and grammar representation provide adequate support for the definition of the various scopes. An incremental update process is defined as a way to modify the necessary grammar fragments to support the changes
proposed during the authoring process.
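The general idea of a scope mechanism for overloaded notation can be sketched with a stack of bindings. This is only an illustration of scoping as such; the class Scopes, its methods and the example meanings are assumptions and do not reproduce the Binding Control mechanism defined later in the thesis.

```python
class Scopes:
    """A stack of syntax-to-meaning bindings; the innermost scope wins."""

    def __init__(self):
        self._stack = [{}]

    def enter(self):
        self._stack.append({})

    def leave(self):
        if len(self._stack) == 1:
            raise RuntimeError("cannot leave the outermost scope")
        self._stack.pop()

    def bind(self, syntax, meaning):
        # A binding may be changed at any time within the current scope.
        self._stack[-1][syntax] = meaning

    def meaning_of(self, syntax):
        for scope in reversed(self._stack):
            if syntax in scope:
                return scope[syntax]
        raise KeyError(syntax)


scopes = Scopes()
scopes.bind("|x|", "absolute value")
scopes.enter()
scopes.bind("|x|", "cardinality")  # the same syntax, overloaded
inner = scopes.meaning_of("|x|")
scopes.leave()
outer = scopes.meaning_of("|x|")
```

Within each scope the vertical bars have exactly one meaning, so the overloaded notation never becomes ambiguous for a processing application.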
1.6 Thesis Overview
In order to provide document authors with the freedom to communicate mathematical
concepts by means of the syntax that, according to the authors, best represents
the concepts involved, adequate document structures need to be available. This thesis
addresses this problem by introducing a systematic approach that allows an author
to capture the meaning of each mathematical concept according to the syntax he/she
feels best describes it.
The approach presented here is based on a meta-structure which has been designed
with the support of attributed context-free grammars. Chapter 2 introduces the
reader to the notation and the fundamental definitions. Some of the definitions provided
may be found in books covering the theory of computation; however, they have
been included to make the thesis notationally self-contained.
A framework for interactive systems is proposed in Chapter 3. The proposed framework
is based on the model developed by Abowd [4] and introduces an additional
translation in order to support the consultation of the system's state by the user.
The framework is refined in Chapter 4 by the decomposition of its core component
into two subcomponents, the Operating System and the Document Structure. This organization is also used in that chapter to support the claim that document authoring
is an interactive activity that requires an environment for its fulfillment. Defined
as a pair (Document Structure, User Interface), the notion of Authoring Environment
separates user interface components from the structure of the document. Chapter
4 also provides the basic concepts needed for the definition of requirements for the
evaluation of authoring environments. For this purpose a set of properties is provided.
In Chapter 5 the possibility of capturing the meaning of mathematical concepts by means of context-free grammar fragments is introduced. This chapter illustrates that although these grammars can be used for the capturing activity, they do not
provide the necessary support for both the extensibility and ambiguity characteristics of
conventional mathematical notation. The major limitation of this approach is
that context-free grammars only support static descriptions of semantics. This
restriction is addressed in Chapter 6, where the dynamics of document authoring is
considered. The approach developed in that chapter proposes the document structure component for computer-based authoring environments. This structure is composed of two components: a sequence of sets of grammars called the Semantic Structure and a
grammar called the Binding Control mechanism. The semantic structure is based on
attributed context-free grammars and it addresses extensibility by combining grammar
definitions. Two grammar operations are defined for this purpose. These operations
assume the grammars involved have been defined according to the restrictions
specified by a normal form proposed in the chapter. The ambiguity characteristic is approached by a context switch which allows the replacement of one semantic structure
by another. Chapter 7 provides a set of examples. These examples are used to illustrate
the characteristics of the approach introduced in Chapter 6. A structure for
processing the document organization presented in Chapter 6 is proposed in Chapter 8.
The language processing model introduced is defined as a deterministic finite
automaton that has its states characterized as sets of grammars and its transitions
by the meaning-to-syntax bindings established during authoring. Chapter 9 contains a discussion of the approach proposed by this thesis, conclusions and suggestions for future work.
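The shape of such an automaton can be sketched in a few lines. The grammar names, the binding labels and the function run are hypothetical and serve only to show states as sets of grammars and transitions as bindings; the actual model is the one defined in Chapter 8.

```python
# States are sets of grammar names (hypothetical names).
ABS_STATE = frozenset({"arith", "bars_as_absolute_value"})
CARD_STATE = frozenset({"arith", "bars_as_cardinality"})

# Deterministic transition function: (state, binding) -> next state.
TRANSITIONS = {
    (ABS_STATE, "bars := cardinality"): CARD_STATE,
    (CARD_STATE, "bars := absolute value"): ABS_STATE,
}


def run(transitions, start, bindings):
    """Follow the bindings established during authoring, one at a time."""
    state = start
    for binding in bindings:
        state = transitions[(state, binding)]  # KeyError if undefined
    return state


after_one = run(TRANSITIONS, ABS_STATE, ["bars := cardinality"])
after_two = run(
    TRANSITIONS,
    ABS_STATE,
    ["bars := cardinality", "bars := absolute value"],
)
```

Each binding made by the author moves the processor to the set of grammars under which the following text is to be interpreted.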
Chapter 2
Basic Notions and Notation
This chapter presents the notation to be used throughout this thesis and includes
the necessary basic definitions. The specification of grammars may be approached
by listing their production rules whenever a complete specification is not necessary.
All grammars in this thesis will be displayed in table form. The grammar's name
will always appear in the far left column and each row of the table will contain a
production rule written with spaces as symbol delimiters. Both nonterminal and
terminal symbols are represented by strings of characters, possibly linked by the underscore character. Lower case strings of letters are used to represent nonterminals
and upper case letters and other characters are used to represent terminal symbols¹.
The symbol | is sometimes applied to group together rules associated with the same
nonterminal. The nonterminal on the left of the production rule in the first row is
the start symbol. The arrow → is replaced by a colon in all grammars except the one
for the meta-structure. For attributed grammars, one additional column is included
at the right edge of the table to represent the attributes associated with the rules. Strings of arbitrary characters are used to represent attributes.
2.1 Basic Definitions
The main definitions are included here in order to establish the notation that is used
throughout the thesis. For further information see [55] as a standard reference.
1 The choice of representation for both terminals and nonterminals is consistent with the approach used by compiler tools such as lex and yacc.
An alphabet is a finite non-empty set. Elements of an alphabet are called symbols.
Let $X$ be an alphabet. Then $X^*$ is the set of all words over $X$ including the empty
word $\varepsilon$.
Definition 1 A context-free grammar (CFG) is denoted $G = (N, T, P, S)$ where $N$ is an alphabet of nonterminal symbols, $T$ is an alphabet of terminal symbols such that
$N \cap T = \emptyset$, $P$ is a finite set of (production) rules of the form $A \to w$ with $A \in N$ and
$w \in (N \cup T)^*$, and $S \in N$ is the start symbol.
Let $G = (N, T, P, S)$ be a context-free grammar, let $V = N \cup T$, and let $u, v \in V^*$.
The word $v$ is derived from $u$ in one step if there is a rule $A \to w \in P$ and there are
words $u_1, u_2 \in V^*$ such that $u = u_1 A u_2$ and $v = u_1 w u_2$. The fact that $v$ is derived
from $u$ in one step is denoted by $u \Rightarrow v$. We write $u \Rightarrow^* v$ to denote the fact that there is a non-negative integer $n$ and there are words $u_0, u_1, \ldots, u_n \in V^*$ such that $u = u_0$,
$v = u_n$, and $u_{i-1} \Rightarrow u_i$ for $i = 1, \ldots, n$. In this case we say that $v$ is derived from $u$,
the integer $n$ is the number of derivation steps, and the sequence $u_0, u_1, \ldots, u_n$
is a derivation of $v$ from $u$. The set
$$L(G) = \{u \mid u \in T^* \text{ and } S \Rightarrow^* u\}$$
is the language generated by $G$.
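The one-step derivation relation and the generated language can be illustrated with a small Python sketch. The toy grammar below (nonterminal `expr`, terminals `NUM` and `+`) and the function names are assumptions made for illustration and do not come from the thesis. Words are represented as tuples of symbols, and the enumeration is bounded by word length because $L(G)$ is infinite here.

```python
from collections import deque

# A context-free grammar G = (N, T, P, S). The rules are a hypothetical
# toy example, written with lower case nonterminals and upper case terminals.
N = {"expr"}
T = {"NUM", "+"}
P = [("expr", ("expr", "+", "expr")), ("expr", ("NUM",))]
S = "expr"


def derive_one_step(u):
    """Return every word v such that u => v."""
    results = set()
    for i, symbol in enumerate(u):
        for lhs, rhs in P:
            if symbol == lhs:
                # u = u1 A u2 becomes v = u1 w u2
                results.add(u[:i] + rhs + u[i + 1:])
    return results


def language_up_to(max_len):
    """Collect the words of L(G) that have at most max_len symbols."""
    seen = {(S,)}
    queue = deque([(S,)])
    words = set()
    while queue:
        u = queue.popleft()
        if all(symbol in T for symbol in u):
            words.add(u)  # u is in T* and S =>* u
            continue
        for v in derive_one_step(u):
            # No rule of this toy grammar shortens a word, so any form
            # longer than max_len can be discarded safely.
            if len(v) <= max_len and v not in seen:
                seen.add(v)
                queue.append(v)
    return words
```

The length-based pruning is only valid because this particular grammar has no rule with an empty right-hand side; a grammar with such rules would need a bound on derivation steps instead.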
Definition 2 Let $G = (N, T, P, S)$ be a context-free grammar. For all rules $A \to w \in P$,
$A \in N$, $w \in (N \cup T)^*$, $A$ is called the left (hand) side, or lhs, of the rule,
and $w$ is the right (hand) side, or rhs, of the rule. For $p = A \to w$, $\mathrm{lhs}_p = A$ and
$\mathrm{rhs}_p = w$. The set of nonterminal symbols of $p$ is $N_p = L_p \cup R_p$ where $L_p = \{\mathrm{lhs}_p\}$
and $R_p = \{M \mid M \in N \text{ and } \mathrm{rhs}_p = w_1 M w_2,\ w_1 \text{ and } w_2 \in V^*\}$. The set of terminal
symbols of $p$ is $O_p = \{x \mid x \in T \text{ and } \mathrm{rhs}_p = w_1 x w_2,\ w_1 \text{ and } w_2 \in V^*\}$.
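As a minimal sketch of Definition 2, the sets $N_p$ and $O_p$ can be computed directly from a rule. The rule `p` and the symbol names below are hypothetical, and a rule is represented as a pair of a left-hand side and a tuple of right-hand side symbols.

```python
# Hypothetical alphabets, chosen only for illustration.
N = {"expr", "term"}
T = {"NUM", "+"}


def lhs(p):
    """Left-hand side of the rule p = A -> w."""
    return p[0]


def rhs(p):
    """Right-hand side of the rule p = A -> w, as a tuple of symbols."""
    return p[1]


def nonterminals_of(p):
    """N_p = L_p united with R_p."""
    l_p = {lhs(p)}
    r_p = {m for m in rhs(p) if m in N}
    return l_p | r_p


def terminals_of(p):
    """O_p, the terminal symbols that occur in the right-hand side of p."""
    return {x for x in rhs(p) if x in T}


p = ("expr", ("expr", "+", "term"))
```

Testing membership of each right-hand side symbol is equivalent to the condition $\mathrm{rhs}_p = w_1 M w_2$, since a symbol occurs in a word exactly when the word can be split around it.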
Definition 3 An attributed grammar is a sextuple $G = (N, T, P, S, A, \alpha)$ with the following properties:
• The quadruple $(N, T, P, S)$ is a context-free grammar, the underlying grammar of $G$.
• $A$ is a language over some finite alphabet, the attribute language.
• $\alpha$ is a mapping of $P$ into $A$, the attribute assignment.
Any word in $A$ is called an attribute. For a rule $p \in P$, the word $\alpha(p)$ is the attribute
of $p$.
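One possible way to hold an attributed grammar in a program is sketched below. The class name, field names and sample attributes are assumptions for illustration only. The attribute language is taken to be the set of all Python strings, which is a simplification of the definition.

```python
from dataclasses import dataclass


@dataclass
class AttributedGrammar:
    nonterminals: set
    terminals: set
    rules: list          # each rule is (lhs, tuple_of_rhs_symbols)
    start: str
    attribute_of: dict   # the attribute assignment: rule -> attribute word

    def __post_init__(self):
        # The attribute assignment must be defined on every rule in P.
        missing = [p for p in self.rules if p not in self.attribute_of]
        if missing:
            raise ValueError(f"rules without an attribute: {missing}")

    def underlying(self):
        """Return the underlying context-free grammar (N, T, P, S)."""
        return (self.nonterminals, self.terminals, self.rules, self.start)

    def attribute(self, p):
        """Return the attribute of rule p."""
        return self.attribute_of[p]


# Hypothetical rules and attributes.
sum_rule = ("expr", ("expr", "+", "expr"))
num_rule = ("expr", ("NUM",))
grammar = AttributedGrammar(
    nonterminals={"expr"},
    terminals={"NUM", "+"},
    rules=[sum_rule, num_rule],
    start="expr",
    attribute_of={sum_rule: "addition", num_rule: "number"},
)
```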
Definition 4 A deterministic finite automaton (DFA), $M$, is a quintuple $(Q, \Sigma, \delta, s, F)$,
where
• $Q$ is an alphabet of state symbols,
• $\Sigma$ is an alphabet of input symbols,
• $s \in Q$, where $s$ is the start state,
• $F \subseteq Q$, where $F$ is the set of accepting states, and
• $\delta : Q \times \Sigma \to Q$ is the transition function.
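The quintuple can be sketched in Python as follows. The automaton shown, which accepts words over {a, b} containing an even number of a's, is a hypothetical example and not one used in the thesis. The transition function is stored as a dictionary keyed by (state, symbol) pairs.

```python
def accepts(delta, start, accepting, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # delta is total on Q x Sigma
    return state in accepting


# Hypothetical DFA: even number of "a" symbols.
Q = {"even", "odd"}
SIGMA = {"a", "b"}
DELTA = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}
START = "even"
F = {"even"}
```

Because `DELTA` has exactly one entry for every pair in `Q` x `SIGMA`, each input word determines a single run, which is what makes the automaton deterministic.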
Chapter 3
A Framework for Interactive Systems
In this chapter a framework for interactive systems is proposed. The framework
introduced here is based on the model defined by Abowd [4]. It differs from his
approach by the introduction of an additional translation which connects the user
and the output component of the system's interface.
3.1 Basic Notions
Computer-based systems have been designed to support a wide variety of human
activities. Human communication is one field which has been expanding through
support from computer technology. In this section some aspects of human-computer
communication are discussed.
3.1.1 Electronic and Paper Documents
It seems the static world of paper documents is gradually being replaced by the
dynamic environment of digital information. In the electronic form, documents need to be structured in order to be processed by computing systems.
A key element of electronic document processing is the possibility of easy manipulation of a document's atomic elements by means of digital devices. This idea
introduced the necessity to view documents not only as printed output generated by a
digital machine, but also introduced the need to store documents in a way that provides
full portability to other computer environments. This means the structure of documents needed to be preserved.
This way of viewing documents suggests they are composed of a logical structure, which is a
set of abstract components, and contents, where the actual content of the documents can be found. The logical structuring of documents is based on the decomposition
of documents into parts. Each part in the structure has a particular meaning and
may, recursively, be subdivided into other parts. In this way the whole document
can be represented as a collection of hierarchically-related components. An abstract
component, a given paragraph of a document, for example, may be expressed over
one or more two-dimensional page spaces, in various different ways, depending on
specifications of font, hyphenation, line length and other concrete variables. The
same logical component may be mapped into different concrete variables and then
made available in different media by means of a tactile display, a Braille printer
or audio, for instance. In this thesis the process of translating abstract document
components into concrete ones is defined as rendering. The production of hardcopy,
images, speech or any other possible presentation structures from concrete document
components to output devices is defined as viewing.
According to Levy [67] documents have been created in response to a human necessity
to provide stabilities in a constantly changing world. The notion of fixing the form of
a document as a means of fixing its contents is viewed as a property documents have which he defines as invariance.
It is intuitive to relate this notion of invariance to paper documents since they are the result of a process by which surfaces of paper sheets are usually marked in a stable
way. On the other hand electronic documents usually require rendering in order to
be manipulated by humans. The fact that one given abstract document component
may be mapped into different concrete ones indicates the existence of a one-to-many relationship between them. This relationship is an important property of electronic
documents because it allows various media to be used to deliver the information
provided by the abstract document component. The idea of using different media to communicate is discussed in the subsection which follows.
3.1.2 Communication, Media and Modalities
Information is shared among humans by a communication process. This process may
always be described in terms of three fundamental components: a sender, a receiver
and a communication channel or medium. Information carriers such as computer
input and output devices and the physical carriers such as sound waves and photon
distributions are media. Therefore a medium is the physical channel used for information
encoding. Sensory modality is a human mechanism of perception where vision,
hearing, touch, smell, taste, and balance are used for the processing of incoming
information. Representation modality is the way information is encoded in some medium.
Communication through a given set of modalities is only possible when provided
by adequate information carriers. The following scenario illustrates this relation:
Consider, for instance, the directions given by one person to another to find a place
in a city. The necessary directions may, for example, be given by voice in combination
with gestures. In this case the sensory modalities used are hearing and vision. The
sound waves and photon distributions are information carriers. Both the spoken
language and the set of gestures are representation modalities.
Sensory modalities are physical characteristics of the human body, therefore their
number is fixed. On the other hand the number of information carriers varies. In
the scenario illustrated by the above example the information carriers were chosen
to characterize a face-to-face or human-to-human communication activity. In this
thesis, this form of information exchange is characterized by the absence of computer-based systems and by the fact that both sender and receiver are humans sharing
place and time. Humans also exchange information with the aid of computer-based systems. This form of information exchange is referred to here as computer-assisted
communication. The concept of computer-assisted communication is, in this thesis,
used in a broad sense. Its meaning includes the notion of both human-computer
interaction and computer-mediated human-to-human interaction. Also in the context of this work, interaction is used to refer to the communication between user and system.
Humans usually make use of available media to communicate ideas and feelings. Although the increase of information carriers does not necessarily improve the
communication it is, most of the time, expected that the information to be shared is
available to the receivers through all possible modalities. According to Bunt [22],
people communicate with each other, most of the time, according to what he calls
the Multimax Principle. He defines this principle as follows:
In natural communication, the participants use all the modalities and
media that are available in the communicative situation.
The multimax characteristic is present even in situations where one of the parties
involved in the communication is not capable of communicating in all the modalities
by which information is made available. As an example of this consider, for instance,
the face-to-face communication between sighted and blind people. If it is assumed the
sighted person communicates with the blind person using voice and gestures, for example, it
is clear that the information provided by a set of gestures will not be processed by
the blind person. Although it is known that the exchange of information with blind people
is not improved when gestures are used, sighted people do not usually avoid this
representation modality when communicating with blind people.
It is intuitive to think about computer-assisted communication in terms of face-to-
face communication having all available media and modalities as characterized by the
multimax property. One challenge to this approach is the definition of adequate
structures for both software and hardware to support this characteristic. The remainder
of this chapter discusses some aspects of the software needed in terms of a framework
to support the interaction between the user and the computer.
3.2 User Interface Basic Components
User interface design for computer applications is an interactive process where sets
of objects are manipulated. These objects can be structured according to the role
they play in the interaction. They can be of input, output or both input and output
types. They may also be of direct use in case the physical object is manipulated, or
they can be of indirect access if no physical interaction is permitted.
The component which connects input and output objects is generally referred to as a system. Therefore the user accesses the system by manipulating the interface
objects. Systems differ by their intrinsic characteristics. These qualities are viewed
as statements of a language which can be used to represent the system. This will be
referred to as the core language. Users can be described in terms of psychological and
physical characteristics relevant to the communication with the system. Users have goals which may be realized by the system. These goals are structured as activities
which the user may realize by communicating with the system. These properties may
also be expressed as language statements which we call the task language.
The system's state is reported in forms defined by the output objects. The attributes which establish the way the state of the system is rendered characterize the language
used by output objects to communicate. In a similar way, user requests are sent to the
system by configuring input attributes according to the required behavior defined by
the task to be performed. The attributes involved in this type of request represent
the features of the language the user has to use to interact with the system.
Figure 3.1: Gregory Abowd's framework for interactive systems.
3.3 An Existing Model
The interaction framework proposed by Abowd [4] describes the communication between user and computer by a model composed of four components and four translations. The components represent the stages the interaction goes through. Each component has its own language by which its internal characteristics are defined. The translations are used to map knowledge between the components. Figure 3.1
illustrates this framework. In this figure components are represented as nodes and
translations are the arrows linking the nodes. Component names are typeset in upper
case letters and both the names for the languages and translations are in lower case.
The languages are task, input, core and output.
As shown in Figure 3.1, articulation connects the USER to INPUT. Therefore it is used to represent the user's intentions in terms of the structure provided for data entry
by the system. Performance is responsible for the translation of information collected
during the input stage into core data. The state of the system is made available to
output devices by presentation. Observation is the user's ability to perceive the state of the system.
3.3.1 A Structuring Problem
It is intuitive to decompose the interaction between user and computer in terms of
execution and evaluation semicycles [39]. During this process the user's intentions,
represented as statements of the task language, are mapped to input commands which,
after execution by the system, are observed and evaluated by the user. If the user's
intentions cannot be completed in a single cycle of interaction, other related cycles are
introduced. The additional cycles are viewed as refinements of the intended task to be
realized. The framework proposed by [4] relates articulation and performance to the execution semicycle and presentation and observation to the evaluation
semicycle. As defined by this approach, the interactive cycle begins with the USER
by the formulation of a goal, and a task to accomplish the goal. This approach is also
based on the assumption that the only way the user can manipulate the machine is
through the INPUT. For this reason, the task must be articulated within the input
language. Although Abowd's framework assumes that execution and evaluation are not always alternating semicycles, the model does not indicate the procedure to be
followed when the user's goals first require knowledge of the system's state as provided by the output devices¹. As illustrated in Figure 3.1, Abowd's framework
establishes that the execution semicycle always precedes the evaluation semicycle.
Therefore, following the path defined by the arrows connecting the USER and the
OUTPUT components, articulation, performance and presentation are identified as
1 A typical scenario for this is a user interfacing with a display-based system which first prompts the user for input.
nonactive translations for activities when input devices are not involved.
The notion that the interaction cycle must start with the user by the formulation
of a goal and a task is accepted in this thesis. However the user is free to either
manipulate the system by means of its input devices or consult the system's state as
supplied by the output. The following section proposes an additional translation to
Abowd's framework as a way of approaching this nondeterministic behavior.
3.4 A Different Structure for Interactive Systems
This section introduces an additional translation to the framework proposed in [4].
The information made available to the user by the system's output devices is now
structured as a process composed of two phases, consultation and observation. By
consulting the output provided by the system, the user obtains the available
requirements to continue his/her activity. These requirements are viewed as conditions from
the system's perspective and as possible modifications to the task to be performed
from the user's point of view. The modifications may be as simple as the addition
of an extra interaction cycle or as complex as requiring the complete task to be
restructured. As an example of this, consider the scenario where a client of a bank tries
to withdraw cash from his/her account by means of an automatic teller machine. If
the system is in a state which displays an out of order message, the client has to
modify his/her goal/task pair because his/her intentions could not be expressed by
the system's interface at that particular instant. Consultation is therefore viewed
as a translation which maps the user's expectations to the system's state as supplied
by the output devices.
3.4.1 A New Framework
A new framework for interactive systems based on the work developed in [4] is
introduced here. The proposed framework differs from the model of [4] by the introduction
of an additional translation which supports the consultation of the system's state by
the user through the output devices.
The notion of interactive cycles is understood as sequences of components connected by translations. The sequences represent the derivation of words of a language defined
by all possible tasks which can be realized through the system by means of the
interface. The results obtained by the derivation procedure represent the user's tasks
that have been completely realized by the resources available. These characteristics are represented by the right-linear grammar
$$G = (N, T, P, B)$$
with
$$N = \{U, I, S, O\}$$
$$T = \{c, a, p, v, o\}$$
$$P = \{U \to cO \mid aI \mid \varepsilon,\ I \to pS,\ S \to vO,\ O \to oU\}$$
and
$$B = U$$
where $U$, $I$, $S$ and $O$ are short forms for USER, INPUT, SYSTEM and OUTPUT
respectively, and $c$, $a$, $p$, $v$ and $o$ are representations for consultation, articulation,
performance, presentation and observation respectively.
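Since the grammar is right-linear, each rule of the form component → translation component can be read as a move from one component to the next, and the rule U → ε means that a derivation may stop at USER. The following Python sketch follows that reading; the table name `STEP` and the function name `trace` are assumptions introduced for illustration.

```python
# Rules of G read as moves: (component, translation) -> next component.
STEP = {
    ("U", "c"): "O",  # U -> cO  (consultation)
    ("U", "a"): "I",  # U -> aI  (articulation)
    ("I", "p"): "S",  # I -> pS  (performance)
    ("S", "v"): "O",  # S -> vO  (presentation)
    ("O", "o"): "U",  # O -> oU  (observation)
}


def trace(word):
    """Return the components visited while deriving word, or None.

    A word belongs to the language only if the derivation can end with
    the rule U -> epsilon, that is, if the last component is U.
    """
    component = "U"  # the start symbol B = U
    visited = [component]
    for translation in word:
        component = STEP.get((component, translation))
        if component is None:
            return None  # no rule applies
        visited.append(component)
    return visited if component == "U" else None
```

For instance, `trace("apvo")` visits USER, INPUT, SYSTEM, OUTPUT and returns to USER, whereas `trace("ap")` yields `None` because the interaction would stop at SYSTEM.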
Grammar G is nondeterministic. This characteristic relates to the need the user may
have to analyze the output in order to decide the next action to be taken. During the analysis process the user may refine or even redefine the mental model he/she
has developed. Although regular languages can be graphically represented by the
standard state transition diagrams, statecharts [52, 53] will be used. The reason for
this choice is that hierarchical structures are better visualized when
represented by these diagrams. The dynamics of the proposed model are captured
by the statechart in Figure 3.2. The statechart in this figure has depth two since
it structures the states in two layers or levels of abstraction. The higher level has
SYSTEM, INTERFACE and USER as states. The lower level is a refinement of the
INTERFACE state and is composed of only two states, OUTPUT and INPUT. As
can be seen, lower case letters have been used to typeset both the names for languages
and translations. Each language has been placed inside the box where its related state name is located.
Figure 3.2: The Proposed Framework for Interactive Systems.
3.5 Example
Consider the scenario where a client of a bank fails to withdraw cash from an
Automatic Teller Machine (ATM) because he/she has forgotten the required bank card.
The client/ATM interaction, for this case, may be described by the following tasks:
• Consult state of ATM by reading information provided by its display, and
• Interpret information from display.
It is during the Interpret information from display task that the client realizes the
adequate bank card must be supplied. Not having the needed card, the client stops
the cash withdrawal activity and consequently the client/ATM interaction terminates.
This activity may be expressed in the framework proposed in this chapter by the regular expression (co)*. The Kleene closure is used in this case to indicate the
client's need to cycle through consultation/observation zero or more times, as many as
he/she feels necessary.
The regular expression ((co) + (apvo))* represents all possible interactions the user
may have with the system. The term (apvo) represents interactions that involve both execution and evaluation semicycles. This characteristic is present in both the
framework proposed here and in Abowd's framework. The term (co) represents interactions that include only the evaluation semicycle. This characteristic is not included in Abowd's framework.
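The two expressions can be checked with Python's standard `re` module. In Python syntax the union written + above is spelled `|`, and `fullmatch` requires the whole word to match. The pattern for Abowd's framework, `(apvo)*`, is an assumption derived from the statement that only the (apvo) term is present there, and the variable names are illustrative.

```python
import re

# ((co) + (apvo))* in Python regular expression syntax.
proposed = re.compile(r"(?:co|apvo)*")

# Assumed counterpart for the original framework: full cycles only.
abowd_only = re.compile(r"(?:apvo)*")


def is_interaction(pattern, word):
    """Report whether word is a complete sequence of translations."""
    return pattern.fullmatch(word) is not None
```

Under these assumptions the ATM scenario, written as the word `co`, is accepted by the proposed pattern and rejected by the pattern without consultation.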
3.6 Summary
A framework for interactive systems which is based on the model defined by Abowd [4]
is introduced in this chapter. The proposed approach uses an additional translation
as a way to support the necessary user analysis of the system's state as supplied by the
output devices. The complete cycle of interaction is modeled as a regular language.
A graphical representation of this organization is provided in a statechart format.
Chapter 4
Authoring Environments
4.1 Introduction
The purpose of this chapter is to provide an understanding of the document authoring process and to establish a context for the discussion of the quality of environments
used in the authoring of documents containing mathematics. For this reason a set of
characteristics is considered in order to assess the quality of the environments. An
ideal environment is proposed and design approaches which may be used in order to achieve it are presented.
4.2 Interaction Objects and Authoring Environments
This thesis considers computer-based document authoring as an interactive process.
During this process the author manipulates documents by means of interaction objects
as defined in Chapter 3. These objects can be manipulated directly or indirectly by
the user. A document authoring environment is a combination of interaction objects
and is structured according to the form of control the author has over the interaction objects involved.
Consider a pen/paper document authoring environment for instance. In this
organization, the author uses the pen to record information on the paper. This environment is characterized by the fact that all objects involved are directly manipulated by the
author. The interaction is completely under the author's control because all
information printed on paper results from direct actions performed by the author on the
interface objects. To illustrate the notion of a document authoring environment
consider, for instance, a document such as a research report written in English. Table
4.1 provides a description of the pen/paper environment according to the interaction
framework proposed in Chapter 3.
USER         : Author
task         : Produce a handwritten draft of a research report in English
articulation : Hand movements associated with handwriting
INPUT        : Pen/paper
input        : Pen strokes
performance  : Cursive writing
SYSTEM       : Pen/paper text authoring
core         : Written text
presentation : Rendering of cursive written text on paper
OUTPUT       : Paper
output       : Sets of handwritten cursive characters printed on paper
observation  : Sets of handwritten text according to the formatting style defined
consultation : Interpretation of the cursive sets of characters based on the English syntactical and semantic definitions
Table 4.1: Pen/paper authoring environment.
Although in computer-based authoring environments the author directly interacts with physical objects such as keyboard and mouse, the expected result, when
available, is also dependent on objects under an indirect form of control. Software and
hardware components not available for manipulation by the system's users are considered here as indirect objects. Table 4.2 presents the description of a TeX-based
environment for the research report authoring task.
Document authoring environments which make use of no objects under an indirect form of control are referred to here as direct or ideal environments. All other environments are considered indirect.
USER         : Author
task         : Produce a draft version of a research report in English using plain TeX macros
articulation : Hand movements associated with typing
INPUT        : Keyboard
input        : Key strokes
performance  : Key decoding
SYSTEM       : All related hardware not directly accessed by the author
core         : Plain TeX macro package plus operating system used
presentation : TeX compiler plus dvi viewer
OUTPUT       : Video display
output       : Sets of characters rendered on the display
observation  : Reading the displayed text according to the characteristics of the dvi viewer
consultation : Interpretation of the sets of characters based on the English syntactical and semantic definitions
Table 4.2: TeX-based authoring environment.
4.3 Cognitive Distances
The characteristics of the authoring environments as presented in Tables 4.1 and 4.2
show that there exists a body of knowledge that the author is required to know in order
to accomplish any established task successfully. An important characteristic related to
the communication between USER and SYSTEM is the difficulty the USER may have
in mapping intentions into physical commands of the input language. This difficulty
is referred to as the gulf of execution [76, 56, 39]. Another relevant characteristic of
interactive systems is the difficulty the user has in interpreting the available output. This difficulty is called the gulf of evaluation [76, 56, 39]. Both the gulf of execution
and the gulf of evaluation are results of design decisions that are usually related to
restrictions imposed by the specification of the interactive system. These gulfs are
viewed, by the user, as distances to be bridged in order to realize tasks successfully through the provided interface.
For ideal environments such as the pen/paper one, the knowledge needed to bridge
both gulfs is not relevant if we assume the author already knows how to read and
write. Computer-based environments usually require additional knowledge. To
author documents using a TeX-based system, for instance, a user should, at least, know
the basics of both TeX and the underlying operating system in addition to typing and
reading from a display screen. The cognitive distance to be bridged in this scenario
will depend not only on the manipulation of the physical objects which compose the
interface, but also on the user's knowledge of the tool used for typesetting.
As stated in [4, 76, 56], semantic distance relates to the translation between the user's
intentions and the meaning of the interface language. This distance is a function of
both the expressiveness and the conciseness of the input language. Expressiveness
relates to the scope or semantic coverage of a language. Ideally, highly expressive
languages provide support for the representation of all concepts in the domain in
which the language is intended to be applied. Conciseness relates to the mapping
the language provides to link tasks to the input syntax. Highly concise languages
are structured in a way to capture the semantics of tasks, in the language's domain,
by syntactically simple statements. The TeX macro package, for instance, is highly expressive but it is not concise.
4.4 Rendering Information
Information exchanged in human-to-human communication is usually inaccurate and
unclear. For this reason different forms of information exchange are usually
necessary. In natural language communication, for instance, we often use gestures and
vocal sounds not related to the language as an attempt to improve the transfer of information. Similarly, in user-computer interaction the acknowledgment of a mouse
click may be reported by both a display change and a sound signal. In this case the user-computer interaction is enhanced by the provision of feedback to the click
action in two distinct modes. As another example consider the flight boarding
announcements that are usually made in most airports through video terminals and
speech. In the described scenarios the additional modality may be viewed as a form of redundancy that enhances the quality of the information transfer process.
Central to the use of multimodality as a form of communication enhancement is the notion of semantics-based information organization. This form of structuring data is understood as fundamental in designing systems to be used for broadcasting information. It establishes that the data to be supplied to the modality renderers
must be free of ambiguities. If the document to be processed includes mathematical
concepts, the ambiguity-free requirement does not allow the representation of different concepts by syntactically overloaded symbols.
4.5 Encoding Mathematical Concepts
Mathematical concepts need to be encoded in some form in order to be manipulated. The conventional mathematical notation is, most of the time, the first encoded form of these concepts we are exposed to. Although this general-purpose notation has been the primary tool used for the teaching of mathematics, it is not an adequate notation to support the electronic communication of the concepts.
As a visual system, the conventional mathematical notation relies not only on a set of symbols as a way of representing concepts, but also on spatial arrangements, variations of both font size and type, and other visual markers to aid the representation of information. These visual markers provide an efficient way to represent a complex set of constructs by means of a limited set of symbols. This characteristic is illustrated by the following two examples:
Example 1: The convolution of two functions could be defined as follows:
If
$$\mathcal{L}[f(t)] = F(s)$$
and
$$\mathcal{L}[g(t)] = G(s)$$
then the inverse of the product $F(s)G(s)$ can be obtained in terms of $f(t)$ and $g(t)$ by the expression
$$\mathcal{L}^{-1}[F(s)G(s)] = \int_{0}^{t} f(x)\,g(t - x)\,dx$$
In the example above the change from lower case to upper case letters has been used to indicate the domain change from $t$ to $s$. The syntax used enforces the fact that $F(s)$ is just a different interpretation of function $f(t)$. The Laplace transformation as well as its inverse are represented by the character $\mathcal{L}$, which is the character L typeset in a different way. The integral has its upper limit $t$ placed above its lower limit to inform the reader where the operation starts and ends.
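As a quick check of the expression in Example 1, the following worked instance may help. The choice $f(t) = 1$ and $g(t) = t$ is an assumption made only for illustration and is not part of the original example; both sides evaluate to the same function of $t$.

```latex
\begin{align*}
F(s) &= \mathcal{L}[1] = \frac{1}{s}, \qquad G(s) = \mathcal{L}[t] = \frac{1}{s^{2}},\\
\mathcal{L}^{-1}[F(s)G(s)] &= \mathcal{L}^{-1}\!\left[\frac{1}{s^{3}}\right] = \frac{t^{2}}{2},\\
\int_{0}^{t} f(x)\,g(t - x)\,dx &= \int_{0}^{t} (t - x)\,dx = t^{2} - \frac{t^{2}}{2} = \frac{t^{2}}{2}.
\end{align*}
```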
Example 2: The matrix equation
$$\mathbf{L}\mathbf{x} = \mathbf{m}$$
represents a system of linear equations and has
$$\mathbf{x} = \mathbf{L}^{-1}\mathbf{m}$$
as a solution. The linear equation
$$l x = m$$
with $x$ and $m$ as real numbers, has
$$x = l^{-1} m$$
as a solution.
Although both solutions are obtained by taking the inverse of the object that prefixes the variable we want to solve for, and then multiplying this result by the object on the right side of the equality, the semantics attached to these operations is not the same. This fact is represented by the use of upper case and bold face type in the matrix equation.
The necessity of representing mathematics by means of encodings that support electronic communication of the concepts has motivated the creation of other notations. Perhaps the most intuitive approach is to map all dimensions involved in the standard representation of the concepts into a single dimension. Although conceptually trivial, this linearization procedure allows the complete domain to be input into computer systems. One relevant aspect of this approach is the structure used for capturing the meaning of the mathematical concepts. Such a structure should supply the author with the necessary means to encode not only all existing concepts, but also concepts proposed by the author.
4.6 Environment Modifications
The fact that the author relies on the interface to obtain the behaviour defined by the core, as proposed in Chapter 3, may be used to represent document authoring environments by the following pair
$$V = (S, I) \tag{4.1}$$
where $V$, $S$ and $I$ are the document authoring environment, the document instance structure and the system's interface respectively. This representation may be viewed as a refinement of the framework proposed in Section 3.4 to address the details involved in the SYSTEM component. For computer-based document authoring environments, this state needs to be further decomposed in order to isolate the operating system's services from the behaviour provided by the document structure. Figure 4.1 illustrates the framework proposed in Chapter 3 where the SYSTEM component has been modified to support the proposed refinement. In this case core has been replaced by two lower level states¹, the operating system and the document structure.
[Figure 4.1: Framework for document authoring environments. Labels recoverable from the figure: FRAMEWORK, SYSTEM, OPERATING SYSTEM, USER, task, articulation, input, output.]
Consider, for instance, a computer-based authoring environment $V_0 = (S_0, I_0)$ such as the one defined in Table 4.2. In this case $S_0$ represents the plain TeX macro package and $I_0$ is the complete interface part of the environment.
1 Details of the communication between these two states which are irrelevant to the present discussion have been omitted.
The replacement of the keyboard device by a mouse/display pair, for instance, would require articulation, input and performance to be redefined. Although the author may acknowledge a significant amount of change due to the mouse pointing and clicking actions that replaced the typing form of manipulation, the basis of the document structure has not been modified. The resulting environment can be represented as $V_1 = (S_0, I_1)$ where $I_1$ is the modified interface. Replacing the plain TeX macro package by LaTeX, for instance, will not have any effect on other parts of the environment besides the document structure. This means the author will still use the keyboard for input, but is now required to have knowledge of LaTeX to express his/her ideas. This environment is represented by $V_2 = (S_1, I_0)$ where $S_1$ is a document structure based on the LaTeX macro definitions.
Different document authoring environments may therefore be obtained by the following three approaches. One can either:
1. maintain the document structure and modify the system's interface, or
2. maintain the system's interface and modify the document structure, or
3. modify both.
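The three approaches can be sketched as operations on the pair from equation (4.1). This is only an illustrative sketch: the `Environment` class and the string values standing in for structures and interfaces are assumptions, not part of the thesis.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Environment:
    """A document authoring environment V = (S, I) (hypothetical model)."""
    structure: str   # S: document instance structure, e.g. a macro package
    interface: str   # I: the system's interface, e.g. a device arrangement


# V0 = (S0, I0): plain TeX used through a keyboard/display interface.
v0 = Environment(structure="plain TeX", interface="keyboard/display")

# Approach 1: keep the document structure, modify the interface.
v1 = replace(v0, interface="mouse/display")

# Approach 2: keep the interface, modify the document structure.
v2 = replace(v0, structure="LaTeX")

# Approach 3: modify both.
v3 = replace(v0, structure="LaTeX", interface="mouse/display")
```

Keeping the pair immutable means each modification yields a new environment, so the original environment remains available for comparison.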
4.7 Changes in the Interface
Environment modifications as discussed in the previous sections do not include the reasons why the changes were considered. This problem is approached, in this section, by examining what motivates changes in the system's interface.
User-computer interfaces can be viewed as facilitators which provide services to users. These services are structured according to the characteristics of the information obtained by users as the result of commands executed on the interface objects. The services may include information which is directly available through the functionality provided by the operating system. They may also involve concepts defined by the structure of the application, in which case the user interacts with the application and the operating system is viewed as a mediator. In both scenarios, the dialogue between user and computer may be structured according to the way interaction² resources are organized.
2 According to [74] interaction styles are key-modal, direct-manipulation and linguistic.
In cases where the operating system is a mediator, it is possible to represent the services provided by the application as interaction objects. Although the application's functionality may be completely preserved by this procedure, the user may not be able to access the structure of the application. This side effect is sometimes intentional since hiding the internal structure may improve the use of the application for inexperienced users. As an example, consider an authoring environment which has plain TeX as document structure and uses a keyboard/display arrangement as interaction device. The author in this case is forced to directly manipulate the objects defined by the TeX macro package. The replacement of this type of interaction by one based on a mouse/graphical display combination with the necessary macro package objects structured as sets of icons, for instance, would allow the use of the package with no other knowledge besides the manipulation of the interaction devices. Modifications to the system's interface such as this are usually performed in an attempt to improve the usability of the document authoring environment.
4.8 Recommendations
In the previous sections the basic characteristics which document environments should have in order to support the authoring of mathematical concepts have been discussed. These qualities are presented in terms of properties and indicate possible software design approaches that may be considered in order to achieve them. Ideal document authoring environments are viewed as software systems which support the properties listed in Table 4.3.
4.9 Summary
The framework for interactive systems proposed in Chapter 3 was extended through a refinement of the SYSTEM state. The modification introduced a lower level of abstraction composed of two states. This approach aims at a separation of functionality between the operating system and the document structure. A set of properties which may be used to assess the quality of document authoring environments designed to support the representation of mathematical concepts has also been introduced.
PROPERTY                              DESIGN APPROACH
High Conciseness                      - Layer/processor addition to existing document structure definitions.
                                      - Improve interaction style.
High Expressiveness                   - Scope enhancement by the use of meta-structures and extensibility operations.
Ambiguity-freeness / Multimodality    - Enforcement of syntactically unique representations by the creation of domains.
Extensibility                         - Introduction of operations to update the document structure.
Table 4.3: Document authoring environment characteristics and software design approaches to help achieve them.
Chapter 5
Mathematical Constructs and their Representation
Document authoring is an incremental activity in which a set of intermediate (draft) versions of a document are produced by the author prior to the creation of the final one. Any given version of a document, except the first one, may therefore be viewed as the result of an update of the previous version of the document.
Authoring documents that contain mathematics, or authoring mathematics for short, is both incremental and dynamic. It is during this activity that the author makes explicit the syntax that will represent the mathematical concepts included in a given version of a document. The design of document structures to support these characteristics must therefore include mechanisms to manage both the update and the meaning-to-syntax bindings determined during authoring.
This chapter introduces the notion of using CFGs as a major formalism to support the dynamics of authoring mathematics. It discusses the use of CFGs as a tool to capture the semantics of mathematical concepts by means of user-defined syntax that can be proposed during authoring. The limitations CFGs have in supporting document structures that allow update are also addressed and an overview of the solution proposed by this thesis to approach these limitations is presented. A set of examples illustrating the possibility of using CFGs to capture the semantics of mathematical concepts is provided.
5.1 Notational Systems as Languages
A notational system uses a set of symbols to describe quantities and ideas, and serves as a supporting mechanism for their expression. A programming language is a special notational system designed to solve problems in a particular domain. This characteristic often establishes the set of basic constructs that will provide the language with the necessary power to approach the tasks in the specified domain. Language constructs are generally structured around statements, and these programming statements are, most of the time, characterized as block statements, flow control statements, expressions, and declarations.
This way of structuring the design of a programming language leads to the idea that the language can be defined as a set of basic modules that can be combined to generate other modules. The task of designing a module may be accomplished through the use of a Context-Free Grammar, which will hereafter be referred to as CFG in this thesis.
CFGs have been used as a major tool for the specification of programming languages. The implementation independence of this approach provides the designer with the flexibility to work on the development of a language without the need to be concerned with implementation details. Programming languages often need to be mapped into other domains in order to better respond to the user processing requests. Compilers are well known tools that support the translation of language definitions into other forms.
CFGs are, in this thesis, viewed as abstract type definitions, and sentences belonging to the grammar as variables of that type. This idea is supported by the fact that, given a set of basic type definitions or a set of CFGs, other definitions can easily be produced by the manipulation of the rules already defined. The parsing process of a compiler can therefore be interpreted as a type checker which only verifies whether a given variable (a sentence) belongs to the set provided by the type definition (the grammar). This analogy can be further extended to include abstractions such as the possibility of reuse of well defined grammars in the design of other programming languages.
Although some notational systems are not designed to support programming, they can be structured in a way similar to programming languages. The standard mathematical notation system is one example of such systems.
5.2 Standard Mathematical Notation Characteristics
The representation of mathematics by a finite set of symbols imposes restrictions on the notation used. In the following section, the implications of this limitation are addressed, and the need for a form of representation based on semantics is discussed.
The field of mathematics is composed of a collection of subfields or domains. The various branches of science often make use of these subfields as supporting tools to express their ideas. For instance, the formal presentation of some electrical engineering concepts is supported through the use of calculus.
To develop an understanding of the mathematical notation, classes of mathematical concepts can be defined, and a trivial one-to-one mapping between these classes and the subfields can be established. Abstract mathematical constructs are mapped onto concrete symbols in order to provide humans with the representations necessary to communicate mathematical ideas as well as concepts. The mathematical notation can therefore be viewed as a language used to describe the abstract concepts.
Despite the fact that humans depend on concrete objects for sharing their knowledge, all mathematical computations rely on the ability to manipulate the abstract concepts involved. Like natural languages, the mathematical notation has its basis in a dynamic process where an abstract idea can be represented by different language constructs, and the information conveyed by a particular language construct may relate to different abstract concepts. This many-to-many mapping between abstract concepts and language constructs characterizes this dynamic process as ambiguous and incomplete. Therefore any particular abstract mathematical concept is said to be represented by a notation construct if the parties involved in the information exchange have previously agreed on the notation defined for the concept. This leads to the conclusion that this representation process not only is unstable, but also imposes the characteristic of being locally redefined. The derivative of $v$ with respect to $t$, for instance, is a good example of the representation ambiguity of a mathematical concept. Either $\dot{v}$, $v'$ or $\frac{dv}{dt}$ could be chosen to illustrate the concept¹.
1 The form of attachment where one concept is accessible by more than one reference is here denoted as aliasing.
The representation of various mathematical concepts is usually accomplished by overloading meaningful symbols. The arithmetic mean, the conjugate of a complex number as well as the complement of a boolean expression are well known concepts that are often represented by placing a horizontal bar over a variable name. For instance, variable $\bar{v}$ could be chosen to represent all three concepts. It is clear that context has to be included in any attempt to communicate mathematical concepts. It is during authoring that the relationship between mathematical concept and concept representation is available for modification. Selecting a particular syntax representation may therefore not only determine the meaning of a concept but it may also indicate the domain where the concept is defined. Figure 5.1 illustrates the many-to-many relationship between mathematical concepts and their representations. The representation flexibility illustrated by the examples presented restricts the manipulation and understanding of the mathematical notation to users who share a common understanding of the terminology applied.
[Figure 5.1: Many-to-many relationship between mathematical concepts and their representation. Labels recoverable from the figure: Representation, derivative ($dv/dt$), conjugate, complement.]
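The many-to-many relationship can be sketched as a lookup in both directions, with context used to resolve an overloaded symbol. The notation strings, the domain names and the helper functions below are assumptions chosen for illustration only.

```python
# Each notation maps to the set of concepts it may denote (overloading).
MEANINGS = {
    "overbar(v)": {"arithmetic mean", "complex conjugate", "Boolean complement"},
    "dv/dt": {"derivative"},
    "v'": {"derivative"},
    "dot(v)": {"derivative"},
}

# Hypothetical domain in which each concept is defined.
DOMAIN_OF = {
    "arithmetic mean": "statistics",
    "complex conjugate": "complex analysis",
    "Boolean complement": "Boolean algebra",
    "derivative": "calculus",
}


def notations_for(concept):
    """Aliasing: every notation through which one concept is accessible."""
    return sorted(n for n, concepts in MEANINGS.items() if concept in concepts)


def interpret(notation, domain):
    """Resolve an overloaded notation using the domain as context."""
    candidates = {c for c in MEANINGS[notation] if DOMAIN_OF[c] == domain}
    if len(candidates) != 1:
        raise ValueError(f"{notation!r} is ambiguous or undefined in {domain!r}")
    return candidates.pop()
```

Without the `domain` argument the overbar cannot be resolved, which mirrors the observation that context has to accompany any attempt to communicate the concept.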
As science progresses, the new ideas proposed, as well as the necessary supporting assumptions, need to be fully described. This condition places the extensibility requirement on the notation used to express the results obtained. Mathematical symbols will need to be provided in order to precisely describe the new concepts and new syntax may therefore need to be introduced as a way of avoiding ambiguities. Extensibility is frequently used in mathematical notation to either locally define symbols or to represent new concepts. The following scenario illustrates this characteristic.
Consider the scaling of a plane defined by the following statement:
Assume $(x, y, z)$ are given Cartesian coordinates. We now let $(\bar{x}, \bar{y}, \bar{z})$ be new coordinates where $\bar{x} = \lambda x$, $\bar{y} = \lambda y$, $\bar{z} = \lambda z$ and $\lambda$ is a positive scalar constant.
In the context described, $\bar{x}$, $\bar{y}$ and $\bar{z}$ are neither complex conjugates, the complements of boolean expressions, nor the means. A new interpretation has locally been provided to the variables. The extensibility characteristic of the mathematical notation increases the level of complexity involved in capturing the semantics of the concepts presented.
The representation of mathematical notation can be achieved by either a presentational approach, in which the visual characteristics of the symbols used in the notation are emphasized, or by a semantic approach, where abstract concepts are used as a basis for the representation. The presentational approach was introduced during the early stages of computers. Typesetting systems like nroff/troff as well as TeX are examples of such systems. Although both systems provide stable data representations, they lack the necessary features to be used as a basis for the representation of data in forms other than text. In contrast, as argued in [11], a notational approach based on the meaning of symbols, that is, based on the semantics of the concepts, is needed.
One of the difficulties presented by the representation of mathematical expressions by their contents is to capture the meaning of the concepts. Another way of expressing this characteristic is to capture the meaning which has been associated with a given set of symbols in case the concepts have already been encoded as these symbols for communication. For this reason the representation of mathematical concepts by the semantic approach has not yet been implemented in its entirety.
5.3 Capturing the Semantics of Mathematical Concepts
The use of CFGs as a formalism to support the capturing of mathematical concepts is discussed in this section. Both its advantages and its limitations are addressed.
5.3.1 Mathematics and Document Authoring
The lifetime of the representation of a mathematical construct in a document may be characterized as a variable that denotes a locally pre-established relationship between the abstract concepts involved and a user-defined interpretation. Syntactic constructs may temporarily be bound to specific meanings as the result of a process led by the author of the document in order to communicate his/her knowledge. Therefore this context-dependent binding process is the mechanism the author has to express information by means of a finite set of symbols. By fixing an interpretation for a given syntax for a period of time, the author expresses his/her knowledge at the possible cost of introducing symbol overloading² and syntax ambiguity. This process may be interpreted as context switching, where the author has the power to assign different interpretations to the set of symbols used for the representation of the mathematical concepts. Therefore ambiguities introduced by the editing procedure have the author's approval and control. They are part of the document because they express the result of an already accepted form of representation.
The fact that the author is allowed to attach different meanings to syntactical structures adds a complex component to the problem of capturing the semantics of mathematical concepts. This characteristic introduces the idea of real-time document structure update. This means the structure of the document is modified during authoring to include adequate syntax to capture the semantics of mathematical concepts. For this reason a semantics-based document authoring model is necessary. This form of authoring documents is formally defined in Chapter 6 and it will hereafter be referred to as dynamic authoring.
Modeling structures to support dynamic authoring requires mechanisms to support the semantics capturing of the mathematical concepts. This translates to the need to address not only data representation issues, but also the context in which mathematical concepts are represented.
The document update notion, imposed by the dynamic authoring model, establishes the necessity of well-defined mechanisms for both accessing and modifying the structural base upon which the document's syntax and semantics are represented. In the case of having a grammar as the supporting structure for capturing the semantics of mathematical concepts, a modification of the syntax used for the representation of concepts³, or the introduction of a new construct, will require an update process in which the related grammar definitions will need to be adapted according to the modifications proposed.
2 In this context, symbol overloading is viewed as part of an incremental updating process where existing connections between mathematical concepts and syntactical constructs are modified. The modification process either establishes or keeps a many-to-one relation between mathematical concept and syntactical representation.
It is during the authoring activity that syntax is bound to concepts. In the event that ambiguities are introduced by symbol overloading, authoring mechanisms can be provided to resolve all context-dependent representations which, according to the author, need to be included in the document.
5.3.2 CFGs and Data Types
The type of mapping between mathematical concept and grammar representation determines the degree of dependence between the two domains. If this dependence is established by a one-to-one mapping (every mathematical concept is captured by an isolated grammar definition), then modifications proposed by the author will be reflected only in the production rules involved in the definition of the concepts manipulated. This organization is supported by the software engineering principle of separation of concerns, which approaches a complex problem by concentrating on each individual aspect of the problem one at a time [48]. The modularity necessary for the application of this principle is obtained by assigning the set of production rules that define a given mathematical concept a unique context-free grammar. In this thesis these structures are referred to as grammar fragments, modules or simply fragments.
Although a module is viewed as a syntactic concept which only affects the way in which software text is partitioned [72], semantic restrictions on the associated text may be used as criteria for modularization. For instance, for mathematical concepts sharing the same syntactical structure when presented according to the conventional mathematical notation, their semantic content is the only characteristic which may be used to identify them. Therefore mathematical concepts with different semantics and the same syntax are understood as distinct objects. For this reason they should be treated separately by means of their own grammars.
3 This may seem contradictory since the semantic characteristics of a concept are not affected by the form in which it is rendered. The semantic information attached to the standard visual presentation of a concept will, most of the time, be included during the associated capturing procedure. A typical illustration of this characteristic is the juxtaposed multiplication in polynomials which is discussed later in this chapter.
One advantage of having CFGs as the fundamental structure to capture the meaning of mathematical concepts is the flexibility this mechanism provides in supporting both the design and recognition phases of the capturing activity. The design phase is characterized by the assignment of grammar fragments to mathematical concepts. During recognition, the input provided by the author is submitted to the analysis component of the associated language processor. At this stage, the input is encoded as tokens and its syntactical structure is matched against the related set of production rules that has been provided during the design.
Another way the recognition phase may be viewed is as the execution of a membership verification performed by the analysis component. For this interpretation a CFG is equivalent to a data type, or just type, and each valid input is an instance of the type. This association is consistent with the notion of type provided by [72]. The data type, in this case, is represented by the start symbol of the CFG.
The organization proposed in this section merges the notions of module and type by using CFGs as static structures to support the semantics capturing requirement. One benefit of organizing mathematical concepts as sets of grammar fragments or modules is the possibility of using both decomposition and composition as aids to the structuring process.
Although it is possible to capture the meaning of mathematical concepts by means of static structures such as CFGs, this approach presents limitations. One important limitation is that CFGs only support the definition of document interchange formats. This means CFGs do not support the fundamental requirement that authoring mathematics is a dynamic activity in which the bindings between meaning and syntax are established by the author while manipulating the document. A discussion involving this characteristic is presented as follows.
5.3.3 CFG Limitation to Support Authoring Mathematics
This subsection illustrates the limitation CFGs have in supporting the semantics capturing of mathematical concepts. For this purpose consider, for instance, authoring a document which includes an expression involving the addition of integers such as
$$1 + 0 = 1 \tag{5.1}$$
The meaning of expression (5.1) may be captured by the CFG rules in Table 5.1. This means + is the addition of integers, 1 and 0 are integers and = is equality.
add_equality -> left_expr = right_expr
left_expr    -> left_expr + right_expr
left_expr    -> integer
right_expr   -> integer
integer      -> 1 | 0
Table 5.1: CFG rules for addition of integers 0 and 1.
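Viewing the grammar as a type and recognition as membership verification can be sketched with a small recognizer for the rules of Table 5.1. This is a minimal sketch, not the language processor of the thesis: the dictionary encoding, the function names and the whitespace-based tokenization are assumptions, and the recognizer relies on the fact that no rule in the table derives the empty string.

```python
# Table 5.1 encoded as: nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "add_equality": [("left_expr", "=", "right_expr")],
    "left_expr": [("left_expr", "+", "right_expr"), ("integer",)],
    "right_expr": [("integer",)],
    "integer": [("1",), ("0",)],
}


def derives(symbol, tokens):
    """Return True if `symbol` derives exactly the token tuple `tokens`."""
    if symbol not in GRAMMAR:  # terminal symbol
        return tokens == (symbol,)
    return any(matches(rhs, tokens) for rhs in GRAMMAR[symbol])


def matches(rhs, tokens):
    """Return True if the symbols of `rhs`, in order, derive `tokens`."""
    if not rhs:
        return not tokens
    head, rest = rhs[0], rhs[1:]
    # Every remaining symbol needs at least one token, so the head always
    # receives a strictly shorter span in the left-recursive rule.
    for k in range(1, len(tokens) - len(rest) + 1):
        if derives(head, tokens[:k]) and matches(rest, tokens[k:]):
            return True
    return False


def is_instance(sentence, start="add_equality"):
    """Membership check: is `sentence` an instance of the type `start`?"""
    return derives(start, tuple(sentence.split()))
```

For example, `is_instance("1 + 0 = 1")` holds, while `is_instance("1 = 0 + 1")` does not, because `right_expr` only derives a single integer.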
Assume the author decides to update the current version of the document by including another expression. This expression contains the Boolean OR operation, which is represented by the + symbol, and TRUE and FALSE values represented by integers 1 and 0 respectively. An example of such an expression is
$$1 + 0 = 1 \tag{5.2}$$
The syntax of expression (5.2) can be captured by the grammar in Table 5.1. However, its semantics cannot. This is because the author has determined that the context in which this syntax is valid has changed. Operations on integers have been replaced by operations on Booleans, 1 means TRUE and 0 means FALSE.
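The difference between the two readings can be made concrete by evaluating the same token sequence under two contexts. The sketch below is an illustration under assumptions: the `CONTEXTS` table and the `holds` function are hypothetical names, and the parsing is deliberately naive (operands separated by + on the left of a single =).

```python
from functools import reduce
import operator

# Each context binds "+" and the literals 0 and 1 to a different meaning.
CONTEXTS = {
    "integer": (operator.add, int),                    # + is integer addition
    "boolean": (operator.or_, lambda tok: tok == "1"),  # + is OR, 1 is TRUE
}


def holds(sentence, context):
    """Evaluate a sentence of the form `a + b + ... = c` in a given context."""
    plus, literal = CONTEXTS[context]
    left, right = sentence.split("=")
    operands = [literal(tok.strip()) for tok in left.split("+")]
    return reduce(plus, operands) == literal(right.strip())
```

Both readings accept `1 + 0 = 1`, but `1 + 1 = 1` holds only under the Boolean context, which shows that identical syntax accepted by the same grammar may carry different semantics.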
CFGs provide no means of updating their production rules. Therefore a document structure based on this formalism has to include a mechanism to respond to authoring requests aimed at the creation of context-dependent meaning-to-syntax bindings.
5.3.4 Updating CFGs
This thesis approaches the semantics capturing problem by means of an organization that is based on sets of modules. A standard library is used as a storage facility where a basic or default set of modules is placed. The modules required by a given document, at any instant, may be obtained from the set of default ones or from the result of a composition procedure that may either include only modules taken from the library or, if necessary, completely new ones. In this work the ability to manipulate the set of modules in order to update the available types is introduced as the means to produce document structures that support extensibility.
As has been shown previously, document structures based on static organizations such as CFGs do not support the dynamic extensibility required during the authoring process. It is during this stage that the author is free to select the syntactical arrangement necessary to represent each mathematical concept that takes part in the document. Mechanisms that support run-time update are therefore needed when considering the design of structures to handle the dynamic authoring of mathematics. For this reason the following mechanisms need to be included when considering the design of document structures to manage the dynamic binding of mathematical concepts to syntactical constructs:
1. incremental update, and
2. module reuse.
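The two mechanisms can be sketched over a library of grammar fragments. This is a hypothetical illustration, not the structure defined later in the thesis: the library contents and the `compose` and `update` functions are assumptions introduced here.

```python
# A module (grammar fragment): nonterminal -> list of right-hand sides.
STANDARD_LIBRARY = {
    "integer": {"integer": [("0",), ("1",)]},
    "integer_addition": {"sum": [("sum", "+", "integer"), ("integer",)]},
}


def compose(*modules):
    """Module reuse: merge fragments into a new grammar, leaving them intact."""
    grammar = {}
    for module in modules:
        for lhs, alternatives in module.items():
            bucket = grammar.setdefault(lhs, [])
            for rhs in alternatives:
                if rhs not in bucket:  # simple redundancy control
                    bucket.append(rhs)
    return grammar


def update(grammar, lhs, rhs):
    """Incremental update: return a new grammar with one production added."""
    updated = {name: list(alts) for name, alts in grammar.items()}
    if rhs not in updated.setdefault(lhs, []):
        updated[lhs].append(rhs)
    return updated


# Document grammar built from library modules, then extended by the author.
document = compose(STANDARD_LIBRARY["integer"], STANDARD_LIBRARY["integer_addition"])
document_v2 = update(document, "sum", ("sum", "-", "integer"))
```

Returning a new grammar on each update keeps every earlier version available, which matches the view of a document version as the result of updating the previous one.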
Central to the effective use of update as a process to modify data are the notions of identity, redundancy control and normal forms [43]. For CFGs, the notion of normalization as a process to eliminate update anomalies requires the identity verification of grammar rules. This includes verification of both the syntax and the semantics of the rules. This requirement is necessary because rules that have identical syntactical structure do not express the purpose for which they are intended. The meaning attached to nonterminals on the rhs of a rule depends on the grammar rules that define these nonterminals. Therefore the semantics of a CFG rule is a concept that involves the notion of rule dependency. For this reason identical syntactical structure in CFG rules does not guarantee identical semantics.
Exceptions to this characteristic are rules that have a single terminal and no nonterminals on their rhs. These rules have both their syntax and semantics determined by themselves. Syntactical identity, in this case, determines semantical identity.
A discussion regarding the involvement of identity, redundancy control and normal forms in document structures that use CFGs for the semantics capturing of mathematical concepts is presented in the remainder of this section. The following subsection provides examples to illustrate the problems introduced when grammar rules sharing the same syntax are used to capture different semantics.
5.3.4.1 Identical Syntax and Rule Semantics
Objects are identical if they are indistinguishable. This suggests that indistinguishable grammar rules should be viewed as identical. This characteristic is addressed below by an example in which a set of rules is shared by three different grammars.
Consider, for instance, CFG rules containing nonterminals in their rhs. Such rules depend on other rules in which the definitions of these nonterminals are provided. For this reason it is sometimes not possible to determine the semantics of grammar rules before all nonterminals have been replaced by terminals. To illustrate this characteristic consider the CFGs defined by the production rules in Tables 5.2 and 5.3.

catexpr
  1  expr → expr + term
  2  expr → term
  3  term → integer
  4  integer → 1 | 2
Table 5.2: Grammar for addition of integers 1 and 2

addexpr
  1  expr → expr + term
  2  expr → term
  3  term → character
  4  character → a | b
Table 5.3: Grammar for concatenation of characters a and b

The fact that rules 1 and 2 from both grammars are identical determines that both grammars define lists of terms separated by the + symbol. Although they share this characteristic, the semantics of both rules 1 and 2 depend on the information provided by rules 3 and 4. The derivation of a word such as 1 + 2, for example, as provided in Table 5.4, illustrates that rule 1 from the grammar in Table 5.2 defines lists of integers 1 and 2 that are separated by the + symbol. The fact that integers are being separated by the + symbol suggests that rule 1, from this grammar, captures the addition of integers concept.

  expr => expr + term
       => term + term
       => integer + term
       => 1 + term
       => 1 + integer
       => 1 + 2
Table 5.4: Derivation of word 1 + 2.

  expr => expr + term
       => term + term
       => character + term
       => a + term
       => a + character
       => a + b
Table 5.5: Derivation of word a + b.
In a similar way, the derivation of a word such as a + b, as provided in Table 5.5, shows that rule 1 from the grammar in Table 5.3 defines lists of characters a and b separated by the + symbol. For this case it can be stated that rule 1, from this grammar, captures the concatenation of characters concept.

catexpr
  1  expr → expr + term
  2  expr → term
  3  term → integer | character
  4  integer → 1 | 2
  5  character → a | b
Table 5.6: Grammar for operations on integers and characters.
The CFG defined by the rules in Table 5.6 combines rules 3 and 4 from both grammars defined in Tables 5.2 and 5.3. Table 5.7 illustrates that the derivation of a word such as a + 2 determines that rule 1 from the grammar in Table 5.6 defines lists of characters a and b and/or integers 1 and 2 separated by the + symbol. The interpretation for the + symbol is not determined because the semantics attached to this symbol cannot be expressed by the grammar in Table 5.6.

  expr => expr + term
       => term + term
       => character + term
       => a + term
       => a + integer
       => a + 2
Table 5.7: Derivation of word a + 2.
The complete interpretation for the + symbol, in this case, is not provided by the grammar rules as it was in the previous two scenarios, because the semantics attached to this symbol cannot be expressed directly by the grammar in Table 5.6. Additional information is necessary in order to specify how integers and characters are to be processed by the + operator.
The grammars presented in this subsection illustrate that one CFG rule may be used to express several different semantics. It is also possible to have the semantics of a single concept captured by different CFG rules. This characteristic is discussed in the following subsection.
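The dependence of rule 1 on the rules that define its terms can be sketched in code. The Python fragment below is only an illustration under stated assumptions: the names `classify` and `interpret` are hypothetical, words are assumed to be terms separated by " + ", and the choice of integer addition and character concatenation follows the captions of Tables 5.2 and 5.3.

```python
def classify(token):
    """Return the terminal category of a token under the grammar in Table 5.6."""
    if token in ("1", "2"):
        return "integer"
    if token in ("a", "b"):
        return "character"
    raise ValueError(f"unknown terminal: {token!r}")


def interpret(word):
    """Interpret a word such as '1 + 2' as a list of terms separated by '+'."""
    terms = word.split(" + ")
    kinds = {classify(t) for t in terms}
    if kinds == {"integer"}:
        return sum(int(t) for t in terms)   # addition of integers
    if kinds == {"character"}:
        return "".join(terms)               # concatenation of characters
    # Mixed operands: the word is derivable, but '+' has no assigned meaning.
    raise TypeError(f"no interpretation of '+' for operand kinds {sorted(kinds)}")
```

The same syntactic rule yields a number for `interpret("1 + 2")` and a string for `interpret("a + b")`, while `interpret("a + 2")` fails because the extra information the text calls for is missing.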
5.3.4.2 Redundancy, Syntax Equivalence and Normal Forms
In relational databases the idea of redundancy is related to the notions of identity, functional dependency and normal form [43]. As a property of the semantics of the attributes, functional dependency expresses relationships among attributes. Therefore it depends on a value-based notion of identity. Functional dependency is used in determining the presence of redundancies in database schemas.
A normal form is a schema which has desirable update properties and does not contain certain types of redundancies. Normalization is a process that breaks down unsatisfactory relation schemas according to normal form criteria. The schemas generated through normalization are therefore said to be normalized.
As static structures, CFGs do not support dynamic authoring. For this reason external mechanisms need to be designed in order to update the set of grammars involved in the modifications proposed during authoring. Effective updates of these structures require both the identification and control of redundant definitions.
CFG redundancy is, in the context of this work, defined in terms of grammar rules. The examples presented in the previous subsection illustrated that CFG rules that have identical syntax may be used to express different semantics. In the context of this thesis, such rules are considered redundant. This form of redundancy will hereafter be referred to as redundancy by syntax identity.
This type of redundancy can be detected by string comparison. Its control can be obtained by a grammar update procedure that
1. creates a new grammar for the redundant rule, and
2. eliminates this rule from the grammar where it was identified.
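The detection by string comparison and the two-step update procedure can be sketched as follows. This is a minimal illustration, not the procedure of this thesis: the rule encoding as (lhs, rhs) tuples, the grammar names `integers` and `characters`, and the functions `rule_text`, `shared_rules` and `factor_out` are all assumptions, and the sketch collects every shared rule into one new grammar for brevity.

```python
INTEGERS = [("expr", ("expr", "+", "term")), ("expr", ("term",)),
            ("term", ("integer",)), ("integer", ("1",)), ("integer", ("2",))]
CHARACTERS = [("expr", ("expr", "+", "term")), ("expr", ("term",)),
              ("term", ("character",)), ("character", ("a",)), ("character", ("b",))]


def rule_text(lhs, rhs):
    """Serialize a rule so that identical syntax yields identical strings."""
    return f"{lhs} -> {' '.join(rhs)}"


def shared_rules(grammars):
    """Map each rule string found in more than one grammar to the grammar names."""
    seen = {}
    for name, rules in grammars.items():
        for lhs, rhs in rules:
            seen.setdefault(rule_text(lhs, rhs), set()).add(name)
    return {text: sorted(names) for text, names in seen.items() if len(names) > 1}


def factor_out(grammars, new_name):
    """Step 1: place the shared rules in a new grammar.
    Step 2: remove them from the grammars where they were identified."""
    shared = shared_rules(grammars)
    result, common = {}, []
    for name, rules in grammars.items():
        result[name] = []
        for rule in rules:
            if rule_text(*rule) in shared:
                if rule not in common:
                    common.append(rule)
            else:
                result[name].append(rule)
    result[new_name] = common
    return result
```

Applied to the two grammars above, `shared_rules` reports rules 1 and 2, and `factor_out` leaves each original grammar with only the rules that define its own terms.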
Another form of redundancy occurs when the same semantics is expressed by different CFGs. Grammars in this case differ due to nonterminal renaming (isomorphic grammars). This form of redundancy will hereafter be referred to as redundancy by syntax equivalence⁴.
The fact that isomorphic grammars have different nonterminal sets implies that their sets of production rules are also different. Since the start symbol of a grammar is interpreted as a type, these grammars introduce the possibility of attaching different names to a single type definition. A careful analysis, in this case, is necessary to identify the scenarios where different types need to be defined. For this situation a domain specification needs to be provided in order to ensure that the type definitions are unique.
For arbitrary CFGs $G_1$ and $G_2$, it is undecidable [55] whether $L(G_1) = L(G_2)$. Therefore there is no effective approach to identify redundancies by syntax equivalence. In other words this form of grammar redundancy cannot effectively be eliminated by operations performed on the structure that supports the semantics capturing of the mathematical concepts. For this reason the idea of normalization as a process to remove update anomalies is not considered.
⁴ Isomorphic grammars produce equivalent abstract syntax trees for all words in the language they generate, therefore the same semantics is always expressed by them. For this reason redundancy by syntax equivalence is only identified if the grammars involved are isomorphic.
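Although language equivalence is undecidable in general, verifying one proposed nonterminal renaming is straightforward. The sketch below only checks a renaming that is supplied by the caller; it does not search for one and does not decide whether two languages are equal. The rule encoding and the names `rename` and `isomorphic_under` are assumptions made for this illustration.

```python
def rename(rules, mapping):
    """Apply a nonterminal renaming to every symbol of every rule."""
    return {
        (mapping.get(lhs, lhs), tuple(mapping.get(s, s) for s in rhs))
        for lhs, rhs in rules
    }


def isomorphic_under(rules_a, rules_b, mapping):
    """True if `mapping` is one-to-one and turns rules_a into exactly rules_b."""
    symbols = {s for lhs, rhs in rules_a for s in (lhs, *rhs)}
    untouched = symbols - set(mapping)
    targets = set(mapping.values())
    # No two symbols may collapse into one, and no new name may collide
    # with a symbol that is left unrenamed.
    one_to_one = len(targets) == len(mapping) and not (targets & untouched)
    return one_to_one and rename(rules_a, mapping) == set(rules_b)
```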
When capturing the semantics of mathematical concepts by means of CFGs, the terminals of the grammar are associated with the names of the concepts. Hence the set of dependencies that exist among the terminals of the grammar describes concept dependencies which have been expressed by the rules of the grammar. Chapter 6 introduces the notion of grammatical dependency, which is a form of expressing relationships among grammar rules. This characteristic is applied to express the dependencies which exist among the terminals of a grammar. The knowledge of these dependencies identifies sets of grammars which may contain redundancies.
As has already been illustrated in Subsection 5.3.4.2, the syntax and portions of the semantics of mathematical concepts can be captured by CFG rules. This approach will lead to the creation of a set of grammars which will be used to support the representation of the concepts that take part in a given instance of a document. Grammars may be created either by means of editing procedures or as the result of operations that involve other grammars. For both scenarios, the creation process will be simplified if an adequate grammar format is imposed. This format is proposed in Chapter 6 as a normal form for CFGs. Two different implementation aspects will benefit from this normal form. They are:
• the semantics capturing of mathematical concepts and
• the composition of grammar fragments.
A normal form for the grammatical structure used in the semantics capturing process is needed to avoid definitions where the nonterminal arrangement on the right hand side of the production rules hides the meaning of the concept to be captured. This problem is solved by the adoption of a set of templates that enforce the construction of the production rules in a particular way, in which the meaning of the mathematical concepts can be captured correctly. These templates are the smallest structural components that are allowed in the capturing of mathematical concepts by CFG rules. As restrictions on grammar rules they establish that the capturing approach may need to decompose the abstract concepts. This is necessary to ensure that concept components are captured by grammar rules that follow the format defined by the templates.
The composition process, by which grammar fragments may be combined to produce other definitions, should be free of any information that is not necessary for the successful completion of the desired grammar arrangement. This means the composition process should not introduce definitions that carry redundant information.
The following sections discuss the possibility of capturing the meaning of abstract mathematical concepts by means of CFGs.
5.4 Representing Polynomials
The idea of expressing mathematical concepts as language fragments is used here as an aid to capture the semantics of mathematical concepts. With this technique, the definition of mathematical concepts which contribute to the definition of other concepts can be isolated and approached by grammar fragments. A composition process will later combine all necessary grammar fragments as a way of representing complex mathematical concepts.
As it is composed of abstract concepts, mathematics needs to be encoded in order to be communicated. The encoding proposed by the conventional mathematical notation is a representation format that is usually used for communicating mathematics. Although this notation is used to support the discussions on the capturing procedure this thesis proposes, it is important to emphasize that mathematics is composed of abstract concepts. For this reason encoding strategies are needed to support the manipulation of these concepts. For instance, a discussion involving a polynomial is simplified when this abstract concept is encoded according to the standard mathematical notation. Consider, for example, the following identity expression, which displays the elements of a polynomial as its right hand side term.
\[ k = abc + a^{2}b^{2}c^{2} + \cdots + a^{n}b^{n}c^{n} \tag{5.3} \]
In order to capture the meaning of equation (5.3) by CFGs, the meaning of each concept that is included in this equation can be expressed by a grammar fragment. This indicates that grammar fragments for the equality, juxtaposed multiplication, addition, power and addition_ellipsis operations need to be supplied.
It is important to observe that expression (5.3) uses the ellipsis (continuation dots) operation to express the repetitive addition of polynomial terms. This and_so_on abstract concept means that the addition pattern that started with the first term of the polynomial continues and stops when the last term is reached. For this reason this operation is captured as an addition_ellipsis⁵ binary operation.
Consider the right hand side of expression (5.3), where the polynomial is defined. One possible way to express this as grammar fragments is to consider each term of the polynomial as a word from the language $G = \{a^{k}b^{k}c^{k} \mid 2 \le k \le n\} \cup \{abc\}$. Table 5.8 illustrates one possible grammar fragment that recognizes the words defined by G. The grammar displayed there captures both the juxtaposed multiplication and the power concepts. The addition_ellipsis and addition operations may be captured by the grammar in Table 5.9.

g1
  term → first juxtaposed other
  other → second juxtaposed third
  juxtaposed → ε
  first → a power
  second → b power
  third → c power
  power → superscript identifier
Table 5.8: CFG fragment for expressing words from G.

g2
  polynomial → polyexpr addition_ellipsis term
  polyexpr → polyexpr addition term
  polyexpr → term
Table 5.9: CFG fragment for expressing addition_ellipsis and addition operations.
In order to completely express equation (5.3) by CFGs, the equality concept needs to be considered. Table 5.10 provides a grammar fragment that captures this concept. The composition of the three grammars displayed in Tables 5.8 to 5.10 produces a CFG that recognizes equation (5.3).

g3
  expression → leftside equals rightside
  leftside → IDENTIFIER
  rightside → polynomial
Table 5.10: CFG fragment for expressing equality operation.

⁵ The expression 1 < 2 < 3 < ... < 5001 states that each integer from 1 to 5000 is less than its successor. The and_so_on concept for this scenario abstracts the notion that the logical condition less than that applies to the first pair of integers continues until the last pair is reached. For this situation the and_so_on operation would be captured as a less_than_ellipsis binary operation.
Although G has been used to list all terms of the polynomial equation, its words may also be applied to represent other mathematical concepts. Consider, for instance, the field of formal languages. In this case $a^{k}b^{k}c^{k}$ is viewed as a string of characters. The semantics capturing process therefore should be based on the notion of considering literal strings of characters as the syntactical structure to be processed. A string such as $a^{2}$ is therefore interpreted as the concatenation of a with itself, which generates aa. To capture the meaning of the words in G for this interpretation a mechanism to represent the concatenation of $a^{k}$ with $b^{k}$ concatenated with $c^{k}$ needs to be provided. For this reason, each of $a^{k}$, $b^{k}$ and $c^{k}$ is to be recognized by a structure that accepts the concatenation of a character with itself k times.
As illustrated by the two different interpretations associated with the syntax defined by the words in G, the context in which concepts are expressed must also be taken into consideration. As different interpretations may be attached to any given syntax, a strategy to resolve the syntactical ambiguities introduced needs to be defined. The needed mechanism should be capable of supporting the capture of all possible interpretations associated with the syntax considered.
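The two readings of a word of G can be placed side by side in a short sketch. The function names `as_polynomial_term` and `as_string` are hypothetical, and the numeric values passed for a, b and c are arbitrary sample inputs.

```python
def as_polynomial_term(k, a, b, c):
    """Algebraic reading: a^k b^k c^k is a product of powers."""
    return (a ** k) * (b ** k) * (c ** k)


def as_string(k):
    """Formal-language reading: each letter is concatenated with itself k times."""
    return "a" * k + "b" * k + "c" * k
```

For k = 2 the first reading multiplies three squares, whereas the second produces the literal string obtained by repeating each character twice. The syntax alone does not say which of the two is intended.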
Exponents and indexing are concepts used in various fields of mathematics. These two concepts are usually described with the support of superscripts and subscripts. The section that follows discusses a grammar approach to capture both concepts.
5.5 Representing Subscripts and Superscripts
Subscripts and superscripts attached to literal strings of characters can be viewed as modifiers that carry additional meaning of a symbol. The semantics of both subscripts and superscripts may be captured by considering them as binary mathematical concepts whose arguments are the base and the sub/superscript. They have right associativity and the highest precedence among the other concepts.

g_b
  words → words SUB word
  words → word
  word → word SUP index
  word → index
  index → NUMBER
  index → IDENTIFIER
  index → ( words )
Table 5.11: CFG fragment for subscripts and superscripts.
One possible grammar fragment to represent both subscripts and superscripts is provided in Table 5.11. The production rules associated with superscripts follow the rules for subscripts in order to ensure the correct precedence for both operators.
Consider, for instance, the representation of $S_{i_{j}^{k}}$, where S is identified by means of an index i which itself has both a superscript k and a subscript j. The following expression represents the variable S in terms of the subscript and superscript concepts.
S sub (i sub j) sup k
A more complex example is provided as follows to illustrate the precedence characteristics of both superscripts and subscripts. The symbol = is used to represent the equivalence of the two forms of representation.
\[ S_{i_{j}^{k}}^{z_{p}^{q}} = (S\ \text{sub}\ (i\ \text{sub}\ j)\ \text{sup}\ k)\ \text{sup}\ ((z\ \text{sub}\ p)\ \text{sup}\ q) \tag{5.4} \]
The use of CFGs to capture both subscript and superscript concepts does not express the context in which these definitions are considered. One way to approach this requirement is to introduce a scope mechanism to delimit the context in which concepts are expressed by unique syntax. Chapter 6 proposes a scope structure to solve the syntax ambiguity problem created by the overloading of the symbols used for the representation of the concepts.
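The precedence encoded by the rules of Table 5.11 can be made visible with a small recursive-descent parser. This is a sketch under assumptions: the keywords are written in lower case as `sub` and `sup`, the left-recursive rules are realized as loops (which groups repeated operators to the left, as the rules are written), and the function name `parse` and the tuple-based tree are choices made for this illustration.

```python
import re


def parse(text):
    """Parse words such as 'S sub (i sub j) sup k' into nested tuples."""
    tokens = re.findall(r"[A-Za-z0-9]+|[()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def index():                      # index -> NUMBER | IDENTIFIER | ( words )
        tok = take()
        if tok == "(":
            tree = words()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok in ("sub", "sup", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def word():                       # word -> word SUP index | index
        tree = index()
        while peek() == "sup":
            take()
            tree = ("sup", tree, index())
        return tree

    def words():                      # words -> words SUB word | word
        tree = word()
        while peek() == "sub":
            take()
            tree = ("sub", tree, word())
        return tree

    tree = words()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

Because the superscript rules sit below the subscript rules, `parse("S sub (i sub j) sup k")` attaches `sup k` to the parenthesized index before the outer `sub` is applied to S.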
5.5.1 Overloading Subscripts
The need for structuring mathematical notation around a set of domains is emphasized here. Consider the recurrence relation $a_{n+1} = 3a_{n}$, $n \ge 0$, $a_{0} = 5$, which has $a_{n} = 5(3^{n})$, $n \ge 0$, as its general solution. Also consider a two dimensional matrix defined as follows:
A = (5.5)
The 12th term of the recurrence relation, $a_{12} = 5(3^{12})$, has the same syntactical representation as the element of matrix A located on the first row and second column. Although these concepts share the same visual form, different interpretations are expected depending on the context in which they are presented. This context is interpreted as a domain or subfield and may be as general as, say, Discrete Mathematics or Linear Algebra. It may also be specific depending on the characteristics of the concepts involved. By letting $a_{12}$ be part of a domain, the additional information necessary to determine its meaning uniquely is supplied. This form of structuring mathematics, by grouping knowledge into domains, will be used as a mechanism to resolve ambiguities in this thesis.
In the linear algebra domain, for instance, the syntax $a_{ij}$ represents the operation that establishes the link which is used to locate elements in the matrix A. The need for an operator to represent the dimensional link expressed by the subscript used for the location of matrix elements can be illustrated by the fact that $b_{100}$ is the 100th element of a one dimensional matrix. The interpretation associated with $b_{123}$ is not unique. If matrix B is three dimensional, for instance, then $b_{123}$ refers to one particular element in the structure. A one-to-one mapping between syntactical representation and element location is not possible if matrix B is two-dimensional. Two interpretations are associated with the syntax $b_{123}$ in this case: either an element located in row 1, column 23 or in row 12, column 3. This ambiguity could be resolved by the introduction of an operator to determine where the link between the dimensions of the structure is to take place. For instance b sub(12, 3) could be used to reference the element located in row 12, column 3 in matrix B.
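The ambiguity of a subscript written as an undivided digit string can be enumerated mechanically. The sketch below lists every way of splitting the digits into a given number of indices; the function name `index_readings` and the exclusion of indices with a leading zero are assumptions made for this illustration.

```python
def index_readings(digits, dimensions):
    """List every way to split a digit string into `dimensions` indices."""
    if dimensions == 1:
        # A single index uses all remaining digits.
        return [(int(digits),)] if digits and digits[0] != "0" else []
    readings = []
    # Leave at least one digit for each of the remaining dimensions.
    for cut in range(1, len(digits) - dimensions + 2):
        head = digits[:cut]
        if head[0] == "0":
            continue
        for rest in index_readings(digits[cut:], dimensions - 1):
            readings.append((int(head),) + rest)
    return readings
```

For the digit string 123 the sketch finds a single reading in one and in three dimensions, and two readings in two dimensions, which is the case where an explicit operator such as the comma in b sub(12, 3) is needed.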
5.5.2 Overloading Superscripted Symbols
Consider the overloading of both the + and − symbols provided by the expressions below.
\[ |f| = f^{+} + f^{-} \tag{5.6} \]
and
\[ f = f^{+} - f^{-} \tag{5.7} \]

g_f
  function_parts → positive_part − negative_part
  positive_part → function_id sup +
  negative_part → function_id sup −
  function_id → IDENTIFIER
Table 5.12: CFG representation of the positive and negative parts of a function.
In the above expressions f is a function, $f^{+}$ represents the positive part of f, and $f^{-}$ the negative part of it [93]. The semantics attached to the + symbol in equation (5.6) indicates that this symbol is used to represent two operations and that each instance of it aims at the representation of a different concept. The superscripted instance characterizes the unary postfix operation of taking the positive part of a function, whereas the binary infix instance represents addition. The definition presented in Table 5.12 illustrates a possible grammar fragment to represent both the positive and negative parts of a function.
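The two roles of the + symbol can be separated in code by giving each its own operation. The sketch assumes the usual pointwise definitions of the positive part, max(f(x), 0), and the negative part, max(−f(x), 0); the function names are hypothetical.

```python
def positive_part(f):
    """Unary postfix use of '+': return f^+, pointwise max(f(x), 0)."""
    return lambda x: max(f(x), 0)


def negative_part(f):
    """Unary postfix use of '-': return f^-, pointwise max(-f(x), 0)."""
    return lambda x: max(-f(x), 0)


def sample(x):
    """An arbitrary sample function that takes both signs."""
    return x ** 3 - x


f_plus = positive_part(sample)
f_minus = negative_part(sample)
```

With these definitions the binary + and − of equations (5.6) and (5.7) are ordinary arithmetic on the values `f_plus(x)` and `f_minus(x)`, while the superscripted symbols select which function is being evaluated.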
5.6 Representing Matrices
The representation of matrices is usually done by means of a combination of upper and lower case letters. An upper case letter is used to denote the matrix itself and the corresponding lower case letter combined with lower case subscripts defines both its elements and their location in the matrix. Structural concepts such as vectors and matrices depend on the representation of lists, since both vectors and matrices characterize a collection of elements organized in a particular way. The grammar fragment illustrated in Table 5.13 presents the necessary rules for the definition of matrices.

g_c
  matrixrule → MATRIX { dimlist ( elist ) }
  dimlist → dimlist : size
  dimlist → size
  elist → elist , el
  elist → el
  el → IDENTIFIER
  size → NUMBER
Table 5.13: CFG fragment for matrices.
The following system of linear equations represented in matrix format is used to illustrate the syntax defined by the rules presented in Table 5.13.
\[ \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} x_{1} \\ x_{2} \end{pmatrix} = \begin{pmatrix} 4 \\ -5 \end{pmatrix} \tag{5.8} \]
The syntax enforced by the rules provided by Table 5.13 is presented below:
Matrix{2 : 2 (3, 1, 0, 3)} • Matrix{2 : 1 (x_1, x_2)} = Matrix{2 : 1 (4, −5)}
where the • symbol denotes the matrix multiplication operation. The operator , is introduced as a way of representing the matrix elements as nodes of a hierarchical relation between the entries of a matrix.
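How a dimension list and a flat element list determine a matrix can be sketched as follows. The row-major ordering of the element list is an assumption read off the example above, and the functions `build_matrix` and `multiply` are hypothetical helpers, not part of the grammar.

```python
def build_matrix(dims, elements):
    """Arrange a flat element list into rows, assuming row-major order."""
    rows, cols = dims
    if len(elements) != rows * cols:
        raise ValueError("element count does not match the dimension list")
    return [list(elements[r * cols:(r + 1) * cols]) for r in range(rows)]


def multiply(a, b):
    """Matrix product of two nested-list matrices."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not agree")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]
```

Under this reading, `build_matrix((2, 2), (3, 1, 0, 3))` yields the coefficient matrix of the system, and `multiply` plays the role of the • operator.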
The representation of the power, inverse and transpose of a matrix by superscripted symbols does not carry the necessary semantics of each individual operation. For this reason, it is necessary to represent each one of these concepts by means of its own semantics. Although the representation of the power of a matrix is characterized by a binary operation, both inverse and transpose are unary operations and are usually recognized by syntactical structures in the postfix form.
Matrices with only one row or column can also be considered as vectors. The syntactical representation of these concepts is usually obtained by means of single lower case letters typeset in boldface font. In this compact form of representation, the operator is identified not by means of symbols attached to the operands, but by the type of visual representation adopted for displaying the selected symbol.
5.7 Representing Sets of Numbers
The representation of sets of numbers as intervals is frequently used in algebra. For example, $(a, b) = \{x \mid a < x < b\}$ and $[a, b] = \{x \mid a \le x \le b\}$. In this form of representing numbers, the delimiters do not always match. We illustrate this by the two expressions that follow.
\[ [a, b) = \{x \mid a \le x < b\} \]
\[ (a, b] = \{x \mid a < x \le b\} \]

g_d
  1  interval_var → left_delimiter values right_delimiter
  2  values → left_value , right_value
  3  left_value → IDENTIFIER
  4  right_value → IDENTIFIER
  5  left_delimiter → [
  6  left_delimiter → (
  7  right_delimiter → ]
  8  right_delimiter → )
Table 5.14: CFG fragment for intervals.
A possible grammar fragment for the representation of the four types of intervals is given in Table 5.14. Although the grammar presented in Table 5.14 can be used to represent number intervals, it is not useful for capturing the semantics of the concepts involved. The fact that the nonterminals left_delimiter and right_delimiter are not uniquely defined suggests that production rule 1 reduces to four different sentences. Another problem with this definition is that it requires a parser with lookahead greater than three.
The grammar fragment shown in Table 5.15 captures the semantics of the structure. This grammar is designed in such a way that each pair of interval delimiters is uniquely represented by a production rule.

g_e
  interval → open_intervals
  interval → closed_intervals
  open_intervals → open_part RIGHT_OPEN_PAR
  open_intervals → open_part RIGHT_CLOSED_DEL
  closed_intervals → closed_part RIGHT_CLOSED_DEL
  closed_intervals → closed_part RIGHT_CLOSED_DEL
  open_part → LEFT_OPEN_PAR body
  closed_part → LEFT_CLOSED_DEL body
  body → left_value COMMA right_value
  left_value → IDENTIFIER
  right_value → IDENTIFIER
  COMMA → ,
  RIGHT_CLOSED_PAR → )
  RIGHT_CLOSED_DEL → ]
  LEFT_OPEN_PAR → (
  LEFT_OPEN_DEL → [
Table 5.15: CFG fragment to capture the semantics of intervals.
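The idea of giving each delimiter pair its own meaning can be sketched with a lookup table that maps a pair to the comparisons it stands for. The table `INTERVAL_SEMANTICS` and the function `contains` are hypothetical names introduced for this illustration; they are not derived from the token names in Table 5.15.

```python
import operator

# One entry per delimiter pair, in the spirit of one production rule per pair.
INTERVAL_SEMANTICS = {
    ("(", ")"): (operator.lt, operator.lt),   # a <  x <  b
    ("[", "]"): (operator.le, operator.le),   # a <= x <= b
    ("[", ")"): (operator.le, operator.lt),   # a <= x <  b
    ("(", "]"): (operator.lt, operator.le),   # a <  x <= b
}


def contains(left, a, b, right, x):
    """Test whether x lies in the interval written as: left a , b right."""
    lower, upper = INTERVAL_SEMANTICS[(left, right)]
    return lower(a, x) and upper(x, b)
```

Since the pair is looked up as a unit, the meaning of a delimiter never has to be guessed from one side alone.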
5.8 Representing Sums
The concept of summation is discussed in this section. Both the ambiguity and the extensibility problems associated with this operation are illustrated by examining its semantic characteristics.
Consider the sum represented by the expression below.
\[ 21 = \sum_{i=1}^{6} i \tag{5.9} \]
Equation (5.9) illustrates the ambiguity involved in the use of the = symbol, whose meaning can represent either the start of a sequence of assignments to variable i or the equality between two quantities. Although the syntax most commonly associated with the sum of a sequence of items includes the = symbol as a way of expressing the iteration process, the semantics of the summation construct does not require the equality operator. The concept of summation may be described by an operation on an expression that is evaluated according to a sequence of predefined items. One possible form of expressing this is by means of the syntax
Sum{range_list : expression} (5.10)
which captures the meaning of the summation concept. In Equation (5.10) Sum is a prefixed binary operator and range_list defines the sequence over which expression is to be computed. The operand range_list captures the meaning of the iteration part of the summation construct.
The fact that it is possible to attach a particular interpretation to the form in which range_list is syntactically represented is a problem to be considered whenever rendering is to take place. It is intuitive to associate the = symbol with the iterative component of the sum construct, as illustrated in Equation (5.9). However the idea of range is more meaningful when this expression is represented as
\[ 21 = \sum_{1 \le i \le 6} i \tag{5.11} \]

g_i
  1  identity_expr → sample_expr = sample_expr
  2  sample_expr → expr
  3  sample_expr → sum
  4  sum → SUM { range_list : sample_expr }
  5  range_list → start , end
  6  start → identifier = expr
  7  identifier → ID
Table 5.16: Grammar for summation.
The grammar fragment illustrated in Table 5.16 allows summation constructs, such as the one in Equation (5.9), to be described by the syntax that follows.
Sum{i = 1, 6 : i} (5.12)
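The reading of Sum as a binary operator over a range list and an expression can be sketched directly. The function name `sum_construct` is hypothetical, and the range is assumed to run over consecutive integers with both ends included.

```python
def sum_construct(start, end, expression):
    """Evaluate Sum{i = start, end : expression} over the inclusive range."""
    return sum(expression(i) for i in range(start, end + 1))
```

No equality operator appears in the evaluation: `sum_construct(1, 6, lambda i: i)` only needs the two ends of the range and the expression to be computed.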
There are situations where more complex iteration control is required, and sometimes the necessary summation condition is expressed as compound statements. The expression below illustrates this fact.
\[ x = \sum_{i=m/2}^{m+n} \; \sum_{\substack{j=0 \\ i+j=n}}^{n} i + j \tag{5.13} \]
In Equation (5.13) the inner summation includes a compound statement. The iteration mechanism is extended to support the composed condition, which makes use of a syntactically hidden conjunction to define the lower limit for the iteration. The meaning associated with the = symbol, in its two occurrences in the conjoined condition, is not the same. An ambiguity was introduced by the additional semantics attached to the = symbol as the result of an extension procedure.

g_i
  1    identity_expr → sample_expr EQ sample_expr
  2    sample_expr → expr
  3    sample_expr → sum
  4    sum → SUM { range_list : sample_expr }
  5    range_list → start , end
  5.1  start → single_start
  5.2  start → compound_start
  6    single_start → identifier = expr
  6.1  compound_start → single_start ' identity_expr
  7    identifier → ID
Table 5.17: Grammar for summation.
A possible grammar definition to support the conjoined condition can be obtained by modifying the fragment associated with the definition of the summation operation, and by the addition of a grammar fragment to represent the composed version of the iteration. Table 5.17 illustrates a possible set of grammars that can be used to capture summation operations that have compound iteration statements.
Since the grammar proposed in Table 5.17 has been developed with the purpose of extending the recognition power provided by the grammar fragment in Table 5.16, it is expected that some common structural knowledge is shared between the two. This is true since rules 1 to 5 and 7 are the same in both fragments. Also, rule 6 is semantically equivalent in the two definitions. The two instances of this rule differ only in their left hand side nonterminals.
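The two meanings of the = symbol in a compound start can be kept apart in code: one occurrence initializes the iteration variable, the other is a condition that filters the iteration. The sketch below assumes a double summation in which i runs from m/2 to m + n, j runs from 0 to n, and only pairs with i + j = n contribute the summand i + j; the function name `compound_sum` and the use of integer division for m/2 are further assumptions.

```python
def compound_sum(m, n):
    """Sum i + j over i = m/2 .. m+n and j = 0 .. n, subject to i + j == n."""
    total = 0
    for i in range(m // 2, m + n + 1):      # '=' as start of the iteration over i
        for j in range(n + 1):              # '=' as start of the iteration over j
            if i + j == n:                  # '=' as equality inside the condition
                total += i + j
    return total
```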
The grammars proposed for capturing the semantics of the summation concept illustrate the need for a composition process to support the extension of already defined constructs by reusing existing grammar fragments. Chapter 6 discusses the grammar extension problem and provides a solution in terms of grammar operations.
5.9 Conclusion
This chapter introduced the notion of using CFGs as the major formalism to capture the semantics of mathematical concepts. It discussed the advantages and limitations of using CFGs to support the dynamics of authoring mathematics.
The syntax of programming languages is usually specified by means of CFGs [95]. Structuring the mathematical notation as a programming language has the advantage of using CFGs for its specification and processing. Specification is supported by the CFG's structuring methods, which include composition, choice, repetition, and recursion [95]. Effective and efficient parsing algorithms and tools are available to support its processing.
Although CFGs have successfully been used for the specification of the syntax of programming languages, this formalism is not adequate for the definition of the semantics of programming languages [100]. Another important limitation of this formalism relates to its static characteristic. This restricts its use to the support of organizations that do not depend on the notion of update.
Chapter 6
Modelling Context Dependent Information
The notion of using CFGs to support the semantics capturing of mathematical concepts was introduced in Chapter 5. This chapter proposes the fundamentals of a document organization that models the dynamics of authoring mathematics. The model supports both the extensibility and ambiguity characteristics of mathematical notation and is capable of capturing the meaning of mathematical concepts by means of syntax defined during authoring.
6.1 Authoring Mathematics and Multimodality
This section presents the basic components of a structure to support the dynamic authoring process as discussed in Section 5.3. As emphasized there, different interpretations may be assigned to a given syntax. This behavior is understood as instabilities in the binding between meaning and syntactical representation. As a major characteristic of authoring mathematics, this needs to be addressed in any proposal to model this type of authoring.
The conventional notation which is used for the communication of mathematics is characterized by a context-dependent meaning-to-syntax binding. This dynamical form of attaching meaning to syntax is the mechanism available to the author for expressing knowledge by means of the symbol arrangement he/she believes is the most
adequate for the communication of the ideas to be presented through the document. One characteristic this notation has is to leave unspecified the domain which the concepts represented belong to. Although this simplifies the notation used, it imposes limitations on the rendering of concepts for communication in different modalities. The context-dependent quality also requires the author/user to have knowledge of the context where the appropriate meaning is to be associated to a syntactical representation of a concept. One way in which this limitation may be approached is to assign this knowledge requirement to the structure that is used for capturing the semantics of the constructs.
The organization proposed here addresses the dynamic mapping between meaning and representation by means of a meta-system. This structure establishes the necessary methods for capturing the semantics of mathematical concepts, leaving the definition of the desired notation to the author. When new concepts or extensions to constructs already defined are necessary, the author's involvement will be required to configure the system for capturing the constructs that need to be included in the document. The dynamic authoring process may be viewed as a modular organization composed of a Parameterized Notational Structure (PNS), a Hierarchical Intermediate Representation (HIR) and a Rendering Structure (RS).
A parameterized notational structure is an organization defined by a meta-structure, a programming language and a set of grammars. This set contains the necessary grammars to capture the syntax and semantics of the mathematical concepts that have been included in a given document. The meta-structure provides the rules to be used for the creation of the grammars that belong to the set. The programming language manipulates the grammars that have been created according to the meta-structure. This language also provides the notion of scope, which is applied to resolve syntax ambiguities. Statements of this language include the mathematical constructs that have been encoded according to a domain defined by a scope. In summary, this language provides a mechanism to aid the authoring of mathematical concepts that are being captured by the grammars from the set. It also provides, by means of the scope, a dynamical form to cope with syntactical ambiguities.
Intermediate representations of documents are generated as a result of the interaction between the author and the PNS. These hierarchical intermediate representations support the provision of the information that the rendering structure will manipulate in order to generate different views of a document. The set of document views
produced by the RS will depend on the purpose of the application. For this reason it
processes the HIR based on knowledge provided by application experts.
[Figure 6.1 diagram: modules PNS, HIR and RS]
Figure 6.1: Structure to support dynamic authoring and multimodality processing
The interaction between the three modules is illustrated in Figure 6.1. The arrow/function pair is used to represent how information is processed. The interpretation associated with this form of representation is described as follows.
Function f() represents the service provided by PNS to its only client HIR, which involves the creation of an intermediate document representation. The set of functions h_1(), ..., h_k() is used to represent the set of services that RS provides. These services are based on the knowledge stored in HIR that is shared with RS through g(). They are mechanisms to produce different views of the encoded document obtained from HIR. The views, represented by the boxes labeled v_k in Figure 6.1, are the result of the application of the rendering formats required by the final application. The diagram shown in Figure 6.2 describes the dynamics of the model proposed.
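The flow of Figure 6.1 can be pictured with a small Python sketch. Only the roles of f(), g() and h_1(), ..., h_k() come from the text. Everything else is an illustrative assumption: the `Node` class standing in for the HIR, the template tables standing in for the knowledge of application experts, and the two views named `print` and `speech`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical HIR node: a concept name and its operand subtrees."""
    concept: str
    children: Tuple["Node", ...] = ()


def f(concept: str, *operands) -> Node:
    """PNS service: build the intermediate representation of one construct."""
    kids = tuple(o if isinstance(o, Node) else Node(str(o)) for o in operands)
    return Node(concept, kids)


def g(hir: Node) -> Node:
    """HIR service: expose the stored representation to the RS unchanged."""
    return hir


# Assumed application-specific knowledge: one template table per view.
PRINT_TEMPLATES = {"abs": "|{0}|", "sin": "sin {0}"}
SPEECH_TEMPLATES = {"abs": "the absolute value of {0}", "sin": "the sine of {0}"}


def render(node: Node, templates: Dict[str, str]) -> str:
    """Render a node bottom-up using the templates of one view."""
    if not node.children:
        return node.concept
    parts = [render(child, templates) for child in node.children]
    return templates[node.concept].format(*parts)


# h_1, ..., h_k: one rendering service per view.
RENDERERS: Dict[str, Callable[[Node], str]] = {
    "print": lambda hir: render(hir, PRINT_TEMPLATES),
    "speech": lambda hir: render(hir, SPEECH_TEMPLATES),
}

hir = f("abs", f("sin", "x"))
views = {name: h(g(hir)) for name, h in RENDERERS.items()}
```

The renderers only ever see the value returned by g(), which mirrors the requirement that the RS works from the HIR alone.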
The following discussion makes use of the information provided in Table 6.1 and the diagram shown in Figure 6.2 to present the operational organization of the complete process. According to this diagram every intermediate document i is the result of intentions of the author a, coded as language l statements, that will manipulate the concepts defined in the set of grammars g. As illustrated in Figure 6.2, knowledge of the meta-structure m is necessary to update the set of grammars. This may be done by means of the programming language l, which updates the grammar set based
Group       Module  Symbol  Component
Entity              a       author
                    s       application specialist
Modules     PNS     l       programming language
                    m       meta-language
                    g       set of grammars
            HIR     i       intermediate document representation
            RS      p       application specific semantics
                    r       rendering mechanism
Documents           d       document application
Processing          e       editing
Table 6.1: Components involved in dynamic authoring for multimodality.
Figure 6.2: A sketch of the dynamics of the authoring/rendering process.
on previously defined grammars. Another possible way to update these grammars is by the direct use of a text editor. Editing mechanisms used for this purpose are represented by e in Figure 6.2. This way of manipulating g is needed whenever the capturing of the meaning of a concept cannot be obtained by the use of l. The provision of all grammars necessary for supporting the dynamic authoring process is therefore the result of actions taken by the author that involve the structure defined by the PNS. The various document applications d may be obtained from the intermediate representation i by the application specialist s. For each application, knowledge of the specific semantics p as well as of the rendering mechanisms r is necessary.
Usually the author is interested only in providing the semantics for the assumed standard usage of the document which, most of the time, involves only a printed view of the document. The semantics for other usages, such as voice, could be obtained with the support of an application specialist.
Electronic documents allow for the possibility of rendering the abstract concepts which compose the document's logical structure as different concrete variables. Another characteristic of electronic documents is that the meanings of the included concepts need to be properly encoded to allow their processing by the computer. It is also difficult, or sometimes even impossible, to predict all potential applications that may be assigned to the abstract concepts that comprise a document. All these characteristics may be supported by the availability of:
1. an adequate semantics-based encoding of the mathematical concepts, and
2. a set of associated rendering mechanisms to convert the encoded concepts to the respective expected formats.
The encoded concepts are represented in Figure 6.2 by the circle named i and the rendering mechanism by the circle named r. As illustrated, the intermediate document representation is the only component that is visible to the rendering mechanism. For this reason all ambiguities must be resolved at the PNS during authoring.
The organization proposed supports the multimodal communication of concepts by modeling the human behavior involved during the authoring activity. Since this thesis aims at capturing the semantics of mathematical concepts, it only considers the PNS portion of the organization presented. Sections 6.3 to 6.8 describe the fundamental components of the PNS module. Sections 6.3, 6.4, 6.5 and 6.7 discuss the grammar component. Section 6.6 introduces the language component and Section 6.8 the meta-language component. A grammar-based structure to model the dynamics of authoring mathematics is discussed in the remainder of this chapter.
6.2 A Formal Structure for Document Authoring
In this section it is assumed that documents are created according to the authoring model presented in Section 6.1. The model introduced there establishes a set of steps
which can be followed during the creation of a single document. This idea is now extended by the notion that a document may be considered as the result of a set of modifications applied to other documents. This property, as well as an exception to this notion, is discussed next.
The proposed document structure is based on the assumption that any document in its final version is seen as the result of a composition process in which intermediate versions of the document are produced. As the author's ideas evolve and new concepts need to be included, different versions of the document are generated. These versions can be interpreted as blueprints of the author's capacity to communicate ideas and concepts.
Three important stages related to the versions of a document produced during the authoring process are identified here. The first is the one in which the author makes use of any available concept definitions. Documents created during this stage are called default documents. Another stage is the final one. At this stage the authoring process is over and the outcome is the final document. In general, many different versions of a document are created before the final one is produced. This leads to the third stage, the intermediate one, where all intermediate versions of a document are created. In a case where only one document version is produced during the complete authoring process, the default, final and intermediate versions are the same.
At any instant during the authoring process, the structure required to support the creation of a particular version of a document is the result of a process involving a set of grammars. Each isolated grammar contributes to the capture and representation of at least one mathematical concept and has been included in the document's supporting structure by means of one of the following three approaches. Each grammar fragment either
1. has been created by standard editing procedures or
2. has already been defined or
3. has resulted from grammar operations.
The structure proposed in this chapter organizes grammars into directories, and the composition of a directory includes definitions which have been created by any of
the three above mentioned approaches¹. It is assumed that an author who uses the model proposed here will often have a set of concept definitions available which may be used during the creation of the default document. The definitions should have their location included as part of the referencing process. These locations may vary from a local file system to the World Wide Web.
As proposed in Section 4.6, authoring environments are represented in terms of a document structure and the system's interface. The expression V = (S, I) was used for this purpose. In this section the document structure is defined as an organization to support the dynamic characteristic of document authoring. This characteristic is supported here by means of an adaptable organization. For this reason, the document structure S will be called the document instance structure. The following subsection introduces a grammar-based structure to model the dynamics of authoring mathematics.
6.2.1 Grammars and Dynamic Document Authoring
A document instance structure $S_j$, for $j \ge 0$, is a tuple
$S_j = (D_j, c)$    (6.1)
where c is a binding control mechanism and $D_j$ is the semantic structure. The subscripted variable j determines the version of the document structure considered. The binding control c is a grammar whose purpose is the provision of an environment in which the semantic structure required for the document instance structure is placed. This environment provides support for document authoring behavior. For this reason it must be independent of versions.
The semantic structure $D_j$ is a finite sequence,
$D_j = (G_1^j, G_2^j, \ldots, G_{n_j}^j)$    (6.2)
of finite sets of grammars.
A domain² is a grammar in $G_i^j$, $1 \le i \le n_j$, such that it contains both the syntax and portions of the semantics needed to capture the meaning of a set of mathematical concepts. The set $G_i^j$ is called domain directory or just directory.
¹ Refer to Figures 6.1 and 6.2 and Table 6.1 for an overall view of the authoring mechanism.
² As discussed in Chapter 5, a CFG is considered equivalent to a data type; therefore domain and data type are also equivalent.
For a given version of a document with document instance structure as defined in Expression 6.1, the collection of all grammars which can be found in that version of the document is called the document dictionary. This collection is the union
$\bigcup_{i=1}^{n_j} G_i^j$    (6.3)
Each $G_i^j \in D_j$ is a union of three sets of grammars, that is
$G_i^j = N_i^j \cup F_i^j \cup C_i^j$    (6.4)
The grammars in $N_i^j$ are grammars created by standard editing mechanisms. The grammars in $F_i^j$ represent grammars that have already been created. They are ready to be used and satisfy the following condition:
$F_i^j = \emptyset$ if $j = 0$, $i = 1$
c u u u #* i f z> i (6'5)* = 1 k= 1
The third set, $C_i^j$, collects all grammars that are introduced by the two binary operations in the set B = {%, o}³. The set $C_i^j$ is defined as follows:
$C_i^j = \{\, h \mathbin{P} h' \mid h, h' \in (F_i^j \cup N_i^j) \wedge P \in B \,\}$    (6.6)
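A minimal Python sketch may help to see how the pieces of Equations (6.1), (6.3), (6.4) and (6.6) fit together. It rests on several assumptions that are not part of the model: grammars are identified by name only, the result of a grammar operation is represented by a string that records the operator and its operands (the operations themselves are defined in Section 6.4), and condition (6.5) on the reused grammars is not modelled. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, Tuple

OPERATIONS = ("%", "o")  # the set B; only the operator names are used here


@dataclass(frozen=True)
class Directory:
    """One domain directory, split into the three sets of Equation (6.4)."""
    edited: FrozenSet[str] = frozenset()  # N: created by standard editing
    reused: FrozenSet[str] = frozenset()  # F: already defined

    def composed(self) -> FrozenSet[str]:
        """C: every 'h P h2' with h, h2 in F union N and P in B (Equation 6.6)."""
        base = self.edited | self.reused
        return frozenset(
            f"({h} {op} {k})"
            for h, k in product(sorted(base), repeat=2)
            for op in OPERATIONS
        )

    def grammars(self) -> FrozenSet[str]:
        return self.edited | self.reused | self.composed()


@dataclass(frozen=True)
class DocumentInstance:
    """One version S_j = (D_j, c) of the document instance structure."""
    directories: Tuple[Directory, ...]  # the sequence D_j
    binding_control: str                # c, kept as an opaque name here

    def dictionary(self) -> FrozenSet[str]:
        """Document dictionary: union of all directories (Equation 6.3)."""
        return frozenset().union(*(d.grammars() for d in self.directories))


version = DocumentInstance(
    directories=(
        Directory(edited=frozenset({"sum"}), reused=frozenset({"expr"})),
        Directory(edited=frozenset({"vector"})),
    ),
    binding_control="c",
)
```

Read literally, Equation (6.6) pairs every grammar with every grammar, itself included, so the composed set grows quadratically with the size of the directory.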
Chapter 7 presents a set of examples involving the organization introduced in this section. The examples are deferred because the formalisms needed to support the concepts introduced are provided throughout the remainder of this chapter. A normal form for CFGs is proposed in the following section.
³ The definition of these two operations requires that their operands be grammars that satisfy the extension normal form criteria introduced in Section 6.3. Both the syntax and the semantics of the two operations will be defined in Section 6.4.
6.3 Structuring with Grammars
The design strategy introduced in Section 5.3 suggests the use of CFG fragments to capture the semantics of the mathematical concepts. The modular structure proposed there is based on the assignment of a unique nonterminal to represent the meaning of a concept. The set of productions used for the definition of a nonterminal, in this scheme, is viewed as the specification of a data type which is represented by the grammar's start symbol⁴. The mathematical constructs recognized by this organization are, at run-time, the instances of the associated types, as defined by the grammar⁵. For this reason they are considered as objects.
As mentioned in Section 5.3, the use of normal forms would benefit both the semantics capturing and the grammar composition activities. The connection between semantics capturing and normal form is approached here by the definition of a set of templates. These templates establish the restrictions which the grammar rules for semantics capturing must follow.
The conventional mathematical notation represents concepts as strings of symbols. The interpretation of any of these strings is based on the arrangement of the symbols and the domain (field) in which definitions are proposed. Although the number of operands that can be attached to an operator is determined by the concept to be represented, the location where an operator is placed inside expressions is usually limited to three possibilities. Operators may usually be arranged according to either infix, prefix or postfix formats depending on their placement relative to their operands in the expression⁶. They are, therefore, considered infixed if they have both left and right operands, prefixed in case only right operands are provided and postfixed in situations where operands are only placed on the left side of the operator. Variations of this scheme are necessary to support situations where the object has no operands. A vector, for instance, illustrates this scenario since, when represented according to the standard notation, it is often encoded as a single lowercase letter in bold face type. The representation of this concept may, for instance, be viewed as either a prefixed or a postfixed expression with no operands.
⁴ It is assumed that mathematical concepts only have the expected meaning if the domain directory in which they are defined is considered.
⁵ Although objects in programming languages are understood as the result of class instantiations, the interpretation attached here to them differs by the fact that the instances are not automatically generated by the language processor. In the proposed model grammars correspond to classes and the expressions are the objects defined during authoring. In this scenario objects are the result of an incremental process during which portions of the object or the complete object are provided by the author.
⁶ An exception to this rule is the representation of juxtaposed multiplication as used in polynomials where no explicit operator is provided.
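The placement rule described above can be sketched in a few lines of Python. The function name, the token-list encoding and the sample expressions are assumptions made for illustration. The sketch reports the operand-free case separately, although the text allows it to be read as either prefixed or postfixed.

```python
def operator_format(tokens, operator):
    """Classify an operator by where its operands sit in a flat token list."""
    position = tokens.index(operator)
    has_left = position > 0                  # operands to the left
    has_right = position < len(tokens) - 1   # operands to the right
    if has_left and has_right:
        return "infix"
    if has_right:
        return "prefix"
    if has_left:
        return "postfix"
    return "no operands"


# Hypothetical sample expressions, already split into tokens.
samples = {
    "sum": operator_format(["x", "+", "y"], "+"),
    "negation": operator_format(["-", "x"], "-"),
    "factorial": operator_format(["n", "!"], "!"),
    "vector": operator_format(["v"], "v"),
}
```

Juxtaposed multiplication, mentioned in footnote 6, has no operator token at all and therefore falls outside this simple test.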
A normal form for CFGs is, therefore, proposed as a way of structuring grammars to support the expression formats discussed. The terminal symbols of the proposed structure are used for the representation of the operator's name and the nonterminals are used for the representation of the operands and necessary delimiters. This grammar structure also provides the necessary mechanism to support recursive definitions since they are needed to capture the repetitive occurrences of certain types of operators in expressions.
Definition 2 A CFG $G = (N, T, P, S)$ is said to be in the Extension Normal Form (ENF) if, for all $A \in N$, with $a \in T$ and $\alpha \in N^*$, there are only four kinds of productions⁷ in P. They are:
(1) $A \to a\alpha$
(2) $A \to \alpha a$
(3) $A \to A a A$
(4) $A \to A$
Theorem 1 For every CFG G such that $\varepsilon \notin L(G)$, one can construct an equivalent CFG in Extension Normal Form.
Proof: This result follows from the super-normal-form theorem in [71, 106].
Each production rule of the ENF may be interpreted as an atomic grammar fragment. To achieve this, assume that each one of the four kinds of rules, as proposed by the ENF, defines a CFG.
In Section 5.3 the correspondence between CFG and type was proposed. This indicates that the definition of a type will be a function of the number and the structure defined by the grammar's rules. For any given CFG rule, the combination of terminals and nonterminals determines the type of the rule. Rules may therefore be organized according to the number of terminals and nonterminals as structured and non-structured. Non-structured rules define primitive types such as integer, real and character, for example. Rules that cannot be associated with any type are also considered non-structured. Structured rules define types and the meaning of a type may depend on information provided by other rules.
⁷ G being in ENF means that G is an interpretation of a 2-symbol CFG form [106] with rules only of the types listed.
The following definitions impose restrictions on grammars in ENF as a way to classify these grammars according to the criteria of being structured or non-structured. The resulting grammar fragments are the building blocks which will be used for the semantics capturing process.
Definition 3 A CFG $G = (N, T, P, S)$ in ENF is called an operatorless grammar if $N = \{S, B\}$, $T = \{\}$ and $P = \{S \to B\}$. The rule $S \to B$ is called an operatorless⁸ production.
Operatorless grammars are used to introduce specializations. That is, a concept associated with S is specialized to B. Any instantiation of B is therefore an instantiation of S.
Definition 4 A CFG $G = (N, T, P, S)$ in ENF is called a primitive grammar if $N = \{S\}$, $T = \{a\}$ and $P = \{S \to a\}$. The rule $S \to a$ is called a primitive production.
Primitive grammars introduce atomic types. That is, the type assigned to its nonterminal does not depend on the type associated with any other nonterminal.
Definition 5 A CFG $G = (N, T, P, S)$ in ENF is called a basic grammar if, for $\alpha \in (N \cup T)^+$, its set of rules is $P = \{S \to \alpha\}$. The rule $S \to \alpha$ is called a basic production and it is neither a primitive production nor an operatorless production.
Basic grammars are type constructors. They are used to create composite types. In this case the type assigned to its start symbol will depend on the types associated with the other nonterminals that are part of the rule.
Operatorless, primitive and basic grammars are the essential components which will be involved in the semantics capturing activity. For this reason they will be referred to as fundamental grammars.
⁸ This type of production is often called unit production [55, 107, 69].
Definition 6 All CFGs in ENF which are neither operatorless, primitive nor basic are called derived grammars. Derived grammars which have no operatorless productions are called reduced grammars.
The following example illustrates the notion of basic grammar. Consider, for instance, the grammars
• $G_1 = (N_1, T_1, P_1, S_1)$ with $N_1 = \{S_1, B, C\}$, $T_1 = \{a\}$, $P_1 = \{S_1 \to a B C\}$,
• $G_2 = (N_1, T_1, P_2, S_1)$ with $P_2 = \{S_1 \to B C a\}$ and
• $G_3 = (N_1, T_1, P_3, S_1)$ with $P_3 = \{S_1 \to a B C a\}$.
$G_1$ and $G_2$ are both basic grammars. $G_3$ is not basic since it is not in ENF.
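The classification given by Definitions 3 to 6 can be sketched as a small Python function and checked against the three grammars above. The ENF test used here is a simplifying assumption: a right-hand side is accepted when it is a single nonterminal, or when it contains exactly one terminal among any number of nonterminals. The function names and the encoding of a grammar as a list of (left-hand side, right-hand side) pairs are hypothetical.

```python
def in_enf(rhs, terminals):
    """Assumed ENF test: exactly one terminal, or one lone nonterminal."""
    count = sum(1 for symbol in rhs if symbol in terminals)
    if count == 1:
        return True
    return count == 0 and len(rhs) == 1


def classify(productions, terminals):
    """Name the kind of grammar formed by the given productions."""
    if not all(in_enf(rhs, terminals) for _, rhs in productions):
        return "not in ENF"
    if len(productions) != 1:
        return "derived"            # Definition 6
    _, rhs = productions[0]
    if len(rhs) == 1:
        # A single symbol: a terminal (Definition 4) or a nonterminal (Definition 3).
        return "primitive" if rhs[0] in terminals else "operatorless"
    return "basic"                  # Definition 5


T1 = {"a"}
G1 = [("S1", ("a", "B", "C"))]
G2 = [("S1", ("B", "C", "a"))]
G3 = [("S1", ("a", "B", "C", "a"))]
kinds = [classify(g, T1) for g in (G1, G2, G3)]
```

Under this reading G3 is rejected because its single rule carries two occurrences of a terminal.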
6.3.1 Mathematical Concepts and Grammatical Dependencies
The fact that the definition of a mathematical concept usually depends on other concepts indicates the existence of relationships among them. This characteristic is, in this thesis, interpreted as a dependency relation where one concept is the dependent and a set of others the determinants. In this work this relation is represented by an arrow that starts at the set of determinants and points to the dependent. The two following examples illustrate this notion of dependency.
The concept of absolute value is defined as an operator that returns the unsigned version of the expression supplied as its argument. According to conventional notation, |sin x| is the encoding for the absolute value of the sine function computed for argument x. Assuming A and S represent the absolute value and the sine function respectively, the relation between these two concepts, in the context of |sin x|, is indicated as follows:
A <= S
The presented dependency relation establishes a hierarchical relationship that has A as the parent of its only child, the S construct. In this case S is the determinant and A the dependent.
The concatenation of strings of characters is often encoded as expressions in the infix format with the + symbol representing the operation on the operands [107]. An alternative encoding is also used when a single string of characters is to be concatenated with itself. In this case the operation may also be encoded as a power expression where the expected result is the original string concatenated with itself the number of times indicated by the exponent. The two encodings are illustrated by the equality expression
a * 3 + b = aaab
where the symbols *, + and = represent the power, concatenation and equality operations respectively.
catEquality
  expr        -> expr EQUALS term_cat
  expr        -> term_cat
  term_cat    -> term_cat CONCATENATION term
  term_cat    -> term
  term        -> term POWER factor
  term        -> term_string
  term_string -> STRING
  factor      -> INTEGER
Table 6.2: CFG for equality of strings of characters.
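As an informal check of the grammar in Table 6.2, the following Python sketch evaluates expressions such as a * 3 + b = aaab. It is only an illustration under stated assumptions: STRING is taken to be a non-empty run of the letters a to j, the terminals POWER, CONCATENATION and EQUALS are written *, + and =, the left-recursive rules are turned into loops, and a chain of equalities is taken to hold when all its members are equal. The function names are hypothetical.

```python
import re

# Assumed lexical rules: STRING is a non-empty run of the letters a-j,
# INTEGER a positive integer, and the operators are written *, + and =.
TOKEN = re.compile(r"\s*(?:(?P<STRING>[a-j]+)|(?P<INTEGER>[1-9][0-9]*)|(?P<OP>[*+=]))")


def tokenize(text):
    """Split the input into (kind, value) pairs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def evaluate(text):
    """Return the resulting string, or a bool when EQUALS occurs."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind):
        nonlocal pos
        found, value = peek()
        if found != kind:
            raise ValueError(f"expected {kind}, found {found}")
        pos += 1
        return value

    def term():  # term -> term POWER factor | term_string
        value = take("STRING")
        while peek() == ("OP", "*"):
            take("OP")
            value = value * int(take("INTEGER"))
        return value

    def term_cat():  # term_cat -> term_cat CONCATENATION term | term
        value = term()
        while peek() == ("OP", "+"):
            take("OP")
            value = value + term()
        return value

    # expr -> expr EQUALS term_cat | term_cat
    values = [term_cat()]
    while peek() == ("OP", "="):
        take("OP")
        values.append(term_cat())
    if pos != len(tokens):
        raise ValueError("trailing input")
    if len(values) == 1:
        return values[0]
    return all(value == values[0] for value in values)
```

Because POWER is handled in the innermost function, it binds more tightly than CONCATENATION, which in turn binds more tightly than EQUALS, as the nesting of the rules in the table suggests.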
The operations and operands involved in this expression are described in the grammar represented by the set of production rules defined in Table 6.2. The hierarchy imposed by the rules of grammar catEquality⁹ establishes the following seven dependency relations:
EQUALS <= CONCATENATION
EQUALS <= POWER
EQUALS <= STRING
CONCATENATION <= STRING
CONCATENATION <= POWER
POWER <= INTEGER
POWER <= STRING
⁹ It has been assumed that STRING is defined by the regular expression [a-j]" and INTEGER is a nonzero positive integer.
The above relations determine the dependencies which exist among the mathematical concepts which have been used for the definition of another concept. In this thesis they will be called terminal dependencies¹⁰ because of the one-to-one association between the name of the mathematical concept and the terminal symbol which represents it in the grammar which captures the meaning of the concept.
  scheme -> ID { dexpr }
  dexpr  -> dexpr rest
  dexpr  -> ID
  rest   -> <= det
  det    -> ID
  det    -> ( dlist ) more
  det    -> dexpr
  dlist  -> first others
  first  -> ID
  others -> , object
  object -> ID
  object -> dlist
  more   -> ε
  more   -> ; dexpr
Table 6.3: CFG for representation of schemes.
The grammar in Table 6.3 provides the syntax which will be followed, in this thesis, to represent terminal dependency relations. Each word generated by this grammar is called a representation scheme.
Since representation schemes are always related to grammars, they will be identified by the grammar's name appended with the literal string Scheme. The expression which follows determines that catEqualityScheme is the representation scheme for the grammar defined in Table 6.2.
catEqualityScheme{Equals <= (Concatenation, Power, String);
Concatenation <= (String, Power);
Power <= (Integer, String)}
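The grouping used in a representation scheme can be unfolded into individual terminal dependencies with a short Python sketch. It is a simplified reader, not a parser for the full grammar of Table 6.3: it assumes the flat form used above, in which each clause is a dependent followed by <= and a list of determinants, and clauses are separated by semicolons. Nested lists are not handled, and the function names are hypothetical.

```python
import re


def parse_scheme(text):
    """Read 'Name{A <= (B, C); D <= (E, F)}' into (name, {dependent: determinants})."""
    match = re.fullmatch(r"\s*(\w+)\s*\{(.*)\}\s*", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("not a representation scheme")
    name, body = match.group(1), match.group(2)
    relations = {}
    for clause in body.split(";"):
        if not clause.strip():
            continue
        dependent, arrow, determinants = clause.partition("<=")
        if not arrow:
            raise ValueError(f"missing <= in clause: {clause.strip()}")
        relations[dependent.strip()] = re.findall(r"\w+", determinants)
    return name, relations


def unfold(relations):
    """List every (dependent, determinant) pair of the scheme."""
    return [(x, y) for x, ys in relations.items() for y in ys]


SCHEME = """catEqualityScheme{Equals <= (Concatenation, Power, String);
Concatenation <= (String, Power);
Power <= (Integer, String)}"""

name, relations = parse_scheme(SCHEME)
pairs = unfold(relations)
```

For the scheme above, `pairs` holds the seven dependency relations listed for grammar catEquality, with the terminal names spelled as in the scheme.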
¹⁰ Although the formal definition of terminal dependency is provided at the end of this section, these dependencies can easily be identified whenever the related grammar is expressed as a reduced grammar.
Although the ENF determines the possible arrangement of nonterminals and terminals in production rules, this mechanism is not adequate for the description of the relationships which exist among the concepts represented. This limitation is a consequence of the fact that in a CFG the nonterminals are variables and the terminal symbols are constants. This implies that the existing relationships among concepts can only be expressed in terms of their associated nonterminals when represented by CFGs. The notion of grammatical dependency is introduced as a form of describing the restrictions a set of grammars should satisfy whenever their terminal symbols express a dependency relationship.
Prior to the definition of terminal dependency the notion of type decomposition¹¹ needs to be introduced. This means a CFG is decomposed as a set of grammar fragments which can be tested for nonterminal relationships. The existence of relationships among nonterminals in different grammar fragments leads to the notion of grammar dependency. These ideas are formally presented by the following definitions.
Definition 7 Let $L = (N, T, P, S)$ be a reduced grammar. The type decomposition of L is the set
$Z_L = \{\, K_p = (N_p, T_p, P_p, S_p) \mid P_p = \{p\},\ N_p = (L_p \cup R_p),\ T_p = O_p,\ S_p = L_p,\ p \in P \,\}$
Definition 8 Let $G_b = (N_b, T_b, P_b, S_b)$ be a basic grammar and let $G_o = (N_o, T_o, P_o, S_o)$ be either a basic or a primitive grammar such that $G_b \ne G_o$. $G_b$ is grammatically dependent on $G_o$, written $G_b \Leftarrow G_o$, if $L_o \cap R_b \ne \emptyset$.
  E1  expr   -> expr MINUS term
  E2  expr   -> term TIMES factor
  E3  expr   -> IDENTIFIER
  E4  term   -> term DIVIDEDBY factor
  E5  term   -> IDENTIFIER
  E6  factor -> IDENTIFIER
Table 6.4: Grammar fragments illustrating grammar dependencies.
¹¹ Since the start symbol of a grammar is interpreted as a type, the decomposition of a grammar as a set of grammar fragments is viewed here as a type decomposition.
To illustrate the grammar dependency concept, consider the grammars defined in Table 6.4, which may be viewed as the result of the type decomposition of some grammar. For this example $Z = \{E_1, E_2, E_3, E_4, E_5, E_6\}$. The following dependencies are obtained by applying Definition 8 to the grammars defined by the productions in Table 6.4.
$E_1 \Leftarrow (E_2, E_3, E_4, E_5)$    (6.7)
$E_2 \Leftarrow (E_1, E_3, E_4, E_5, E_6)$    (6.8)
$E_4 \Leftarrow (E_5, E_6)$    (6.9)
As can be seen, commas and both opening and closing parentheses have been used for the representation of dependencies. These symbols were included in order to group all determinants that are associated with a dependent in a single dependency expression. The grammatical dependencies determined by the dependency relation (6.7), for instance, state that portions of both the syntax and the semantics described by grammar $E_1$ are supplied by grammars $E_2$, $E_3$, $E_4$ and $E_5$.
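The test in Definition 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary `FRAGMENTS` and the function `determinants` are hypothetical names, each fragment is stored as a single production, and lowercase symbols are taken to be nonterminals while uppercase symbols are terminals. The sketch reproduces relations (6.7) and (6.9) for the fragments of Table 6.4.

```python
# Fragments of Table 6.4, one production each: name -> (left side, right side).
# Assumed convention: lowercase = nonterminal, uppercase = terminal.
FRAGMENTS = {
    "E1": ("expr", ["expr", "MINUS", "term"]),
    "E2": ("expr", ["term", "TIMES", "factor"]),
    "E3": ("expr", ["IDENTIFIER"]),
    "E4": ("term", ["term", "DIVIDEDBY", "factor"]),
    "E5": ("term", ["IDENTIFIER"]),
    "E6": ("factor", ["IDENTIFIER"]),
}


def determinants(name):
    """Return the names G_o such that G_name <= G_o (Definition 8).

    G_o qualifies when it differs from G_name and its left-hand
    nonterminal L_o occurs among the right-hand nonterminals R_b.
    """
    _, rhs = FRAGMENTS[name]
    r_b = {symbol for symbol in rhs if symbol.islower()}
    return [other for other, (lhs, _) in FRAGMENTS.items()
            if other != name and lhs in r_b]


print(determinants("E1"))  # ['E2', 'E3', 'E4', 'E5'], as in (6.7)
print(determinants("E4"))  # ['E5', 'E6'], as in (6.9)
```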
When capturing the semantics of mathematical concepts by a set of grammar fragments, the names of the concepts are represented by the terminals of the grammars involved. The resulting grammatical dependencies which exist among grammars in this set can be expressed by means of the terminal symbols of these grammars. As stated before, these relationships are called terminal dependencies and they are formally defined in terms of grammatical dependencies as follows.
Definition 9 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_o = (N_o, T_o, P_o, S_o)$ be CFGs. When the grammatical dependency $G_b \Leftarrow G_o$ is satisfied we also say that there is a terminal dependency for each pair of terminals $x, y$ such that $x \in T_b$ and $y \in T_o$. The syntax $x \Leftarrow y$ is used to express a terminal dependency between terminals $x$ and $y$. For this case terminal $x$ is called the dependent and terminal $y$ the determinant.
The collection of all terminal dependencies which can be determined from a type decomposition is called a dependency scheme. A representation scheme is the syntactical structure which is used to list all terminal dependencies found in a dependency scheme.
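As a rough illustration of Definition 9, the following Python sketch derives terminal dependencies from grammatical dependencies. The names `TERMINALS`, `GRAMMAR_DEPENDENCIES` and `terminal_dependencies` are assumptions made for this example. The input takes relations (6.7) and (6.9) as given, together with the terminals of the fragments in Table 6.4.

```python
from itertools import product

# Terminal sets of the fragments in Table 6.4.
TERMINALS = {
    "E1": {"MINUS"},
    "E2": {"TIMES"},
    "E3": {"IDENTIFIER"},
    "E4": {"DIVIDEDBY"},
    "E5": {"IDENTIFIER"},
    "E6": {"IDENTIFIER"},
}

# Grammatical dependencies (6.7) and (6.9): dependent -> determinants.
GRAMMAR_DEPENDENCIES = {
    "E1": ["E2", "E3", "E4", "E5"],
    "E4": ["E5", "E6"],
}


def terminal_dependencies(dependencies, terminals):
    """Collect every pair (x, y) with x in T_b and y in T_o for G_b <= G_o."""
    pairs = set()
    for dependent, dets in dependencies.items():
        for det in dets:
            pairs.update(product(terminals[dependent], terminals[det]))
    return pairs


scheme = terminal_dependencies(GRAMMAR_DEPENDENCIES, TERMINALS)
for x, y in sorted(scheme):
    print(f"{x} <= {y}")
```

Each printed line reads "dependent <= determinant"; the set `scheme` plays the role of a (partial) dependency scheme for these two relations.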
Dependency schemes can be used as an aid to help with the identification of redundancies by syntax equivalence. Reduced grammars which do not share dependency
schemes are free of redundancies by syntax equivalence.
6.4 Grammar Operations and Extensibility
This section introduces two operations. These operations have grammars as both their input parameters and their returned information. Input grammars are seen as providers of both syntax and semantics and they are never modified by the operations. The output produced is the result of the combination of the production rules supplied by the input. Since the application of either of the two proposed operations produces a single grammar, the creation of more complex grammar definitions may be seen as the result of a sequence of operations which use the information obtained from previous operations. Therefore the creation of the final grammar may be viewed as the result of a process where grammar fragments have been inserted and/or deleted. Both operations are defined for input grammars in ENF. This requirement guarantees that the output grammar is also in ENF. In this thesis these operations are the means by which grammars are combined in order to support the extensibility of the mathematical notation.
The use of CFGs as a supporting organization to capture the meaning of mathematical concepts, as previously proposed in this work, is restricted to document structures which can only be modified by editing mechanisms. This limitation was discussed in Section 5.3 where the correspondence between CFG and type was presented.
The need to either overload a given symbol by attaching a different meaning to it, or to introduce a new syntactical representation for a mathematical concept, may be viewed as a modification to be executed on grammars which have already been defined. Another approach to this need is to generate grammars to support the mentioned requirements by reusing, whenever possible, the available grammars. The notion of grammar reuse as defined by the two operations proposed here is considered one of the fundamental mechanisms¹² which this thesis introduces to approach the semantics capturing necessity. For this reason neither operation modifies grammars which have already been created. Instead they support the semantics capturing activity by
12 Another important mechanism is the notion of context switching or scope. This notion is introduced in this chapter to support symbol overloading.
allowing grammars to be created by reusing information provided by other grammars. The following definitions introduce these operations:
Definition 10 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_o = (N_o, T_o, P_o, S_o)$ be two CFGs in ENF. The composition operation $G_b \circ G_o$ will produce a CFG $G_c = (N_c, T_c, P_c, S_c)$ as follows:
$$P_c = P_b \cup P_o$$
$$N_c = N_b \cup N_o$$
$$T_c = T_b \cup T_o$$
$$S_c = S_b$$
Definition 11 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_o = (N_o, T_o, P_o, S_o)$ be two CFGs in ENF. The extension operation $G_b \mathbin{\%} G_o$ will produce a CFG $G_x = (N_x, T_x, P_x, S_x)$ as follows:
$$P_x = \{\, A \to \alpha \mid A \to \alpha \in P_b \wedge A \notin N_o \,\} \cup P_o$$
$$N_x = N_b \cup N_o$$
$$T_x = T_b \cup T_o$$
$$S_x = S_b$$
While the composition operation is left-associative and commutative, the extension operation is left-associative but not commutative. To illustrate the use of both the composition and extension operators, consider the need to capture the meaning of expressions consisting of the addition of numbers. For this purpose assume that grammar fragments $G_2$, $G_4$, $G_6$ and $G_8$ are available. This means these fragments have already been created by editing procedures and have been stored in some computer-based device.
$G_2$    expr : expr PLUS term
Table 6.5: Basic grammar for addition.
Table 6.5 displays the basic grammar $G_2$ which captures the semantics of expressions consisting of two operands connected by the infixed PLUS operator. Table 6.6 and
$G_4$    expr : term
Table 6.6: Operatorless grammar linking expr and term nonterminals.
$G_6$    term : num
Table 6.7: Operatorless grammar linking term and num nonterminals.
Table 6.7 contain the definitions for operatorless grammars $G_4$ and $G_6$ respectively. Table 6.8 displays the primitive grammar $G_8$. The grammar to capture the semantics of addition expressions involving numbers may be obtained by the combination of these four grammar fragments as provided by the expression
$$\{\{G_2 \circ G_4\}G_{24} \circ G_6 \circ G_8\}G_{r1}.$$
The notation $\{G_2 \circ G_4\}G_{24}$ is used to express the fact that the result of the composition operation $G_2 \circ G_4$ has the name $G_{24}$. In a similar way, $G_{r1}$ is the name assigned to the result of the composition $G_{24} \circ G_6 \circ G_8$. The derived grammars $G_{24}$ and $G_{r1}$ are displayed in Table 6.9 and Table 6.10 respectively.
Consider now capturing the meaning of expressions involving both the addition and the multiplication of numbers. One way to approach this problem is to make use of the grammars which have already been defined. Additional grammars necessary to capture the concepts not covered by these grammars may be obtained, for example, by editing.
Assume grammar $G_{r1}$ is already available. Also assume grammars $G_1$, $G_3$ and $G_5$ have been created by editing. Table 6.11 displays the basic grammar $G_1$. The operatorless grammars $G_3$ and $G_5$ are displayed in Table 6.12 and Table 6.13 respectively. The grammar to capture the semantics of expressions involving the multiplication and addition of numbers is therefore obtained by means of the expression
$$\{G_{r1} \mathbin{\%} \{G_1 \circ G_3\}G_{13} \circ G_5\}G_{r2}.$$
The notation $G_{r1} \mathbin{\%} \{G_1 \circ G_3\}G_{13}$ is used to indicate that $G_{r1}$ is extended by $G_{13}$, which is the result of the composition operation $G_1 \circ G_3$. The name $G_{r2}$ is therefore assigned
$G_8$    num : NUMBER
Table 6.8: Primitive grammar setting nonterminal num to terminal NUMBER.
$G_{24}$   expr : expr PLUS term
         expr : term
Table 6.9: Derived grammar for addition.
to the result of the operation $\{G_{r1} \mathbin{\%} G_{13} \circ G_5\}$. The derived grammars $G_{13}$ and $G_{r2}$ are displayed in Table 6.14 and Table 6.15 respectively. A simple grammar fragment to deal with the usage of both extension and composition operations is presented in Table 6.16. According to this grammar the result of the binary operation(s) may either be saved as a new grammar or not. This is a consequence of the fact that the nonterminal new_class may be replaced by the terminal IDENTIFIER or by the empty string $\varepsilon$. Therefore whenever new_class is replaced by the empty string, the result of the binary operation(s) will not be remembered. Although there is no means of reusing the result produced, the procedure does generate a grammar. In this thesis, this grammar is called an implied grammar.
The notion of implied grammar introduces the possibility of defining domains without adding grammars to the domain directory. These types of domains exist only during run-time and are called implied domains.
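The composition and extension operations of Definitions 10 and 11 can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `Grammar` class and the helpers `fragment`, `compose` and `extend` are hypothetical names, productions are stored as `(lhs, rhs)` pairs, and lowercase symbols are assumed to be nonterminals while uppercase symbols are terminals. The sketch rebuilds the derived grammars for addition and for addition with multiplication from the single-production fragments described in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset
    terminals: frozenset
    productions: frozenset  # set of (lhs, rhs_tuple) pairs
    start: str


def fragment(lhs, *rhs):
    """Build a single-production grammar fragment."""
    symbols = (lhs, *rhs)
    return Grammar(
        frozenset(s for s in symbols if s.islower()),
        frozenset(s for s in symbols if s.isupper()),
        frozenset({(lhs, tuple(rhs))}),
        lhs,
    )


def compose(gb, go):
    """Composition (Definition 10): union of all components, start symbol of gb."""
    return Grammar(gb.nonterminals | go.nonterminals,
                   gb.terminals | go.terminals,
                   gb.productions | go.productions,
                   gb.start)


def extend(gb, go):
    """Extension (Definition 11): drop rules of gb whose left side is in N_o."""
    kept = frozenset(p for p in gb.productions if p[0] not in go.nonterminals)
    return Grammar(gb.nonterminals | go.nonterminals,
                   gb.terminals | go.terminals,
                   kept | go.productions,
                   gb.start)


g2 = fragment("expr", "expr", "PLUS", "term")
g4 = fragment("expr", "term")
g6 = fragment("term", "num")
g8 = fragment("num", "NUMBER")
g1 = fragment("term", "term", "TIMES", "factor")
g3 = fragment("term", "factor")
g5 = fragment("factor", "num")

g24 = compose(g2, g4)
gr1 = compose(compose(g24, g6), g8)      # addition only
g13 = compose(g1, g3)
gr2 = compose(extend(gr1, g13), g5)      # addition and multiplication

for lhs, rhs in sorted(gr2.productions):
    print(lhs, ":", " ".join(rhs))
```

Note that the extension step removes the rule `term : num` contributed by the addition grammar, because `term` is a nonterminal of the extending grammar; the inputs themselves are never modified, since every operation returns a new `Grammar` value.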
6.5 Structuring with Domains and Directories
Section 6.4 presented a structured process to capture the meaning of mathematical concepts. The approach introduced the notion of atomic grammar fragments and the notion of creating and updating CFGs by means of two binary operations. Instead of concentrating the needed knowledge in a monolithic grammatical organization, this process distributes the required information among a set of grammar fragments. These fragments are therefore viewed as decentralized structures which decompose mathematical concepts according to CFGs which are either basic, primitive or operatorless. The distributed fragments may be combined by the binary operations as a
$G_{r1}$   expr : expr PLUS term
         expr : term
         term : num
         num  : NUMBER
Table 6.10: Resulting grammar for expressions involving addition.
$G_1$    term : term TIMES factor
Table 6.11: Basic grammar for multiplication.
way of generating other grammar fragments. The set of grammar fragments is, in this way, updated in an incremental style by module reuse. As described so far, the solution supports extensibility from a restricted point of view since it does not consider the multi-domain aspect of mathematics. Instead it assumes that all concepts to be represented belong to a single domain.
The proposed approach allows the possibility of considering grammar fragments as both open and closed concepts. The fact that they may be used to represent unique information which may be stored as components of a library and used by clients of the library characterizes them as closed concepts. On the other hand the same fragments may contain information which may be used for the creation of a new grammar fragment by means of the two binary operations. For this reason they may also be considered as open concepts. This interpretation is consistent with the notion of object-oriented class as provided by [72]. According to this interpretation CFGs correspond to classes. Therefore for a given CFG, say, for example, $G = (N, T, P, S)$, the words in $L(G)$ will informally¹³ correspond to instances of the class associated with $G$. Mathematical expressions will consequently be the objects. Operatorless grammars and CFGs which have only operatorless productions are an exception to this because they have no means to express any concrete objects, and therefore cannot generate mathematical expressions.
13 This association is loose because some fundamental characteristics of classes cannot be expressed as grammar operations. Consider, for instance, the notion of subclass. This concept does not always correspond to grammars which result from either the extension or the composition operations.
$G_3$    term : factor
Table 6.12: Operatorless grammar linking term and factor nonterminals.
$G_5$    factor : num
Table 6.13: Operatorless grammar linking factor and num nonterminals.
6.5.1 Domains, Directories and Symbol Overloading
The approach presented in the previous sections does not properly address document organizations containing symbol arrangements which have been used to express concepts which belong to more than a single mathematical field. In order to extend the proposed process, a relation between symbol overloading¹⁴ and domain/directory needs to be established. The solution proposed in this subsection approaches symbol overloading by means of a real-time update¹⁵ process. This process is the mechanism by which the structure of a document adapts in order to cope with representation ambiguities introduced as the result of overloading.
For any given directory the solution determines that the overloading is resolved by means of a dynamic directory change. This implies that the meaning of symbol arrangements is a function of the directory in which they are defined. The dynamic characteristic is required to support the possibility of user-defined syntax being introduced during authoring. A directory therefore defines a scope and the symbol overloading determines the need for a change of scope or context switch¹⁶. The approach supports this requirement by introducing the notion that any two semantically distinct mathematical concepts which have been assigned the same arrangement of symbols for their syntactical representation are considered here to have an overloaded
14 One relevant aspect of the dynamical authoring of mathematics is the fact that the overloading of symbols is at the author's discretion. This characteristic is nondeterministic; therefore it cannot be predicted.
15 In the context of this thesis, real-time update is used to refer to the document modifications done during the document authoring activity.
16 It is important to note that, in this scenario, both the number and contents of domains are under complete control of the author of the document. This indicates that domain is a dynamic concept. This point of view has been formally stated in Section 6.2.
$G_{13}$   term : term TIMES factor
         term : factor
Table 6.14: Derived grammar for multiplication.
$G_{r2}$   expr   : expr PLUS term
         expr   : term
         term   : term TIMES factor
         term   : factor
         factor : num
         num    : NUMBER
Table 6.15: Resulting grammar for expressions involving addition and multiplication.
representation. Therefore the representation of a mathematical concept is considered non-overloaded if there is a one-to-one relationship between the concept and the symbol arrangement used in its syntactical definition. This idea is formally stated by the following definitions:
Definition 12 Let $\Sigma$ be an alphabet and $C$ be a nonempty finite set of mathematical concepts. The representation of a mathematical concept is a mapping from $C$ to $\Sigma^+$.
Definition 13 The representation of a mathematical concept is overloaded if the mapping from $C$ to $\Sigma^+$ is not injective.
By structuring concept definitions into domains and directories it becomes necessary to establish the conditions under which existing grammars could be applied to the construction of a directory. To ensure that a directory is free of ambiguities the restriction proposed by the following definition needs to be observed.
Definition 14 A domain directory is overloaded if the representation of some mathematical concept in the directory is overloaded.
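Definitions 12 to 14 amount to an injectivity check, which can be sketched as below. The function name `is_overloaded` and the dictionary-based encoding of a directory (concept name mapped to its symbol arrangement) are assumptions made for this illustration only.

```python
def is_overloaded(representation):
    """Return True when the concept -> symbol arrangement mapping is not injective.

    `representation` maps each concept of a directory to the nonempty
    string of symbols used to write it.
    """
    arrangements = list(representation.values())
    return len(set(arrangements)) < len(arrangements)


# A single directory that writes three different concepts with the same symbol.
expression_directory = {
    "addition": "+",
    "concatenation": "+",
    "disjunction": "+",
}

# One directory per concept: each mapping is injective.
addition_directory = {"addition": "+"}
arithmetic_directory = {"addition": "+", "multiplication": "*"}

print(is_overloaded(expression_directory))  # True
print(is_overloaded(addition_directory))    # False
print(is_overloaded(arithmetic_directory))  # False
```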
The notion of non-overloaded directories is useful in this context because each overloaded concept representation which needs to be included in a document determines
stmt_list   → stmt_list ; stmt | stmt
stmt        → { class } new_class
class       → class operator other_class
class       → other_class
operator    → % | o
other_class → stmt | IDENTIFIER
new_class   → IDENTIFIER | ε
Table 6.16: Grammar to support the use of both the composition and extension operators.
the need for a separate directory. This means the concept representation forces the existence of an organization in which its meaning is uniquely defined. This approach has the advantage of considering the multi-directory characteristic as a supporting mechanism for the solution of the symbol overloading necessity.
Semantics modularity is achieved when the many-to-one mapping between concept and representation is restricted to non-overloaded domain directories. The resulting document created, once the complete authoring process is over, will have its contents naturally organized according to the meanings of the concepts involved.
6.6 Languages as Control Structures
The concept of non-overloaded representation establishes the necessity of a directory switch mechanism as a way to adapt to symbol overloading. This introduces additional complexity to the strategy chosen for processing the information provided by directories. For this reason the complexity of language processors designed to address this characteristic increases with the number of domains introduced.
Like grammar fragments, directories are both open and closed concepts. They are open because they are dynamic and allow modification¹⁷ to take place. As a closed concept they represent unique information. Either as a physical library component or as the result of operations performed on its underlying set of grammars, a directory exists
17 No dynamic modification is ever allowed to a domain/directory as the result of the composition and/or extension operations. However both domains and directories may be modified by editing.
as a single CFG. Since a directory will ultimately be represented by its CFG, there exists a language processor¹⁸ associated with its grammar.
Although a directory which comprises part of the logical structure of a document need not have any direct dependency on the others, they all share a common structure¹⁹ where all mathematical information of the document is presented. This requirement establishes that a form of synchronization is necessary to guarantee that the next piece of data to be processed will be dealt with by its associated processor.
The arrangement by which the mathematical concepts are organized throughout a document is a user-defined task which takes place during the authoring process. It is during this phase that the author specifies both the syntax and the true meaning of operations by binding concepts to syntax and collecting them into related domains and directories. The structure of the document, at any time during this process, will therefore reflect the way these directories are arranged. There are three possible ways directories may be composed. A document structure is the result of directories arranged in one of the following Directory Composition Forms:
• pure linear,
• pure hierarchical, or
• combined form, a combination of linear and hierarchical.
In a pure linear organization, directories are self-contained. This means there is only a single scope where objects are delimited. Directories organized in this way may be processed in a First-In First-Out (FIFO) fashion. In a pure hierarchical organization, directories are processed in a Last-In First-Out (LIFO) style. The semantics in these types of documents are structured in a nested way such that only the innermost directory has no dependency on the others. The most common structure is the combined one, which is characterized by a random pattern of FIFO and LIFO organizations. This case may be considered general as it contains the previous two. For this arrangement, the possible number of document structure patterns which can
18 This characteristic is supported by the fact that for every CFG there exists a Pushdown Automaton that recognizes the language [55, 107, 69].
19 Even though text-based forms of representation are expected to be used in most applications, the ideas presented here also apply to other input formats.
be obtained for a given number of directories is provided by the nonlinear recursive expression
$$P_n = \sum_{i=1}^{n} P_{i-1} P_{n-i}$$
for $n \geq 1$, with $P_0 = 1$,
where $n$ is the total number of distinct directories. This means that for a given document which requires, for example, 6 distinct directories, 132 document structure patterns can be obtained. Therefore a small number of language processors can be rearranged in many different ways as a form of supporting document updates²⁰. This indicates that, once the associated language processors have been generated, all document modifications which do not involve the addition of new directories during the authoring process will depend only on the synchronization procedure needed for the generation of the corresponding hierarchical intermediate representation²¹.
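The count quoted for six directories can be checked numerically. The sketch below assumes the recurrence is the Catalan-type convolution $P_n = \sum_{i=1}^{n} P_{i-1} P_{n-i}$ with $P_0 = 1$, which is the form that yields 132 for $n = 6$; the function name `patterns` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def patterns(n):
    """Number of document structure patterns for n distinct directories.

    Assumed recurrence: P_0 = 1 and P_n = sum_{i=1..n} P_{i-1} * P_{n-i}.
    """
    if n == 0:
        return 1
    return sum(patterns(i - 1) * patterns(n - i) for i in range(1, n + 1))


print([patterns(n) for n in range(7)])  # [1, 1, 2, 5, 14, 42, 132]
```

The rapid growth of these values illustrates the remark that a small number of language processors can be rearranged in many different ways.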
The notion of directories as a structure to support the specification of the syntax and portions of the semantics of mathematical concepts has been introduced in the previous sections. It is intuitive that directories must only be involved during the authoring process if there exists at least one concept that needs to be represented. On the other hand, no mathematical construct may be manipulated during the authoring process without the clear indication of where its related structure has been defined. These concerns are summarized by the following definition:
Definition 15 Let $M$ be a finite set of mathematical concepts. The semantic structure of a document $D_j$ involving mathematical concepts $M$ is considered irredundant if for each directory $G_j$ in $D_j$ there exists a mathematical concept $m \in M$ such that $m$ is represented in the scope of $G_j$.
Two characteristics which relate to the way directories take part in the organization of mathematical information in documents have been presented. In an informal way they state that the semantics of a document is defined by means of a set of directories and each of these directories must contain at least one object. These
20 It is understood that these updates do not require the addition of new directories.
21 This characteristic is of course subject to storage requirements. The choice of either keeping the language processors in main or secondary storage is an implementation decision.
characteristics constitute the fundamental requirements that need to be considered for the elaboration of a mechanism to determine the way directories are manipulated during the authoring and processing phases. This mechanism must be flexible enough to provide the author with the freedom to configure versions of the document by standard document operations such as insertion and deletion. Each document version is therefore the result of a set of operations which may have changed the document's internal structure, modified the contents of the document, or both. Modifications affecting only the contents of documents, by either including or removing objects which belong to the set of directories currently defined for the structure of the document, have no further importance besides an increase or reduction in the amount of information that is to be supplied to a renderer. On the other hand, any modification which affects the document's logical structure would need to be executed under a stable form of control.
6.6.1 Directory Composition Example
Small fragments of documents containing simple expressions that overload the + symbol are provided to illustrate the notion of directory composition. The syntactical structure as defined by the production rule
    directory_scope → ( directory_definition ) block_objects ( / )
is used to delimit the scope of a directory, where the strings of letters are nonterminals and the symbols ( ) and / are terminals. Nonterminals directory_definition and block_objects are grammar variables that have been used to represent a directory and the mathematical constructs included in the block respectively.
D1.0
( Expression )
    1 + 1 + 0 = 2        1 + 1 + 1 = 3
    1 + 1 + 0 = 110
    1 + 1 + 0 = true
(/)
The document fragment D1.0, as illustrated above, is characterized by a monolithic organization where the definitions of all mathematical objects included in the document are placed in the Expression directory. The fact that the + symbol has been used in the above example to represent different mathematical concepts characterizes this one-directory document fragment as overloaded. A voice renderer system, for instance, will not be capable of providing the appropriate meaning that has been attached to the + symbol in each of the four expressions. This is because the representation used assumes that
• only visual-based views are necessary, and
• the reader has the required knowledge to decode the different meanings assigned to the + symbol.
The above problem is approached here by dividing the single directory into three separate ones in order to ensure that no directory is overloaded. A directory-based organization is consequently obtained. The resulting document organization, as shown below, has therefore been structured according to the addition, concatenation and disjunction operations that have been attached to the + symbol.
D1.1
( Addition )
    1 + 1 + 0 = 2        1 + 1 + 1 = 3
(/)
( Concatenation )
    1 + 1 + 0 = 110
(/)
( Disjunction )
    1 + 1 + 0 = true
(/)
The document organization D1.1 differs from organization D1.0 by the fact that the implicit knowledge needed to distinguish syntactically identical operations with different meanings, as provided by D1.0, has been replaced by the three distinct directories. This means the task associated with decoding to resolve ambiguities that
was previously left to the user has now been assigned to the author of the document. Therefore, besides outlining the mathematical concepts used in the document, the author is also responsible for specifying the directory in which the syntax and semantics of these concepts are defined.
The document fragment D1.2, as shown below, is an example of a document structure where combined directory composition is used. Although the mathematical objects involved are the same as the ones in the previous two versions, this organization differs from the other two by the way the directories have been arranged.
D1.2
( Addition )
    1 + 1 + 0 = 2
    ( Concatenation )
        1 + 1 + 0 = 110
    (/)
    ( Disjunction )
        1 + 1 + 0 = true
    (/)
    1 + 1 + 1 = 3
(/)
( Disjunction )
    1 + 1 + 0 = true
(/)
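The way nested scopes decide the meaning of an expression in the combined form can be sketched with a stack. This is a simplified illustration, not the binding control grammar itself: the function `bind_expressions` is a hypothetical name, the document is assumed to be given one delimiter or expression per line, and each expression is bound to the innermost directory that is open when it is read (a LIFO discipline). The input is the fragment D1.2.

```python
def bind_expressions(lines):
    """Bind every expression to the innermost open directory."""
    stack = []   # currently open directories, innermost last
    bound = []   # (directory, expression) pairs in document order
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "(/)":
            if not stack:
                raise ValueError("scope closed without being opened")
            stack.pop()
        elif line.startswith("(") and line.endswith(")"):
            stack.append(line[1:-1].strip())
        elif stack:
            bound.append((stack[-1], line))
        else:
            raise ValueError(f"expression outside any directory: {line!r}")
    if stack:
        raise ValueError(f"unclosed directories: {stack}")
    return bound


D1_2 = [
    "( Addition )",
    "1 + 1 + 0 = 2",
    "( Concatenation )",
    "1 + 1 + 0 = 110",
    "(/)",
    "( Disjunction )",
    "1 + 1 + 0 = true",
    "(/)",
    "1 + 1 + 1 = 3",
    "(/)",
    "( Disjunction )",
    "1 + 1 + 0 = true",
    "(/)",
]

for directory, expression in bind_expressions(D1_2):
    print(f"{directory:14s} {expression}")
```

After the nested Concatenation and Disjunction scopes are closed, the expression `1 + 1 + 1 = 3` is again read in the Addition scope, which is the context switch the directory delimiters are meant to make explicit.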
6.6.2 The Control Mechanism
1  directory_scope      → < directory_definition > block_objects < / >
2  directory_definition → IDENTIFIER | stmt_list
3  block_objects        → various_exprs scope_change
4  various_exprs        → various_exprs ; new_expr | new_expr
5  scope_change         → directory_scope various_exprs | directory_scope | ε
Table 6.17: CFG for the binding control mechanism.
The document organization provided by the three examples from the previous subsection illustrates that a form of control is necessary in order to ensure the correctness of the directory composition forms. This requirement has been introduced in Section 6.2 as $c$, the binding control. As part of the definition of a document instance structure $S_j = (D_j, c)$, the binding control is a CFG. A possible definition of $c$ to support the directory composition forms is provided in Table 6.17. The nonterminal stmt_list is defined in the grammar fragment described in Table 6.16 and the nonterminal new_expr is only to be defined whenever directories are created. This means any CFG which defines a directory will have new_expr as a start symbol.
6.7 The Role of Compilers
The organization imposed by the dynamic authoring model allows the author of a document the possibility to modify both the syntax and semantics of the notation. Therefore modifications proposed at the abstract level, by the author, must always be supported by the document processing environment. This requires that if grammar definitions need to be modified, the corresponding language processor will need to be created to process the new version of the language²². Therefore different language processors might need to be produced during the authoring process.
To approach the semantics capturing of mathematical concepts as proposed in Section 5.3, the organization of mathematics is viewed as a set of fields. According to this strategy all concepts that belong to a field can be captured by a directory and therefore require the support of an adequate language processor.
In a general scenario, documents often involve information that belongs to more than a single domain. For this reason the notion of directory as a collection of domains was introduced. A processing structure to support this arrangement would demand as many language processors as the number of directories necessary to cover the state of knowledge addressed by the document. Therefore the number of language processors to support the dynamic authoring model will always be greater than two²³ if the
22 It is assumed that one directory may be composed of a set of grammar fragments.
23 The proposed document structure supports the directory swap strategy by means of a CFG, the binding control. For this reason at least two language processors are required. Consequently at least one additional processor will be needed to process the objects included in the document. This organization forces the number of processors to be one greater than the number of directories in the document.
document to be processed involves information defined in more than one domain. Different language processors will take over the processing activity at selected parts of the document. Each processor is viewed as an agent that has knowledge to validate the syntax of its mathematical objects and perform other tasks as determined by the semantics of the objects.
Letting the number of directories in a document be a parameter under the control of the author indicates that an equal number of language processors will need to be provided in order to support each required directory. To meet this requirement, this thesis proposes that language processors be dynamically created by the software used during the authoring activity.
The automatic creation of language processors based on the knowledge provided by CFGs requires information about the position of both terminals and nonterminals. Although the grammar structure imposed by the ENF determines that at most one terminal is permitted in production rules, the number of nonterminals is left unrestricted. One exception to this is the operatorless production, which is always composed of one nonterminal.
Representing the meaning of mathematical concepts by means of CFGs requires that all information which is part of the concept be mapped to the set of production rules. This includes the set of symbols used for the representation of the name of the concept, its attributes and delimiters.
Having the name of the concept as a terminal and both its attributes and delimiters represented as nonterminals introduces the need for an additional mechanism in order to distinguish attributes from delimiters. For this purpose, a set of attributes is added to the grammar rules.
As an extension to the gram m ar structu re already proposed for capturing semantics, these a ttribu tes will also be applied to the definition of the term inal symbols. The
attachm ent of a ttribu tes to the rules of CFGs was proposed by K nuth [63, 78]. The
resulting gram m ar is called an a ttrib u ted gram m ar.
The use of a ttribu ted gram m ars to support the semantics capturing of m athem atical
concepts does not require any modification to the approach already presented. Both the com position and extension operators can also be applied to a ttribu ted gram m ars.
The following definition presents this characteristic:
Definition 16 Let $G_1 = (N_1, T_1, P_1, S_1, A, \alpha_1)$ and $G_2 = (N_2, T_2, P_2, S_2, A, \alpha_2)$ be two attributed context-free grammars. The extension operation $G_1 \mathbin{\%} G_2$ will produce an attributed context-free grammar $G_3 = (N_3, T_3, P_3, S_3, A, \alpha_3)$ defined as follows:
• For the underlying context-free grammars one has $\bar{G}_3 = \bar{G}_1 \mathbin{\%} \bar{G}_2$, where $\bar{G}_i$ denotes $G_i$ without attributes.
• The mapping $\alpha_3$ is given by
\[
\alpha_3(A \to w) =
\begin{cases}
\alpha_1(A \to w), & \text{if } A \to w \in P_1, \\
\alpha_2(A \to w), & \text{if } A \to w \in P_2 \text{ or } A \to w \in P_1 \cap P_2
\end{cases}
\tag{6.10}
\]
for every rule $A \to w \in P_3$.
Definition 17 Let $G_1 = (N_1, T_1, P_1, S_1, A, \alpha_1)$ and $G_2 = (N_2, T_2, P_2, S_2, A, \alpha_2)$ be two attributed context-free grammars. The composition operation $G_1 \circ G_2$ will produce an attributed context-free grammar $G_3 = (N_3, T_3, P_3, S_3, A, \alpha_3)$ defined as follows:
• For the underlying context-free grammars one has $\bar{G}_3 = \bar{G}_1 \circ \bar{G}_2$.
• The mapping $\alpha_3$ is given by
\[
\alpha_3(A \to w) =
\begin{cases}
\alpha_1(A \to w), & \text{if } A \to w \in P_1, \\
\alpha_2(A \to w), & \text{if } A \to w \in P_2 \text{ or } A \to w \in P_1 \cap P_2
\end{cases}
\]
for every rule $A \to w \in P_3$.
The following section proposes the structure of the grammars which will be used to support the definition of the domains. This meta-grammar is therefore the template which will be applied during the creation of every grammar fragment required to capture the meaning of mathematical concepts.
6.8 Meta-Structure
In this section attributed CFGs are used as an aid to specifying the semantics capturing of mathematical concepts. Synthesized attributes [6] are attached to production rules which are either primitive or basic. These attributes supply additional semantics
of the concepts involved that have been omitted due to limitations of the CFG part of the structure. The attributes are represented as the grammar variables regular_expr and cardinality, as defined in Table 6.18. The cardinality variable holds the positions of the arguments that are associated with the terminal symbol of the rule. Nonterminal cardinality is ε for operatorless and primitive productions because neither has arguments. For basic productions cardinality is defined in terms of the args_position nonterminal. In this case the position of each argument is identified by a positive integer. The nonterminal regular_expr is used to represent regular expressions that describe the symbol arrangement applied to the composition of terminals. Regular expressions used by this grammar follow both the syntax and the semantics defined by lex [66].
meta          → cfg attributes
cfg           → NONTERMINAL : items
items         → items item
items         → item
item          → TERMINAL | NONTERMINAL
attributes    → # regular_expr # cardinality | ε
cardinality   → ( args_position ) | ε
args_position → args_position , position | position
position      → INTEGER
Table 6.18: Production rules for the meta-grammar.
The meta-grammar part of PNS is defined by the set of production rules shown in Table 6.18. The proposed grammar defines nonterminal meta as the start symbol, which structures the problem according to the two nonterminals on the right side of the rule. The part starting with the nonterminal cfg defines the structure of the rules in the CFG part of the structure. Nonterminal attributes gives the structure of the synthesized attributes.
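As a rough illustration of how a single rule written according to Table 6.18 might be taken apart, the following Python sketch splits a rule line into its left-hand side, its items and the two synthesized attributes. The concrete line layout (`lhs : items # regular_expr # (positions)`), the `Rule` record and the function name `parse_rule` are assumptions made for this example only; the thesis does not prescribe an implementation, and regular expressions that themselves contain the `#` character are not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rule:
    lhs: str                    # NONTERMINAL on the left of ':'
    items: Tuple[str, ...]      # right-hand side symbols, in order
    regex: Optional[str]        # regular_expr attribute, None when absent
    positions: Tuple[int, ...]  # cardinality (argument positions), () when empty


# Assumed layout:  lhs : items [# regular_expr # [( p1 , p2 , ... )]]
RULE = re.compile(
    r"^\s*(?P<lhs>\w+)\s*:\s*(?P<items>[^#]+?)\s*"
    r"(?:#\s*(?P<regex>.+?)\s*#\s*(?:\((?P<pos>[\d\s,]+)\))?\s*)?$"
)


def parse_rule(line: str) -> Rule:
    """Split one rule line into the parts named by the meta-grammar."""
    match = RULE.match(line)
    if match is None:
        raise ValueError(f"not a meta-grammar rule: {line!r}")
    pos = match.group("pos")
    positions = tuple(int(p) for p in pos.split(",")) if pos else ()
    return Rule(
        lhs=match.group("lhs"),
        items=tuple(match.group("items").split()),
        regex=match.group("regex"),
        positions=positions,
    )


basic = parse_rule('ee : ee EQ te # "=" # (1,3)')
primitive = parse_rule("it : INTEGER # [1-9][0-9]* #")
operatorless = parse_rule("ee : te")
```

Under these assumptions a basic production carries both attributes, a primitive production carries only the regular expression, and an operatorless production carries none, which mirrors the three cases allowed by the attributes and cardinality rules of the meta-grammar.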
Table 6.19 illustrates the organization proposed by the meta-structure. This example shows the ENF version of the grammar displayed in Table 5.16 with the corresponding set of attributes attached to each production rule. Productions 2, 3 and 8 are operatorless productions, therefore they have no attributes. Productions 9 to 12 are primitive productions. For this reason they have only the regular expression attributes.
 1  identity_expr → sample_expr EQ sample_expr          # "=" # (1,3)
 2  sample_expr   → expr
 3  sample_expr   → sum
 4  sum           → SUM left_del sum_elmts right_del    # "Sum" # (3)
 5  sum_elmts     → range_list SUMDEL sample_expr       # ";" # (1,3)
 6  range_list    → start LISTDEL end                   # "," # (1,3)
 7  start         → identifier ITERATION expr           # ' # (1,3)
 8  end           → expr
 9  identifier    → IDENTIFIER                          # [a-z]+ #
10  leftDel       → LEFTDEL                             # "{" #
11  rightDel      → RIGHTDEL                            # "}" #
12  expr          → INTEGER                             # [1-9][0-9]* #
Table 6.19: Attributed grammar to support the capturing of simple summations.
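The three kinds of production mentioned above can be told apart mechanically from the right-hand side of a rule. The sketch below is only an illustration: it assumes that terminals are the symbols written in upper case (as in Table 6.19), that a primitive production consists of a single terminal, and that a basic production has one terminal together with at least one nonterminal. The names `is_terminal`, `classify` and `TABLE_6_19` are hypothetical.

```python
def is_terminal(symbol):
    # Assumption: terminals are written in upper case (EQ, SUM, INTEGER).
    return symbol.isupper()


def classify(rhs):
    """Return the kind of an ENF production given its right-hand side."""
    terminals = [s for s in rhs if is_terminal(s)]
    if len(terminals) > 1:
        raise ValueError("ENF allows at most one terminal per production")
    if not terminals:
        if len(rhs) != 1:
            raise ValueError("an operatorless production has exactly one nonterminal")
        return "operatorless"
    return "primitive" if len(rhs) == 1 else "basic"


# Right-hand sides of the twelve productions of Table 6.19.
TABLE_6_19 = {
    1: ["sample_expr", "EQ", "sample_expr"],
    2: ["expr"],
    3: ["sum"],
    4: ["SUM", "left_del", "sum_elmts", "right_del"],
    5: ["range_list", "SUMDEL", "sample_expr"],
    6: ["start", "LISTDEL", "end"],
    7: ["identifier", "ITERATION", "expr"],
    8: ["expr"],
    9: ["IDENTIFIER"],
    10: ["LEFTDEL"],
    11: ["RIGHTDEL"],
    12: ["INTEGER"],
}

kinds = {number: classify(rhs) for number, rhs in TABLE_6_19.items()}
```

With these assumptions the classification agrees with the text: productions 2, 3 and 8 come out as operatorless and productions 9 to 12 as primitive, leaving the remaining ones as basic.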
6.9 Conclusion
In this chapter I have presented a grammar-based document organization to capture the meaning of mathematical concepts. The approach models the dynamics of authoring mathematics and supports the introduction of user-defined syntax to represent mathematical concepts. This means that the semantics of mathematical concepts included in the document can be bound to syntax proposed during authoring. These ideas are expressed in terms of the Document Description Model described as follows.
A Document Description Model (DDM) is a structure composed of
1. a document dictionary $H_j$ such that all grammars in this set are in ENF, and
2. the following operations:
(a) Grammar Creation: introduced in Section 6.4 by the composition operator ∘.
(b) Grammar Update: introduced in Section 6.4 by the extension operator %. Grammars resulting from this operation as well as from the grammar creation operation are elements of set $C_i^j$ for some version of the document structure j and document directory i.
3. Grammar Identity: provided by the union operations used for the creation of the domain directory $G_i^j$.
4. Closure: all grammars introduced by the creation and the update operations are in $H_j$.
Chapter 7
Examples
Among the various forms of representation available, the conventional notation is the one which has been used by the majority of the activities which involve the communication of mathematics. A major limitation on rendering mathematical concepts according to this notation is the syntactic overloading of the symbols used for the encoding of the operators. This problem has been discussed in Section 5.2, and Figure 5.1 displays three common meanings that are usually attached to symbol v.
It is assumed, in this thesis, that people, most of the time, get exposed to mathematics by means of the encoding provided by the conventional notation. For this reason this notation has been used in this work as the basic source of information for the semantics capturing process. Although sometimes the encoding provided by the conventional notation is not ideal, it is important to maintain the syntactical arrangement this encoding provides. This decision is fundamental to the capturing strategy because the choice of a notation which is widely used should free the author from the requirement of learning the alternative syntax supported by the capturing system.
In this thesis a document structure composed of attributed grammar fragments is proposed to capture the meaning of the mathematical concepts. Context-dependent representations are supported by a directory change mechanism where a set of grammars is replaced by another to allow other interpretations to be associated with the symbols considered. The following sections illustrate the structure proposed by describing the process involved during the authoring of simple documents which only contain mathematical concepts represented by strings of characters.
7.1 Example 1: Overloading the + and * symbols
g1  ee       → ee EQ te      # "=" # (1,3)
g2  ee       → te
g3  it       → INTEGER       # [1-9][0-9]* #
g4  ep       → ep PLUS tp    # "+" # (1,3)
g5  ep       → tp
g6  st       → STRING        # [0-9]+ #
g7  ec       → ec CAT tc     # "+" # (1,3)
g8  ec       → tc
l0  new_expr → ee
Table 7.1: Default grammars.
l1  te → ep
l2  tp → it
l3  te → ec
l4  tc → st
Table 7.2: Grammar fragments created by editing.
Consider the need to overload the + symbol in order to represent both the addition of integers and the concatenation of strings of characters. The following document illustrates this by means of two identity expressions which use the same syntax for the left side of the equality. This document is called Prototype because it is the author's first attempt towards its creation.
Prototype
< d1 >
1 + 1 + 0 = 2
< d2 >
1 + 1 + 0 = 110
< / > < / >
The above prototype version of the document is composed of two domains represented by d1 and d2. As illustrated by the syntax of the document, domain d1 should contain the necessary rules to recognize the left side of the first equality as the addition of three numbers. In a similar way, domain d2 should contain rules to recognize the operations on the left side of the second equality as the concatenation of characters.
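The effect of placing the same syntax in two domains can be mimicked in a few lines of Python. This is only a toy sketch of the idea, not the grammar-based mechanism developed in the thesis: the dictionary `DOMAINS` and the functions `add_integers`, `concatenate` and `evaluate` are hypothetical names, and the domain is selected by an explicit argument instead of by a set of grammar fragments.

```python
def add_integers(operands):
    # Meaning of "+" expected in domain d1: addition of integers.
    return sum(int(operand) for operand in operands)


def concatenate(operands):
    # Meaning of "+" expected in domain d2: concatenation of strings.
    return "".join(operands)


# Hypothetical table binding each domain to its reading of the "+" symbol.
DOMAINS = {"d1": add_integers, "d2": concatenate}


def evaluate(domain, expression):
    """Interpret the left side of an identity according to the active domain."""
    operands = [part.strip() for part in expression.split("+")]
    return DOMAINS[domain](operands)


in_d1 = evaluate("d1", "1 + 1 + 0")   # integer 2
in_d2 = evaluate("d2", "1 + 1 + 0")   # string "110"
```

The two results correspond to the right sides of the two equalities in the Prototype document.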
In order to provide the complete structure to support this document, assume the author initially has access only to the set of default grammars as defined in Table 7.1. Grammar fragments g1, g4 and g7 support expressions involving equality, addition and concatenation operations respectively. Grammars g3 and g6 define the domains over which the specification of addition and concatenation operations can respectively apply. Grammar fragments g2, g5 and g8 support the definitions of the equality, the addition and the concatenation operations respectively. Grammar l0 links grammar g1 to the control mechanism. Grammar fragments l1, l2, l3 and l4 have been created by editing in order to provide the necessary links with the other fragments. The result of the author's first attempt to produce a structure to capture the meaning of the two mentioned mathematical concepts is provided by Ex1-Version 1. This code is presented as follows and it illustrates the two domains as well as the expressions involving the overloaded operator.
Ex1-Version 1
< {l0 ∘ {g1 ∘ g2}t2}t1;
{t1 ∘ l1 ∘ {g4 ∘ g5}t3}t0;
{t0 ∘ l2 ∘ g3}d1 >
1 + 1 + 0 = 2
< {{t1 ∘ l3 ∘ {g7 ∘ g8}t4}t5 ∘ l4 ∘ g6}d2 >
1 + 1 + 0 = 110
< / >< / >
As stated before, the main objective of this initial version of the document is to represent both the addition and concatenation operations by the + symbol. For this purpose the author organizes the information to be presented into two separate domains as a way of resolving the semantic nondeterminism generated by overloading the + symbol. The grammar fragments used for the definition of domain d1 have been obtained from domain directory $G_1^0$ and the fragments used for the definition of domain d2 were taken from domain directory $G_2^0$. The complete definitions to support this version of the document are described next according to the document structure proposed in Chapter 6.
{g1 ∘ g2}t2
    ee       → ee EQ te      # "=" # (1,3)
    ee       → te
{l0 ∘ t2}t1
    new_expr → ee
    ee       → ee EQ te      # "=" # (1,3)
    ee       → te
{g4 ∘ g5}t3
    ep       → ep PLUS tp    # "+" # (1,3)
    ep       → tp
{t1 ∘ l1 ∘ t3}t0
    new_expr → ee
    ee       → ee EQ te      # "=" # (1,3)
    ee       → te
    te       → ep
    ep       → ep PLUS tp    # "+" # (1,3)
    ep       → tp
{t0 ∘ l2 ∘ g3}d1
    new_expr → ee
    ee       → ee EQ te      # "=" # (1,3)
    ee       → te
    te       → ep
    ep       → ep PLUS tp    # "+" # (1,3)
    ep       → tp
    tp       → it
    it       → INTEGER       # [1-9][0-9]* #
Table 7.3: Grammars in domain directory $G_1^0$ that have been created by grammar operations.
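The way the composed grammars of Table 7.3 grow out of the fragments of Tables 7.1 and 7.2 can be imitated with ordinary dictionaries. The sketch below rests on two assumptions that go beyond what is stated in this chapter: that composition amounts to collecting the production rules of its operands (which is how the tables read), and that a rule present in both operands keeps the attributes of the right operand, following the mapping of Definition 17. The representation of a fragment as a dictionary from productions to attributes and the function name `compose` are hypothetical.

```python
def compose(*fragments):
    """Collect the rules of the fragments, left to right.

    Each fragment maps a production (lhs, rhs) to its attributes
    (regular expression, argument positions). When a production occurs
    in several fragments, the attributes of the rightmost one are kept.
    """
    result = {}
    for fragment in fragments:
        result.update(fragment)
    return result


# Fragments of Tables 7.1 and 7.2 needed for t1 (attributes: regex, positions).
g1 = {("ee", ("ee", "EQ", "te")): ('"="', (1, 3))}
g2 = {("ee", ("te",)): (None, ())}
l0 = {("new_expr", ("ee",)): (None, ())}

t2 = compose(g1, g2)   # {g1 ∘ g2}t2
t1 = compose(l0, t2)   # {l0 ∘ t2}t1

rules_of_t1 = list(t1)
```

Under these assumptions `t1` holds the three productions listed for {l0 ∘ t2}t1 in Table 7.3, in the same order, each with the attributes it had in its original fragment.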
{g7 ∘ g8}t4
    ec       → ec CAT tc     # "+" # (1,3)
    ec       → tc
{t1 ∘ l3 ∘ t4}t5
    new_expr → ee
    ee       → ee EQ te      # "=" # (1,3)
    ee       → te
    te       → ec
    ec       → ec CAT tc     # "+" # (1,3)
    ec       → tc
{t5 ∘ l4 ∘ g6}d2
    new_expr → ee
    ee       → ee EQ te      # "=" # (1,3)
    ee       → te
    te       → ec
    ec       → ec CAT tc     # "+" # (1,3)
    ec       → tc
    tc       → st
    st       → STRING        # [0-9]+ #
Table 7.4: Grammars in domain directory $G_2^0$ that have been created by grammar operations.
The current version of the document is supported by the document instance structure $S_0 = (D_0, c)$. The organization of the semantic structure $D_0$ is defined in terms of its two domain directories $G_1^0$ and $G_2^0$ for this initial version of the document as follows:
\[ D_0 = (G_1^0, G_2^0) \tag{7.1} \]
where $G_1^0$ is defined as
\[ G_1^0 = N_1^0 \cup F_1^0 \cup C_1^0 = \{g_1, g_2, g_3, g_4, g_5, l_0, l_1, l_2, t_0, t_1, t_2, t_3, d_1\} \tag{7.2} \]
with
\[ N_1^0 = \{l_1, l_2\}; \quad F_1^0 = \{g_1, g_2, g_3, g_4, g_5, l_0\}; \quad C_1^0 = \{t_0, t_1, t_2, t_3, d_1\} \tag{7.3} \]
and $G_2^0$ as
\[ G_2^0 = N_2^0 \cup F_2^0 \cup C_2^0 = \{g_6, g_7, g_8, l_3, l_4, t_1, t_4, t_5, d_2\} \tag{7.4} \]
with
\[ N_2^0 = \{g_6, g_7, g_8, l_3, l_4\}; \quad F_2^0 = \{t_1\}; \quad C_2^0 = \{t_4, t_5, d_2\}. \tag{7.5} \]
The total set of grammars manipulated by this initial prototype is given by
\[ H_0 = \bigcup_{i=1}^{2} G_i^0 = \{g_1, g_2, g_3, g_4, g_5, g_6, g_7, g_8, l_0, l_1, l_2, l_3, l_4, t_0, t_1, t_2, t_3, t_4, t_5, d_1, d_2\}. \tag{7.6} \]
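The bookkeeping behind these directories is plain set union, which can be checked with a short Python sketch. The variable names below are hypothetical, and the set memberships are the ones read from equations (7.3) and (7.5); the directories and the dictionary are then computed instead of being written out by hand.

```python
# Component sets of the two domain directories, as read from (7.3) and (7.5).
N1 = {"l1", "l2"}
F1 = {"g1", "g2", "g3", "g4", "g5", "l0"}
C1 = {"t0", "t1", "t2", "t3", "d1"}

N2 = {"g6", "g7", "g8", "l3", "l4"}
F2 = {"t1"}
C2 = {"t4", "t5", "d2"}

# Each directory is the union of its three component sets.
G1 = N1 | F1 | C1
G2 = N2 | F2 | C2

# The document dictionary is the union of all directories.
H0 = G1 | G2

shared = G1 & G2   # grammars that appear in both directories
```

In this reading the only grammar shared by the two directories is t1, which is created in the first directory and reused by the second.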
Now consider the need to modify the current version of the document in order to include two other concepts: multiplication of integers and consecutive concatenation of strings of characters, which is here called the power of strings. The power operation is a binary infix operator which concatenates its left operand the number of times stated by its right integer operand.
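The power of strings described above is simple enough to state directly. The following Python sketch, with the hypothetical function name `power`, shows the operation next to ordinary integer multiplication so that the two meanings later given to the * symbol can be compared.

```python
def power(left: str, right: int) -> str:
    """Concatenate `left` with itself `right` times."""
    if right < 0:
        raise ValueError("the right operand must be a non-negative integer")
    return left * right


# Reading "*" as the power of strings and "+" as concatenation.
as_strings = power("a", 3) + "b"

# Reading "*" as multiplication of integers.
as_integers = 1 * 3
```

The first reading gives the string aaab and the second gives the integer 3.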
Syntactically, both operations are represented by the * symbol. This characteristic indicates that two distinct directories will need to be provided in order to capture the meanings assigned to the * symbol. The code under the label Ex1-Version 2 presents expressions involving these operations as well as both the addition and the concatenation as introduced by Ex1-Version 1.
Ex1-Version 2
< d2 >
1 + 1 + 0 = 110
< {d2 % {l5 ∘ {g9 ∘ g10}t6 ∘ l6}}d3 >
1 + 1 * 0 = 1
< {t5 ∘ l7 ∘ l8 ∘ g11 ∘ l9 ∘ g3 ∘ g12}d4 >
a * 3 + b = aaab ;
1 * 3 = 1 + 1 + 1 = 3:
< / >
1 * 3 = 1 + 1 + 1 = 3
< / > < / >
The code presented by Ex1-Version 2 above makes use of three distinct domains d2, d3 and d4. Although domain d2 has been reused from the previous version of the document, domains d3 and d4 needed to be created. The complete definition of the structure which supports this is provided by the document instance structure $S_1 = (D_1, c)$. The semantic structure $D_1$ is defined in terms of its three domain directories $G_1^1$, $G_2^1$ and $G_3^1$ for this version of the document as follows:
\[ D_1 = (G_1^1, G_2^1, G_3^1) \tag{7.7} \]
where $G_1^1$ is defined as
\[ G_1^1 = N_1^1 \cup F_1^1 \cup C_1^1 = \{d_2\} \tag{7.8} \]
with
\[ N_1^1 = \{\}; \quad F_1^1 = \{d_2\}; \quad C_1^1 = \{\}, \tag{7.9} \]
$G_2^1$ as
\[ G_2^1 = N_2^1 \cup F_2^1 \cup C_2^1 = \{g_9, g_{10}, l_5, l_6, t_6, d_2, d_3\} \tag{7.10} \]
with
\[ N_2^1 = \{g_9, g_{10}, l_5, l_6\}; \quad F_2^1 = \{d_2\}; \quad C_2^1 = \{t_6, d_3\}, \tag{7.11} \]
and $G_3^1$ as
\[ G_3^1 = N_3^1 \cup F_3^1 \cup C_3^1 = \{g_3, g_{11}, g_{12}, l_7, l_8, l_9, t_5, d_4\} \tag{7.12} \]
with
\[ N_3^1 = \{g_{11}, g_{12}, l_7, l_8, l_9\}; \quad F_3^1 = \{g_3, t_5\}; \quad C_3^1 = \{d_4\}. \tag{7.13} \]
The grammars involved in this new version of the document structure are given by
\[ H_1 = \bigcup_{i=1}^{3} G_i^1 = \{g_3, g_9, g_{10}, g_{11}, g_{12}, l_5, l_6, l_7, l_8, l_9, t_5, t_6, d_2, d_3, d_4\}. \tag{7.14} \]
g9   tm → tm MULT fm    # "*" # (1,3)
g10  tm → fm
l5   tp → tm
l6   fm → it
Table 7.5: Grammars in domain directory $G_2^1$ created by editing.
Tables 7.5 and 7.6 provide grammars which belong to domain directory $G_2^1$. These grammars have been introduced by editing and by grammar operations respectively. Table 7.7 shows the grammars which belong to $G_3^1$ and were introduced by editing. Table 7.8 shows the grammars in $G_3^1$ which were introduced by grammar operations.
{g9 ∘ g10}t6
    tm       → tm MULT fm    # "*" # (1,3)
    tm       → fm
{d2 % {l5 ∘ t6 ∘ l6}}d3
    new_expr → ee
    ee       → ee EQ te      # "=" # (1,3)
    ee       → te
    te       → ep
    ep       → ep PLUS tp    # "+" # (1,3)
    ep       → tp
    tp       → it
    tp       → tm
    tm       → tm MULT fm    # "*" # (1,3)
    tm       → fm
    fm       → it
    it       → INTEGER       # [1-9][0-9]* #
Table 7.6: Grammars in domain directory $G_2^1$ that have been created by grammar operations.
g11  tp → st POWER fp    # "*" # (1,3)
g12  st → ALPHANUM       # [0-9a-z]+ #
l7   tc → tp
l8   tc → st
l9   fp → it
Table 7.7: Grammars in domain directory $G_3^1$ created by editing.
7.2 Example 2: Symbols as operators and operands
This example proposes a semantic structure to support the syntactical overloading of the symbols + and *. Two different meanings are attached to each symbol and each meaning requires a customized domain where grammar fragments are needed to support the semantic capturing process.
Although the semantics usually attached to these symbols characterizes them as binary operators, as provided by domain d3, many other meanings may also be associated with them. One possibility, for example, is to have them as the elements of a set. For this scenario, the two symbols will be the operands of the comma "," binary operator which is used to organize the elements of a set in a list format. This characteristic is illustrated by the single statement defined within the scope of domain d5 in the semantic structure that follows:
{t5 ∘ l7 ∘ l8 ∘ g11 ∘ l9 ∘ g3 ∘ g12}d4
    new_expr → ee
    ee       → ee EQ te       # "=" # (1,3)
    ee       → te
    te       → ec
    ec       → ec CAT tc      # "+" # (1,3)
    ec       → tc
    tc       → tp
    tc       → st
    tp       → st POWER fp    # "*" # (1,3)
    fp       → it
    it       → INTEGER        # [1-9][0-9]* #
    st       → ALPHANUM       # [0-9a-z]+ #
Table 7.8: Grammars in domain directory $G_3^1$ created by grammar operations.
Ex2:
< d3 >
0 + 1 * 1 = 1
< d5 >
R = S = {+, *}
< / >< / >
l10  te     → id
l11  te     → bs
g13  id     → IDENTIFIER        # [A-Z] #
g14  bs     → SET el endset     # "{" # (2,3)
g15  endset → ENDSET            # "}" #
g16  el     → el LISTDEL tl     # "," # (1,3)
g17  el     → tl
g18  tl     → BINARYOP          # [+*] #
Table 7.9: Grammars in domain directory $G_2^0$ created by editing.
Tables 7.9 and 7.10 illustrate all grammars required for this example. Since grammar t1 has already been defined in Section 7.1 it has not been included in these tables.
{g14 ∘ g15}t6
    bs       → SET el endset     # "{" # (2)
    endset   → ENDSET            # "}" #
{g16 ∘ g17}t7
    el       → el LISTDEL tl     # "," # (1,3)
    el       → tl
{t1 ∘ l10 ∘ l11 ∘ g13 ∘ t6 ∘ t7 ∘ g18}d5
    new_expr → ee
    ee       → ee EQ te          # "=" # (1,3)
    ee       → te
    te       → id
    te       → bs
    id       → IDENTIFIER        # [A-Z] #
    bs       → SET el endset     # "{" # (2)
    endset   → ENDSET            # "}" #
    el       → el LISTDEL tl     # "," # (1,3)
    el       → tl
    tl       → BINARYOP          # [+*] #
Table 7.10: Grammars in domain directory $G_2^0$ created by grammar operations.
According to the meta-grammar defined in Table 6.18, the args_position nonterminal included there has the purpose of identifying the position of the arguments of the mathematical concept represented by the associated rule. This nonterminal is, in the meta-grammar, expanded as a list of integers.
Grammar fragment g14 has integers 2 and 3 as its attributes. According to the meta-grammar, these two attributes determine the position of the nonterminals which are relevant to the definition of the concept presented by the only production rule that grammar g14 has. Argument 2 relates to the list which is defined by grammar g16. Argument 3 indicates where the delimiter for the end of a set representation is placed.
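The role of the integer attributes can be made concrete with a small Python sketch that selects the argument symbols of a rule from the positions recorded in its cardinality attribute. Positions are counted from 1 over the symbols on the right side of the rule, as the description of g14 suggests; the function name `arguments` and the list representation of a right-hand side are assumptions of this example.

```python
def arguments(rhs, positions):
    """Return the right-hand side symbols found at the given 1-based positions."""
    for position in positions:
        if not 1 <= position <= len(rhs):
            raise ValueError(f"position {position} is outside the rule")
    return [rhs[position - 1] for position in positions]


# g14: bs → SET el endset, with attributes (2,3)
set_arguments = arguments(["SET", "el", "endset"], (2, 3))

# g1: ee → ee EQ te, with attributes (1,3)
equality_arguments = arguments(["ee", "EQ", "te"], (1, 3))
```

For g14 the selected symbols are el and endset, that is, the list of elements and the closing delimiter; for g1 they are the two operands on either side of the EQ terminal.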
The notion of splitting pairs of symbols which together are part of the syntax of a concept is used here as a way to ensure that production rules involving these concepts are in the ENF. For this reason the symbol { from the pair {} has been used for the definition of a set in grammar g14.
The rest of this section describes the semantic structure $D_0$ for this example, according to the model proposed in Chapter 6. The two domains d3 and d5 are defined as elements of the domain directories $G_1^0$ and $G_2^0$ respectively as follows:
\[ D_0 = (G_1^0, G_2^0) \tag{7.15} \]
Domain directory $G_1^0$ is defined as
\[ G_1^0 = N_1^0 \cup F_1^0 \cup C_1^0 = \{d_3\} \tag{7.16} \]
with
\[ N_1^0 = \{\}; \quad F_1^0 = \{d_3\}; \quad C_1^0 = \emptyset \tag{7.17} \]
and $G_2^0$ as
\[ G_2^0 = N_2^0 \cup F_2^0 \cup C_2^0 = \{l_{10}, l_{11}, g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}, t_6, t_7, d_5\} \tag{7.18} \]
with
\[ N_2^0 = \{l_{10}, l_{11}, g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}\}; \quad F_2^0 = \{t_1\}; \quad C_2^0 = \{t_6, t_7, d_5\}. \tag{7.19} \]
The grammars required for this example are provided by
\[ H_0 = \bigcup_{i=1}^{2} G_i^0 = \{g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}, l_{10}, l_{11}, t_6, t_7, d_5\}. \tag{7.20} \]
7.3 Example 3: More meanings for the + symbol
The document structures introduced by the previous examples illustrated a scenario where the overloading of symbols took place in distinct expressions. This means a given symbol appeared in more than a single expression with different meanings associated with it. This problem was approached by a context switch where the current domain was replaced by an adequate one that provided the necessary grammar support for the capturing of the meaning of the concepts involved.
Symbol overloading may also take place within the expression itself. For this scenario the context switch would introduce as many distinct domains as the number of different meanings which are associated with any given symbol included in the expression. This section presents a document structure to support expressions which require more than a single domain to capture the meaning of the concepts they represent. To illustrate this problem consider the following expression, which attaches two different meanings to the symbol +.
\[ |A + B| + 1 = a \tag{7.21} \]
The semantics of expression 7.21 determines that a is equal to the addition of integer 1 and the determinant of the result of the addition of matrix A with matrix B. As can be seen, the symbol + is overloaded since it is used to represent both the addition of matrices and the addition of integers. The rest of this section provides a structure to capture the semantics of this expression.
Ex3:
< {g19 ∘ {g20 ∘ g2}t8}t9;
{t9 ∘ g21 ∘ g22}d6 >
|A + B| < {g23 ∘ t2 ∘ g24}d7 >
+ 1 = a
< / >< / >
g19  new_expr → DET ee endet domain_scope    # "|" # (2,3,4)
g20  ee       → ee MATRIX_ADD et             # "+" # (1,3)
g21  et       → MATRIX_ID                    # [A-Z] #
g22  endet    → ENDET                        # "|" #
Table 7.11: Grammars in domain directory $G_1^0$ created by editing.
g23  new_expr → PLUS ee                      # "+" # (2)
g24  te       → CONSTANT                     # (0|[1-9][0-9]*)|[a-z] #
Table 7.12: Grammars in domain directory $G_2^0$ created by editing.
The semantic structure for this document is defined as follows:
\[ D_0 = (G_1^0, G_2^0) \tag{7.22} \]
where $G_1^0$ is defined as
\[ G_1^0 = N_1^0 \cup F_1^0 \cup C_1^0 = \{g_2, g_{19}, g_{20}, g_{21}, g_{22}, t_8, t_9, d_6\} \tag{7.23} \]
with
\[ N_1^0 = \{g_{19}, g_{20}, g_{21}, g_{22}\}; \quad F_1^0 = \{g_2\}; \quad C_1^0 = \{t_8, t_9, d_6\} \tag{7.24} \]
and $G_2^0$ as
\[ G_2^0 = N_2^0 \cup F_2^0 \cup C_2^0 = \{g_{23}, g_{24}, t_2, d_7\} \tag{7.25} \]
with
\[ N_2^0 = \{g_{23}, g_{24}\}; \quad F_2^0 = \{t_2\}; \quad C_2^0 = \{d_7\}. \tag{7.26} \]
The total set of grammars manipulated by this initial prototype is given by
\[ H_0 = \bigcup_{i=1}^{2} G_i^0 = \{g_2, g_{19}, g_{20}, g_{21}, g_{22}, g_{23}, g_{24}, t_2, t_8, t_9, d_6, d_7\}. \tag{7.27} \]
{g20 ∘ g2}t8
    ee       → ee MATRIX_ADD et             # "+" # (1,3)
    et       → MATRIX_ID                    # [A-Z] #
{g19 ∘ t8}t9
    new_expr → DET ee endet domain_scope    # "|" # (2,3,4)
    ee       → ee MATRIX_ADD et             # "+" # (1,3)
    et       → MATRIX_ID                    # [A-Z] #
{t9 ∘ g21 ∘ g22}d6
    new_expr → DET ee endet domain_scope    # "|" # (2,3,4)
    ee       → ee MATRIX_ADD et             # "+" # (1,3)
    et       → MATRIX_ID                    # [A-Z] #
    endet    → ENDET                        # "|" #
Table 7.13: Grammars in domain directory $G_1^0$ created by grammar operations.
{g23 ∘ t2 ∘ g24}d7
    new_expr → PLUS ee                      # "+" # (2)
    ee       → ee EQ te                     # "=" # (1,3)
    ee       → te
    te       → CONSTANT                     # (0|[1-9][0-9]*)|[a-z] #
Table 7.14: Grammars in domain directory $G_2^0$ created by grammar operations.
Tables 7.11 and 7.13 are both associated with domain directory $G_1^0$. Table 7.11 shows all grammars in this directory that were created by editing and Table 7.13 the grammars that were generated as the result of composition operations. In a similar way the grammars in Tables 7.12 and 7.14 are associated with the domain directory $G_2^0$. The grammars in Table 7.12 are the result of editing and the grammars in Table 7.14 were created by composition.
As discussed in Section 7.2, the integer attributes which are introduced as part of the rules of some grammars have the purpose of determining the position of the relevant nonterminals of a rule. In Table 7.11 grammar fragment g19 uses attributes 2, 3 and 4 to refer to its three nonterminals that are necessary in order to support the correct expansion of this rule. Nonterminals ee and endet are associated with attributes 2 and 3 respectively. Although both nonterminals ee and endet belong to the same domain directory, nonterminal domain_scope, which is associated with attribute 4, does not. As part of the dynamic control grammar, this nonterminal is associated with the context switch which is needed to provide the adequate grammar for the mathematical concepts being processed.
Chapter 8
The Processing Structure
In Chapter 6 the dynamics of authoring mathematics was modeled by means of a document organization that uses CFGs as its fundamental formalism. This chapter presents a processing structure for the proposed organization.
8.1 Dynamic Authoring and Language Fragments
Throughout the previous chapters I have investigated problems related to modeling the behavior of authoring mathematics. One of my major concerns when designing a solution to this problem was that at any instant during the authoring activity the mathematical concepts included in the document had their semantics captured regardless of the syntax used by the author for this purpose. This approach faces the challenge of processing user-defined syntax¹. This means that a language processor that verifies the syntactical validity of such a document must be provided with the necessary tools to support the processing of unpredicted language statements.
To recognize a given syntax such as a string of symbols, say w, requires a CFG G such that $w \in L(G)$. As already emphasized, allowing user-defined syntax for expressing the semantics capture of mathematical concepts introduces the possibility of symbol overloading. In order to ensure that grammar G is not used to recognize syntax definitions which contain overloaded symbols, the semantic characteristics of concepts need to be considered. The expression 1 + 0 = 1, for instance, could be used to illustrate both the boolean OR operation and the integer addition depending on the context determined by the author. Consequently no single CFG should be provided to capture both meanings.
¹ A similar problem has been approached by [60], where a meta-language addition to the PASCAL programming language was proposed. The mechanism allowed the programmer to introduce his/her own syntax to the language.
One fundamental idea I have applied to support the use of CFGs to approach the semantics capturing problem is the fact that authoring mathematics is an incremental activity. Under this assumption the final document may be viewed as the result of a set of document modifications performed by the author or, for short, a set of authoring increments. Another way to express this is that the dynamics of authoring mathematics can be modeled as a set of states where each state is uniquely characterized by a CFG or scope. In other words, the model is a finite automaton whose states are CFGs and whose transitions are supplied by the author. One problem with this association is to determine the boundaries of an authoring increment, that is, when one ends and the next is to be considered.
To get around this nondeterminism I have used the state change concept as a mechanism to resolve ambiguities. Of course a state change, in this context, must also be triggered whenever the syntax used for a given concept cannot be recognized by the grammars defined for that state. Authoring therefore requires no scope change as long as no syntactical ambiguities are introduced and all syntax proposed consists of valid statements for the current scope. The syntax attached to a concept will only be valid within a given scope and will be recognized as long as the scope it belongs to is active.
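The state-change idea can be pictured with a very small Python sketch. It is deliberately simplified and is not the processing structure proposed in this thesis: each scope is reduced to a single regular expression standing in for its set of grammars, and when the current scope cannot recognize a statement the sketch simply moves to the first scope that can, whereas in the model above the transition is supplied by the author. The names `SCOPES` and `process` and the two patterns are hypothetical.

```python
import re

# Each scope is represented by one regular expression that stands in
# for the grammars active in that scope (an illustrative simplification).
SCOPES = {
    "integers": re.compile(r"\d+( \+ \d+)* = \d+"),
    "sets": re.compile(r"[A-Z]( = [A-Z])* = \{.*\}"),
}


def process(statements, scopes, start):
    """Pair every statement with the scope that recognizes it.

    The current scope is kept while it recognizes the statement; a scope
    change is triggered only when recognition fails.
    """
    current = start
    trace = []
    for statement in statements:
        if not scopes[current].fullmatch(statement):
            candidates = [name for name, pattern in scopes.items()
                          if pattern.fullmatch(statement)]
            if not candidates:
                raise ValueError(f"no scope recognizes {statement!r}")
            current = candidates[0]  # scope change
        trace.append((statement, current))
    return trace


statements = ["0 + 1 = 1", "R = S = {+,*}", "1 + 1 = 2"]
trace = process(statements, SCOPES, start="integers")
scope_sequence = [scope for _, scope in trace]
```

For the three statements above the sketch stays in the integers scope, switches to the sets scope for the second statement, and switches back for the third, so the document ends up organized as a sequence of scopes.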
According to this strategy the document at the end of the authoring activity will be organized as a sequence of sets of grammars. Since the document has been created by an incremental approach it is intuitive to structure its processing by means of a mechanism that supports this characteristic. In essence new language processors will need to be provided as new scopes are introduced. This means the dynamic authoring characteristic determines incremental changes to be made in the notation/language used. Therefore incremental changes also need to be provided to the grammars used for the definition of the notation/language. This process may be viewed as a language prototyping activity where language fragments are included as a way to support new features.
8.2 Processing Grammar Fragments
According to [104] a program m ing language processor is an application which m anip
ulates program s expressed in a given language. In this thesis, language processors or
processors will also be used to refer to these programs. Some well known program m ing
language processors are compilers and interpreters [104],
According to [6 ] the design of a compiler can be logically structured as the front end
and the back end. The parts associated with the source language are the lexical and
syntactic analysis, the symbol table creation, the sem antic analysis, the generation of interm ediate code and code optim ization. The front end is the collection of all these
parts. The back end portion is related to tasks th a t are associated with the target language. Therefore target code generation and target code optim ization are back
end tasks. The symbol table m anagem ent and error handling are tasks which are not
restricted to a single phase. These tasks may belong to both the front and back end
phases.
As described above, the phase-oriented decomposition approach views a programming
language as a single indivisible object. An alternative would be to describe a
language as a collection of fragments such that their combination provides the
same processing power as the indivisible definition. The important characteristic of
this approach is that language fragments can be defined to represent not only
the syntax but also the semantic structure of language constructs.
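As a rough illustration of the fragment view, the following Python sketch pairs each fragment's syntax with its semantics and builds a language by adding fragments one at a time. It is not the grammar formalism of this thesis: the names Fragment and FragmentedLanguage are hypothetical, and regular expressions stand in for grammar rules only to keep the example short.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Fragment:
    """One language fragment: a construct's syntax plus its semantics."""
    name: str
    pattern: str                          # syntax (a regex stands in for a grammar rule)
    action: Callable[[re.Match], object]  # semantics attached to the construct


class FragmentedLanguage:
    """A language defined as a growing collection of fragments."""

    def __init__(self) -> None:
        self.fragments: List[Fragment] = []

    def add(self, fragment: Fragment) -> None:
        # Incremental change: the language grows without redefining it as a whole.
        self.fragments.append(fragment)

    def process(self, text: str) -> Optional[object]:
        # Try each fragment in the order it was added; None means "not in the language".
        for fragment in self.fragments:
            match = re.fullmatch(fragment.pattern, text)
            if match:
                return fragment.action(match)
        return None


lang = FragmentedLanguage()
lang.add(Fragment("sum", r"(\d+)\s*\+\s*(\d+)",
                  lambda m: int(m.group(1)) + int(m.group(2))))
before = lang.process("2 * 3")   # no fragment covers products yet

lang.add(Fragment("product", r"(\d+)\s*\*\s*(\d+)",
                  lambda m: int(m.group(1)) * int(m.group(2))))
after = lang.process("2 * 3")    # now handled by the new fragment
```

In this sketch the product construct becomes available only once its fragment is added, which mirrors the prototyping activity described above on a very small scale.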
The following section presents the organization this thesis proposes for the construction
of document processors to support the dynamics of authoring mathematics. The
solution combines both notions of phase-oriented processing and fragmented language
definitions.
8.3 Dynamic Authoring and Document Processors
In Section 6.2.1 I have introduced a document structure to model the dynamics of
authoring mathematics. The model described there organizes authoring as a sequence of sets of grammars. In this organization each set captures the syntax and portions
of the semantics of some mathematical concepts that have been included in the
document. A complete sequence, in this case, characterizes one stage during the authoring activity. In other words it corresponds to a version of the document.
In order to process a given version of a document, say version $v$, the
document processor must step through the complete sequence of sets of grammars
which is associated with $v$, starting from the sequence's first element. As a result
a context switch, or scope change, will take place whenever a set of grammars is
replaced by another. This procedure is the approach this thesis proposes to capture
the semantics that is associated with the field of knowledge that mathematical concepts
belong to. It is through this mechanism that the meaning of concepts which are
represented by overloaded syntactical constructs is captured.
As proposed in Section 6.2.1, the expression $S_j = (D_j, c)$ with $j \geq 0$ describes the state of
the document at a given instant during authoring. In this case $D_j$ represents the sets
of grammars needed to support the creation of version $j$ of the document. Support
for the sequencing behavior is provided by the binding control grammar $c$. In this
section the following definition refers to the organizations defined by both $D_j$ and $c$
to present a possible arrangement of language processors to handle the dynamics of
authoring mathematics.
Definition 18 Assume $M$ is the binding control grammar expressed in ENF. Consider
a given version of a document structure, say version $j$. Let
$$D_j = (G^j_1, G^j_2, \ldots, G^j_{n_j})$$
be the semantic structure associated with version $j$ and
$P_{M\%G^j_i}$ be the language processor for directory $i$ such that
$$P_{M\%G^j_i} : \text{object's syntax} \to \text{hierarchical representation}.$$
The language processor for document structure $S_j$ is defined by the deterministic
finite automaton
$$P_{D_j} = (Q_j, \Sigma_j, \delta_j, s_j, F_j)$$
where
• $Q_j$ is the set whose elements are all processors associated with the directories
that compose the semantic structure $D_j$,
• $s_j = P_{M\%G^j_1}$,
• $F_j = P_{M\%G^j_{n_j}}$,
• $\Sigma_j$ is the set containing elements which are the syntax of mathematical objects
associated with version $j$ of the document, and
• for all $w \in \Sigma_j$,
$$\delta_j(P_{M\%G^j_i}, w) \; \begin{cases} = P_{M\%G^j_i} & \text{if } w \in L(G^j_i), \\ \in Q_j - \{P_{M\%G^j_i}\} & \text{otherwise.} \end{cases}$$
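The automaton of Definition 18 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an implementation of the thesis's processors: DirectoryProcessor and DocumentProcessor are hypothetical names, each directory's language is modelled as a small finite set of strings, and the "otherwise" branch of the transition function, which the definition only constrains to a different state, is resolved here by moving forward to the nearest later directory that accepts the input.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class DirectoryProcessor:
    """Stand-in for one directory processor, i.e. one state of the automaton."""
    name: str
    language: FrozenSet[str]  # toy finite language standing in for the directory's language

    def accepts(self, w: str) -> bool:
        return w in self.language


class DocumentProcessor:
    """Toy document processor: states are directory processors in sequence order."""

    def __init__(self, states: Sequence[DirectoryProcessor]) -> None:
        if not states:
            raise ValueError("a semantic structure needs at least one directory")
        self.states = list(states)
        self.current = 0  # start state: the processor of the first directory

    def step(self, w: str) -> Optional[DirectoryProcessor]:
        """Apply the transition function to the input w.

        Stay in the current state if its directory accepts w; otherwise
        (assumption) switch to the nearest later directory that accepts w.
        Returns the state reached, or None if no such directory exists.
        """
        for i in range(self.current, len(self.states)):
            if self.states[i].accepts(w):
                self.current = i
                return self.states[i]
        return None

    def in_final_state(self) -> bool:
        # The final state is the processor of the last directory.
        return self.current == len(self.states) - 1


# Hypothetical directories in which the + symbol is overloaded.
integers = DirectoryProcessor("integers", frozenset({"1 + 2", "3 + 4"}))
strings = DirectoryProcessor("strings", frozenset({'"a" + "b"'}))

doc = DocumentProcessor([integers, strings])
trace = [doc.step(w).name for w in ["1 + 2", "3 + 4", '"a" + "b"']]
```

Here the third input is not in the language of the first directory, so the sketch performs a scope change to the second one; the same + symbol is then processed under a different directory.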
8.3.1 Example
Chapter 7 provides a set of examples to illustrate the organization this thesis proposes
to support the dynamics of authoring mathematics. A scenario with two versions of
a simple document containing mathematical expressions that overload the + symbol
is provided in Section 7.1. Two document instance structures $S_0 = (D_0, c)$ and
$S_1 = (D_1, c)$ have been created to support the mathematical objects introduced during authoring.
The language processors associated with each version of this document are therefore
$P_{D_0}$, for the first version, and $P_{D_1}$ for the second. The semantic structure for the
second version is
$$D_1 = (G^1_1, G^1_2, G^1_3, G^1_4)$$
and the set of states for its language processor is
$$Q_1 = \{P_{M\%G^1_1}, P_{M\%G^1_2}, P_{M\%G^1_3}, P_{M\%G^1_4}\}$$
Chapter 9
Concluding Remarks
This work introduced a user-oriented organization to support the creation of
multi-purpose mathematical documents. To this end, a mechanism to
capture the semantics of the mathematical concepts was proposed. This mechanism models the dynamics of authoring and allows meaning-to-syntax bindings to take
place during the authoring activity. It also provides the author with the power to
select the syntax he/she believes is the most appropriate to express the ideas to be
communicated. A processing structure to support the proposed organization was also presented.
9.1 Discussion
The organization introduced by the authoring model proposed in this work determines
that the semantics of a mathematical concept is captured by the set of grammars that compose the directory which is associated with the concept. Grammars in this set
are structured according to the following characteristics: They either
1. have been created and are already available,
2. are the result of gram m ar operations, or
3. have been created by editing.
It is expected that the majority of the mathematical concepts included in the
document are supported by grammars which are already available. This means they are
part of a library and are ready to be used. In the event that new concepts need to
be introduced or their meaning-to-syntax mappings need to be modified, the model determines that the needed grammars are to be created by editing, by the
application of operations on the existing grammars, or by a combination of these two
approaches. Editing should be required only when few grammars are available or
whenever the concepts to be expressed require syntax that may not be supported by
operations on grammars that are already available.
Cognitive load is the degree to which cognitive resources are required for activities that
facilitate learning [99, 26]. According to [82], cognitive load increases with the amount of information to process. In [94], Salomon defines mental effort as the number of
non-automatic elaborations necessary to solve a problem. As noted by Clark in [31],
mental effort increases linearly and positively as the cognitive load increases. But how
can computer-based systems be designed to reduce the cognitive load? As emphasized
by [82], information overload can be reduced by modeling the user:
A user model can be described as a system knowledge source containing
assumptions on aspects of the user that guide the behavior of the system.
The goal of building a user model is to reduce the user's information load.
This can be accomplished by adapting either the representation of the task or the task itself.
In the context of this work the task is authoring mathematics and the representation
of the task is the approach taken for authoring. It is by reading and handwriting
that humans, most of the time, become exposed to mathematics. Consequently the
mental model developed during this activity is the result of associations involving a
pen/paper-based form of representing the abstract mathematical concepts. In other
words, the semantics of concepts are bound to syntax.
This thesis proposes a document organization which:
1. models the dynamics of authoring mathematics and
2. allows the author the possibility of expressing mathematical concepts by means of syntax he/she feels comfortable with.
The cognitive load associated with authoring mathematics by means of user-defined
syntax should therefore be reduced. By providing his/her own meaning-to-syntax
bindings the author is free from details of notations which introduce other bindings
he/she is not comfortable with. Empirical study results collected by [7] determined
that the number of errors produced by users when entering mathematics on
computers increases when longer equations are considered. Although not reported in their
experiment, it can be hypothesized that the number of errors produced by users while
entering expressions also increases with the complexity of the notation used,
due to the increase in cognitive load. Consider, for instance, the representation of the
summation in the OpenMath system found in Subsection 1.2.5. The syntax used for
this example is complex and therefore not appropriate for speech input. Furthermore,
due to its length, according to [7], this form of representation is prone to input errors.
The representation of this type of summation is simplified when captured by means of
the approach proposed in this thesis. A similar example may be found in Section 5.8, which requires only a single line of text to capture the summation.
According to [7], the multimodal handwriting-plus-speech form of entering expressions
was faster and better liked than the keyboard-and-mouse method. In this case,
allowing the author the possibility of multimodal input should be beneficial if the author
has the freedom to propose the meaning-to-syntax binding. As noted earlier in
this thesis, approaches such as MathML and OpenMath have not been designed to
support multimodal forms of input. This limitation and the other two limitations
in Subsection 1.2.7, counter-intuitive entry order and complex syntax format, are
overcome by the approach proposed in this thesis.
9.2 Authoring with Grammar Fragments
In this dissertation I have described the goal of capturing the meaning of mathematical concepts by means of a document structure which
1. allows the semantics of mathematical concepts to be encoded by user-defined
syntax, provided the notation is context-free and
2. supports both the extensibility and ambiguity characteristics of the conventional
mathematical notation.
In Chapter 1 I have made three claims concerning my approach to authoring documents containing mathematics. These claims are repeated here, followed by comments about the approach I took to accomplish each one of them.
1. Both the meaning and syntax of mathematical concepts can be captured by
attributed context-free grammars. The solution I have proposed to capture the
semantics of mathematical concepts is based on an organization that considers
the author's needs as a fundamental requirement. To support this
characteristic I have modeled the dynamics of authoring by means of a grammar-based document structure, the DDM. Attributed context-free grammars are used in
this structure. The attributes determine the following:
• the position of the operator's operand, and
• the necessary structure to identify the symbol arrangement to represent a
given mathematical concept.
2. Extensibility can be supported by operations on the attributed grammars. Three
concepts related to extensibility were introduced in Chapter 6: the extension
normal form, operations on grammars and fundamental grammars.
• The extension normal form was proposed in order to determine the
grammar format to be used. The format limits the number of terminal symbols
in the grammar's rule. It also determines the possible terminal/nonterminal
symbol arrangements each production rule must follow.
• Both the composition and the extension binary operations are defined for grammars in extension normal form and both return grammars also in the extension normal form. They allow the creation of grammars by combining
previously defined grammars. This approach introduced the possibility of
grammar reuse and incremental grammar definition.
• The notion of fundamental grammars established the basic building blocks
to be used for the capturing activity. The three types of grammars defined
for this purpose provide the necessary means to support the creation of
any possible grammar. This statement is supported by the fact that each
one of these three grammars has only a single production rule which is of one of the types proposed by the extension normal form.
Since the composition and extension operators are defined for grammars in
the extension normal form, the application of these operations on fundamental grammars will produce grammars which are also in the extension normal
form. This means grammars can be created during the authoring activity and
a programmable form of extensibility is therefore possible, allowing user-defined
syntax to be introduced during the authoring of the document. This
mechanism assumes a set of default grammars is available. This has been described in
Section 6.1, where a logical diagram introducing the activities involved during the
authoring of mathematical concepts was presented. A detailed consideration of
this characteristic has been provided in Section 6.2.
3. Ambiguities generated by symbol overloading can be resolved by a scope
mechanism. As defined in Chapter 6, a document instance structure $S$ is the tuple $(D, c)$ where $D$ is the semantic structure and $c$ the binding control. The
semantic structure $D$ is a finite sequence of finite sets of grammars. Each of these sets is
represented by a domain directory $G^k_i$ where $i$ determines the position the set
holds in the sequence and $k$ refers to the version of the document considered.
The binding control $c$ is a CFG which defines the scope in which the rules
provided by each domain directory in the semantic structure are valid. This means terminals defined in any given domain directory are local to the scope
determined by this structure.
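The locality of terminals described in the third claim can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: each domain directory is reduced to a mapping from terminal symbols to a textual meaning, the active scope is given as a position in the sequence instead of being derived from the binding control grammar, and the function resolve and the example meanings are hypothetical.

```python
from typing import Dict, List

# A domain directory is reduced here to a mapping: terminal symbol -> meaning.
Directory = Dict[str, str]


def resolve(symbol: str, semantic_structure: List[Directory], scope: int) -> str:
    """Return the meaning bound to `symbol` in the directory active at `scope`.

    Terminals are local to their scope: only the directory at position
    `scope` is consulted, never its neighbours in the sequence.
    """
    directory = semantic_structure[scope]
    if symbol not in directory:
        raise KeyError(f"{symbol!r} is not defined in scope {scope}")
    return directory[symbol]


# Hypothetical semantic structure with two directories overloading "+".
semantic_structure: List[Directory] = [
    {"+": "integer addition"},
    {"+": "matrix addition", "T": "transpose"},
]

meaning_in_first_scope = resolve("+", semantic_structure, 0)
meaning_in_second_scope = resolve("+", semantic_structure, 1)
```

The same symbol resolves to a different meaning in each scope, and a terminal such as "T" that is defined only in the second directory is not visible from the first.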
9.3 Future Work
As stated in Subsection 1.2.8, compositionality of meaning is a design decision for
systems characterized by a static syntax. In order to allow user-defined syntax to be
provided at run-time, additional complexity concerning the application of the
compositionality principle is introduced. This is because grammar rules will also need
to be supplied at run-time. For this scenario compositionality is a system property.
Therefore any grammar that implements the system must include support for compositionality.
The notion of fundamental grammar introduced in Chapter 6 may be applied to
support the application of compositionality. Since these grammars are the basic building
components, the representation of any concept will be subjected to the restrictions they introduce. Therefore compound concepts must be decomposed. The questions
to be asked at this stage are:
1. Is the resulting decomposition compositional?
2. Is there a way one can ensure compositionality of meaning for such systems?
I leave these questions open. A detailed investigation of the application of
compositionality is therefore a future goal. Apart from the compositionality problem,
the complete implementation of the organization proposed in this dissertation is an immediate priority.
References
[1] J. Abbott: OpenMath Design Committee Report. Technical report, OpenMath Consortium, 1996. Available from http://www.openmath.org/.
[2] J. Abbott, A. Diaz, R. S. Sutor: A Report on OpenMath, A Protocol for the Exchange of Mathematical Information. SIGSAM Bulletin 30(1) (March 1996), 21-24.
[3] J. Abbott, A. van Leeuwen, A. Strotmann: Objectives of OpenMath. Technical report, OpenMath Consortium, 1996. Available from http://www.openmath.org/.
[4] G. D. Abowd: Formal Aspects of Human-Computer Interaction. PhD thesis, Oxford University, Oxford, England, 1991.
[5] S. R. Adams: Modular Grammars for Programming Language Prototyping. PhD thesis, University of Southampton, Southampton, England, 1991.
[6] A. V. Aho, R. Sethi, J. D. Ullman: Compilers: Principles, Techniques and Tools. Addison-Wesley, 1986.
[7] L. Anthony, J. Yang, K. R. Koedinger: Evaluation of Multimodal Input for Entering Mathematical Equations on the Computer. In CHI '05: CHI '05 Extended Abstracts on Human Factors in Computing Systems. 1184-1187. ACM Press, 2005.
[8] D. S. Arnon, S. A. Mamrak: On the Logical Structure of Mathematical Notation. TUGboat 12(4) (1991), 479-484.
[9] R. Arrabito: Using to Produce Braille Mathematical Notation. 1987. University of Western Ontario, Undergraduate Thesis.
[10] R. G. Arrabito: Computerized Braille Typesetting: Some Recommendations on Mark-Up and Braille Standards. Master's thesis, The University of Western Ontario, London, Canada, 1990.
[11] R. G. Arrabito, H. Jürgensen: Computerized Braille Typesetting: another view of Mark-Up standards. Electronic Publishing 1(2) (September 1988), 117-131.
[12] A. Asperti, G. Bancerek, A. Trybulec (editors): Third International Conference, MKM 2004. Lecture Notes in Computer Science 3119, Berlin, 2004. Springer-Verlag.
[13] A. Asperti, B. Buchberger, J. H. Davenport (editors): Second International Conference, MKM 2003. Lecture Notes in Computer Science 2594, Berlin, 2003. Springer-Verlag.
[14] R. Ausbrooks, S. Buswell, D. Carlisle, S. Dalmas, S. Devitt, A. Diaz, M. Froumentin, R. Hunter, P. Ion, M. Kohlhase, R. Miner, N. Poppelier, B. Smith, N. Soiffer, R. Sutor, S. Watt: Mathematical Markup Language (MathML) version 2.0 (Second Edition). Technical report, W3C, 2003. Available from http://www.w3.org/TR/2003/REC-MathML2-20031021/
[15] Y. Bellik, D. Burger: Multimodal interfaces: new solutions to the problem of computer accessibility for the blind. In Conference companion on Human factors in computing systems. 267-268. ACM Press, 1994.
[16] C. Bigelow, D. Day: Digital Typography. Scientific American 249(2) (1983), 106-119.
[17] P. V. Biron, A. Malhotra: XML Schema Part 2: Datatypes. Technical report, OASIS, 2001. Available from http://www.w3.org/xmlschema-2/
[18] Instruction Manual for Braille Transcribing. American Printing House for the Blind, Louisville, Kentucky, 3rd ed., 1984.
[19] The Nemeth Braille Code for Mathematics and Science Notation, 1972 Revision. American Printing House for the Blind, Louisville, Kentucky, 1985.
[20] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, J. Cowan: Extensible Markup Language (XML) 1.1. Technical report, W3C, 2004. Available from http://www.w3.org/TR/2004/REC-xml11-20040204/
[21] M. Bryan: A TeX User's Guide to ISO's Document Style Semantics and Specification Language (DSSSL). TUGboat 14 (1993), 223-226.
[22] H. Bunt: Issues in Multimodal Human-Computer Communication. In H. Bunt, R.-J. Beun, T. Borghuis (editors): Multimodal Human-Computer Communication: Systems, Techniques, and Experiments, 1374. Lecture Notes in Computer Science, 1-12, Springer-Verlag, Berlin, January 1998.
[23] S. Buswell, O. Caprotti, D. P. Carlisle, M. C. Dewar, M. Gaetano, M. Kohlhase: The OpenMath Standard (version 2.0). Technical report, The OpenMath Society, 2004. Available from http://www.openmath.org/cocoon/openmath/standard/om20/index.html
[24] S. Buswell, S. Devitt, A. Diaz, P. Ion, R. Miner, N. Poppelier, B. Smith, N. Soiffer, R. Sutor, S. Watt: Mathematical Markup Language w3c, Proposed Recommendation. Technical report, W3C HTML, 1998. Available from http://www.w3.org/TR/1998/REC-MathML-19980407/.
[25] O. Caprotti, D. P. Carlisle, A. M. Cohen: The OpenMath Standard. Technical report, The OpenMath Esprit Consortium, 2000. Available from http://www.nag.co.uk/projects/OpenMath/omstd
[26] P. Chandler, J. Sweller: Cognitive load theory and the format of instruction. Cognition and Instruction 8(4) (1991), 293-332.
[27] J. Clark: The design of RELAX NG. Technical report, OASIS, 2001. Available from http://www.thaiopensource.com/relaxng/design.html
[28] J. Clark, M. Makoto: RELAX NG Specification. Technical report, OASIS, 2001. Available from http://www.oasis-open.org/committees/relax-ng/spec.html
[29] J. Clark, M. Makoto: RELAX NG Tutorial. Technical report, OASIS, 2001. Available from http://www.oasis-open.org/committees/relax-ng/tutorial.html
[30] R. E. Clark (editor): Learning From Media: Arguments, Analysis and Evidence. Perspectives in Instructional Technology and Distance Learning. Information Age Publishing, 2001.
[31] R. E. Clark: New Directions: Cognitive and Motivational Research Issues. ch. 15. In Perspectives in Instructional Technology and Distance Learning [30], 2001.
[32] E. F. Codd: A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6) (June 1970), 377-387.
[33] P. R. Cohen, D. R. McGee: Tangible Multimodal Interfaces for Safety-Critical Applications. Communications of the ACM 47(1) (January 2004), 41-46.
[34] J. H. Coombs, A. H. Renear, S. J. DeRose: Markup Systems and the Future of Scholarly Text Processing. Communications of the ACM 30(11) (1987), 933-947.
[35] J. Coutaz, L. Nigay, D. Salber: Multimodality from the User and System Perspectives. In Proc. ERCIM (European Research Consortium for Informatics and Mathematics), workshop on User Interface For All, Heraklion. 1995. Available from citeseer.ist.psu.edu/coutaz95multimodality.html
[36] J. de Carvalho, H. Jürgensen: Dynamic Multi-Purpose Mathematics Notation. Technical Report 521, The University of Western Ontario, 1998.
[37] M. Dewar: OpenMath: An Overview. SIGSAM Bulletin 34(2) (June 2000), 2-5.
[38] C. Dirckx: A Mathematical Text to Braille Translator. 1992. Project Dissertation, Churchill College, University of Bradford.
[39] A. Dix, J. Finlay, G. Abowd, R. Beale: Human-Computer Interaction. Prentice-Hall, 1998.
[40] M. B. Dorf, E. R. Scharry: Instruction Manual for Braille Transcribing. Division for the Blind and Physically Handicapped, Library of Congress, Washington, D. C., 1979.
[41] S. Dunne, H. Jürgensen: Formatting Specialized Notations. In Proceedings of WOODMAN '89: Workshop on Object-Oriented Document Manipulation. Rennes, France, 1989.
[42] A. D. Edwards, R. D. Stevens: Une Interface Multimodale pour l'Access aux Formules Mathematiques par des Eleves ou Etudiants Aveugles. In Comme les Autres: Interfaces Multimodales pour Handicapes Visuels, Special number 1. 97-104. INSERM, 1995.
[43] R. Elmasri, S. B. Navathe: Fundamentals of Database Systems. The Benjamin/Cummings Publishing Company, Inc, Redwood City, California, second ed., 1994.
[44] M. G. Eramian: Displaying DVI Files in Braille: A Viewer for the Visually Impaired. Technical Report 500, The University of Western Ontario, 1997.
[45] W. M. Farmer: MKM: A New Interdisciplinary Field of Research. SIGSAM Bull. 38(2) (2004), 47-52.
[46] R. Furuta, V. Quint, J. Andre: Interactively Editing Structured Documents. Electronic Publishing 1(1) (1988), 19-44.
[47] R. Furuta, J. Scofield, A. Shaw: Document Formatting Systems: Survey, Concepts and Issues. ACM Computing Surveys 14(3) (1982), 417-472.
[48] C. Ghezzi, M. Jazayeri, D. Mandrioli: Fundamentals of Software Engineering. Prentice-Hall, 1991.
[49] C. F. Goldfarb: A Generalized Approach to Document Markup. SIGPLAN Notices 16(6) (1981), 68-73.
[50] M. Goossens, J. Saarela: A Practical Introduction to SGML. TUGboat 16(2) (1995), 103-145.
[51] M. Goossens, J. Saarela: From LaTeX to HTML and back. TUGboat 16(2) (1995), 174-214.
[52] D. Harel: Statecharts: A Visual Formalism for Complex Systems. Science of Computer Programming 8(3) (1987), 231-274.
[53] D. Harel, A. Naamad: The STATEMATE semantics of statecharts. ACM Transactions of Software Engineering and Methodology 5(4) (1996), 293-333.
[54] F. C. Heeman: Granularity in Structured Documents. Electronic Publishing 5(3) (1992), 143-155.
[55] J. E. Hopcroft, J. D. Ullman: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, first ed., 1979.
[56] E. L. Hutchins, J. D. Hollan, D. A. Norman: Direct Manipulation Interfaces. 87-124. In Norman and Draper [77], 1986.
[57] Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML). International Organization for Standardization, International Standard 8879, 1986.
[58] T. M. V. Janssen: Compositionality. In J. van Benthem, A. ter Meulen (editors): Handbook of Logic and Language. Elsevier Science Publishers, 1997.
[59] H. Jürgensen: Tactile Computer Graphics. 1997. Manuscript.
[60] H. Jürgensen, H. Waldschmidt: Do Portability, Verifiability, and Simplicity of Programming have to be Conflicting Goals? Technical Report 123, The University of Western Ontario, 1984.
[61] B. W. Kernighan, D. M. Ritchie: The C Programming Language. Prentice-Hall, Englewood Cliffs, New Jersey, 1978.
[62] P. Kilpeläinen: SGML & XML content models. Technical Report C-1998-12, University of Helsinki, 1998.
[63] D. E. Knuth: The Genesis of Attribute Grammars. In Proceedings of the International Conference on Attributed Grammars and their Applications. 1-12. Springer-Verlag New York, Inc, 1990.
[64] D. E. Knuth: The TeXbook. Addison-Wesley, Reading, Massachusetts, 1993.
[65] L. Lamport: LaTeX, a Document Preparation System. Addison-Wesley, Reading, Massachusetts, 1986.
[66] J. R. Levine, T. Mason, D. Brown: lex & yacc. O'Reilly & Associates, Inc, Sebastopol, California, second ed., 1995.
[67] D. M. Levy: Fixed or Fluid? Document Stability and New Media. ECHT 1994 Proceedings (September 1994), 24-31.
[68] X. Li: XML and the Communication of Mathematical Objects. Master's thesis, The University of Western Ontario, London, Canada, 1999.
[69] J. C. Martin: Introduction to Languages and The Theory of Computation. McGraw-Hill, first ed., 1991.
[70] MathType, Mathematical Equation Editor, User Manual. Design Science, Inc., Long Beach, California, May 1997.
[71] H. A. Maurer, A. Salomaa, D. Wood: A supernormal-form theorem for context-free grammars. JACM 30(1) (January 1983), 95-102.
[72] B. Meyer: Object Oriented Software Construction. Addison-Wesley, 1997.
[73] E. D. Mynatt, G. Weber: Nonvisual Presentation of Graphical User Interfaces: Contrasting Two Approaches. In CHI 1994 Conference Proceedings. 166-172, April 1994.
[74] W. M. Newman, M. G. Lamming: Interactive System Design. Addison-Wesley, 1995.
[75] L. Nigay, F. Jambon, J. Coutaz: Formal Specification of Multimodality. In CHI'95 Workshop on Formal Specifications of User Interfaces. Denver, USA, 1995. Available from citeseer.ist.psu.edu/nigay95formal.html
[76] D. A. Norman: Cognitive Engineering, 31-61. In Norman and Draper [77], 1986.
[77] D. A. Norman, S. W. Draper (editors): User Centered System Design. Lawrence Erlbaum Associates, Publishers, 1986.
[78] J. Paakki: Attribute Grammar Paradigms - A High-Level Methodology in Language Implementation. ACM Computing Surveys 27(2) (1995), 196-255.
[79] L. Padovani: On the Roles of LaTeX and MathML in Encoding and Processing Mathematical Expressions. In Asperti et al. [13], 66-79.
[80] H. Petrie, W. Fisher, G. Weber, I. Langer, K. G. and Cathy Rundle, L. Pyfers: Universal Interfaces to Multimedia. In 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002). IEEE Computer Society, October 2002.
[81] N. A. F. M. Poppelier, E. van Herwijnen, C. A. Rowley: Standard DTD's and Scientific Publishing. EPSIG News 5 (September 1992), 10-19.
[82] L. M. Quiroga, M. E. Crosby, M. K. Iding: Reducing Cognitive Load. In HICSS '04: Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04) - Track 5. 50131.1. IEEE Computer Society, 2004.
[83] T. V. Raman: TeX talk. TUGboat 12 (1991), 178.
[84] T. V. Raman: An Audio View of LaTeX Documents. TUGboat 13 (1992), 372-379.
[85] T. V. Raman: Documents Are not just for Printing. In Proc. Principles of Document Processing. 1992.
[86] T. V. Raman: Audio System for Technical Readings. PhD thesis, Cornell University, New York, USA, 1994.
[87] T. V. Raman: An Audio View of LaTeX Documents - Part II. TUGboat 16 (1995), 311-314.
[88] T. V. Raman: Emacspeak: A Speech-Enabling Interface. Dr. Dobb's Journal (September 1997).
[89] D. R. Raymond, F. W. Tompa, D. Wood: Markup Reconsidered. In First International Workshop on Principles of Document Processing. Washington, D.C., October 21-23 1992.
[90] D. R. Raymond, F. W. Tompa, D. Wood: From Data Representation to Data Model: Meta-Semantic Issues in the Evolution of SGML. Computer Standards and Interfaces (1996).
[91] L. M. Reeves, J.-C. Martin, J. Lai, M. McTear, T. Raman, K. M. Stanney, H. Su, Q. Y. Wang, J. A. Larson, S. Oviatt, T. Balaji, S. Buisine, P. Collings, P. Cohen, B. Kraal: Guidelines for Multimodal User Interface Design. Communications of the ACM 47(1) (January 2004), 57-59.
[92] C. Roisin, I. Vatton: Merging Logical and Physical Structures. Electronic Publishing 6(4) (1993), 327-337.
[93] W. Rudin: R eal and C om plex Analysis. McGraw-Hill, New York, New York,
th ird ed., 1987.
[94] G. Salomon: Television is "easy" and print is v tough5’: The differential invest
ment of m ental effort in learning as a fucntion of perceptions and attributions.
Journal o f Educational P sychology 76(4) (1984), 233 -243.
[95] R. Sethi: Program m ing Languages C oncepts and Constructs. Addison-Wesley, 1990.
[96] G. G. Sm ith, D. Ferguson: D iagram s and M ath N otation in e-Learning: Grow
ing Pains of a New G eneration. International Journal o f M athem atical Educa
tion in Science and Technology 35(5) (2004), 681-695.
[97] C. M. Sperberg-M cQueen: Specifying Document Structure: Differences in
DT^X and T E I M arkup. T U G boat 12(3) (1991), 415-421.
[98] A. Strotm ann: C ontent M arkup Languages Design Principles. PhD thesis, The
Florida S tate University, Florida, USA, 2003.
[99] J. Sweller, P. Chandler: W hy some m aterial is difficul to learn. Cognition and
Instruction 12(3) (1994), 185-233.
[100] J.-P. Tremblay, P. G. Sorenson: The Theory and Practice o f C om piler Writing. McGraw-Hill, 1989.
[101] J. van Benthem, A. te r Meulen: H andbook o f Logic and Language. Elsevier Science Publishers, 1997.
[102] S. Vorkoetter: Proposed O penM ath Specification. Technical report, W aterloo
Maple Software, 1995. Available from http://www.openmath.org/.
[103] J. N. Wallace, T. A. B. Wesley: The Access to Scientific and Mathematical Information for Blind People. [1991]. Manuscript, Department of Computing, University of Bradford.
[104] D. A. Watt, D. F. Brown: Programming Language Processors in Java. Prentice-Hall, Harlow Essex, first ed., 2000.
[105] G. Weber: A Multimedia Editor for Mathematical Documents. Available from http://www.multireader.org/multimedia%20editor.html
[106] D. Wood: Grammar and L forms: an introduction. Lecture Notes in Computer
Science 91. Springer-Verlag, 1980.
[107] D. Wood: Theory of Computation. John Wiley & Sons, first ed., 1987.
[108] F. J. Wright: Interactive Mathematics via the Web using MathML. SIGSAM
Bulletin 34(2) (June 2000), 49-57.
VITA

Name:
Jackson W. Marques de Carvalho

Place of birth:
Brazil

Education:
The University of Western Ontario, London, Ontario, Canada, 1995-2005, Ph.D.
University of Maine at Orono, Orono, Maine, USA, 1983-1985, Master of Electrical Eng.
Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil, 1972-1978, B.Sc.

Awards:
Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq), 1995-1998
Organization of American States, 1983-1985

Related Work Experience:
Lecturer, Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA, 2002-present
Lecturer, School of Computer Science, University of Windsor, Windsor, Ontario, Canada, 1999-2002
Graduate Research Assistant/Lecturer, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, 1999
Teaching Assistant, Faculty of Information and Media Studies, The University of Western Ontario, London, Ontario, Canada, 1998
Lecturer, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, 1997
Teaching Assistant, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, 1996-1998
Coordinator of the Scientific Computing Center (NCC), Department of Computer Science, Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil, 1991-1995
Lecturer, Department of Computer Science, Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil, 1989-1995
Graduate Assistant, Department of Electrical Engineering, University of Maine at Orono, Orono, Maine, USA, 1985
Electrical Engineer, Technological Nucleus at Center of Technology, Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil, 1986-1989
Presentations:
1987 MTNS, Phoenix, AZ, USA: Straight Line Motion, Inverse Kinematic Velocities, and Inverse Trajectory Planning
1987 MTNS, Phoenix, AZ, USA: MASK Layout Language and Layout Checking Plots

Technical Reports:
Dynamic Multi-Purpose Mathematics Notation. Technical Report Number 521, 1998. In conjunction with Dr. Helmut Jürgensen, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada