
Mathematics as a

Game of Types

(Thesis Format: Monograph)

by

Jackson W. Marques de Carvalho

Graduate Program in

Computer Science

A thesis submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Faculty of Graduate Studies
The University of Western Ontario

London, Ontario, Canada

© Jackson W. Marques de Carvalho 2005

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Library and Archives Canada

Published Heritage Branch

395 Wellington Street Ottawa ON K1A 0N4 Canada

Bibliothèque et Archives Canada

Direction du Patrimoine de l'édition

395, rue Wellington Ottawa ON K1A 0N4 Canada

Your file / Votre référence ISBN: 0-494-12080-0
Our file / Notre référence ISBN: 0-494-12080-0

NOTICE: The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or noncommercial purposes, in microform, paper, electronic and/or any other formats.


The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.

While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.


THE UNIVERSITY OF WESTERN ONTARIO FACULTY OF GRADUATE STUDIES

CERTIFICATE OF EXAMINATION

Supervisor

Dr. Helmut Jürgensen

Supervisory Committee

Examiners

Dr. Stephen Watt

Dr. Kamran Sedig

Dr. David Spencer

Dr. Gerhard Weber

The thesis by

Jackson Carvalho

entitled:

Mathematics as a Game of Types

is accepted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Date: April 8, 2005
Richard Kane, Chair of the Thesis Examination Board



Abstract

This thesis presents a grammar-based approach to the specification of mathematical notation. The method introduced is based on a meta-structure that uses attributed context-free grammars for capturing the meaning of mathematical concepts. This structure supports the creation of multi-purpose documents and allows the specification of mathematical notation in a dynamical way. In the context of this thesis, multi-purpose documents refer to documents that may be rendered or used in different ways, some of which might not be known at the time the document is created. By dynamical it is understood that the meaning associated with syntax is allowed to be modified.

The proposal described in this thesis is based on an authoring model which addresses the user needs as a fundamental requirement. This characteristic is structured around a scope mechanism that allows the mapping between semantics and syntax to be modified at any time during authoring. This process supports the dynamic characteristics of the meaning-to-syntax binding necessary during the authoring of mathematical concepts. The multi-purpose property is supported by a semantics-based mark-up that provides the possibility for the mathematical concepts to be processed according to the specific requirements of applications. Modular grammar fragments characterized by a one-to-one mapping between mathematical concept and grammar representation provide the adequate support for the definition of the various scopes. An incremental update process is defined as a way to modify the necessary grammar fragments to support the changes proposed during the authoring process.

Keywords: mathematics, types, user-oriented, interfaces, metasystem, grammars, rendering, notation, authoring, multimodal



Acknowledgments

I would like to thank my supervisor, Dr. Helmut Jürgensen, who believed in me, for proposing the problem and for his guidance and mentorship. I would also like to thank Maia Hoeberechts for reading the previous version of this thesis and for her suggestions.

I am grateful to my parents, Jose and Janete, for making me understand the importance of education and work. I wish to thank my children Carolina, Marcello and Luiza for always reminding me life can be fun even during difficult times. My special thanks to my wife Rozane for her support, love and dedication to our children.

This work has been partially supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), by the Universidade Federal do Rio Grande do Norte (UFRN), and by Dr. Helmut Jürgensen.



Table of Contents

Certificate of Examination
Abstract
Acknowledgments

1 Introduction
1.1 The Problem: Capturing Semantics by Means of User-Defined Syntax
1.2 Related Work
1.2.1 Data Model and Data Representation
1.2.2 SGML and XML
1.2.3 XML and RELAX NG
1.2.4 ASTER
1.2.5 OpenMath
1.2.6 MathML
1.2.7 Some Limitations of Both OpenMath and MathML
1.2.8 Compositionality
1.3 Motivation
1.4 A Solution: Dynamical Document Structure
1.5 Approach Taken
1.6 Thesis Overview

2 Basic Notions and Notation
2.1 Basic Definitions

3 A Framework for Interactive Systems
3.1 Basic Notions
3.1.1 Electronic and Paper Documents
3.1.2 Communication, Media and Modalities
3.2 User Interface Basic Components
3.3 An Existing Model
3.3.1 A Structuring Problem
3.4 A Different Structure for Interactive Systems
3.4.1 A New Framework
3.5 Example
3.6 Summary

4 Authoring Environments
4.1 Introduction
4.2 Interaction Objects and Authoring Environments
4.3 Cognitive Distances
4.4 Rendering Information
4.5 Encoding Mathematical Concepts
4.6 Environment Modifications
4.7 Changes in the Interface
4.8 Recommendations
4.9 Summary

5 Mathematical Constructs and their Representation
5.1 Notational Systems as Languages
5.2 Standard Mathematical Notation Characteristics
5.3 Capturing the Semantics of Mathematical Concepts
5.3.1 Mathematics and Document Authoring
5.3.2 CFGs and Data Types
5.3.3 CFG Limitation to Support Authoring Mathematics
5.3.4 Updating CFGs
5.3.4.1 Identical Syntax and Rule Semantics
5.3.4.2 Redundancy, Syntax Equivalence and Normal Forms
5.4 Representing Polynomials
5.5 Representing Subscripts and Superscripts
5.5.1 Overloading Subscripts
5.5.2 Overloading Superscripted Symbols
5.6 Representing Matrices
5.7 Representing Sets of Numbers
5.8 Representing Sums
5.9 Conclusion

6 Modelling Context Dependent Information
6.1 Authoring Mathematics and Multimodality
6.2 A Formal Structure for Document Authoring
6.2.1 Grammars and Dynamic Document Authoring
6.3 Structuring with Grammars
6.3.1 Mathematical Concepts and Grammatical Dependencies
6.4 Grammar Operations and Extensibility
6.5 Structuring with Domains and Directories
6.5.1 Domains, Directories and Symbol Overloading
6.6 Languages as Control Structures
6.6.1 Directory Composition Example
6.6.2 The Control Mechanism
6.7 The Role of Compilers
6.8 Meta-Structure
6.9 Conclusion

7 Examples
7.1 Example 1: Overloading the + and * symbols
7.2 Example 2: Symbols as operators and operands
7.3 Example 3: More meanings for the + symbol

8 The Processing Structure
8.1 Dynamic Authoring and Language Fragments
8.2 Processing Grammar Fragments
8.3 Dynamic Authoring and Document Processors
8.3.1 Example

9 Concluding Remarks
9.1 Discussion
9.2 Authoring with Grammar Fragments
9.3 Future Work

Vita


List of Tables

4.1 Pen/paper authoring environment.
4.2 TeX-based authoring environment.
4.3 Document authoring environment characteristics and software design approaches to help achieving them.
5.1 CFG rules for addition of integers 0 and 1.
5.2 Grammar for addition of integers 1 and 2.
5.3 Grammar for concatenation of characters a and b.
5.4 Derivation of word 1 + 2.
5.5 Derivation of word a + b.
5.6 Grammar for operations on integers and characters.
5.7 Derivation of word a + 2.
5.8 CFG fragment for expressing words from G.
5.9 CFG fragment for expressing addition, ellipsis and addition operations.
5.10 CFG fragment for expressing equality operation.
5.11 CFG fragment for subscripts and superscripts.
5.12 CFG representation of the positive and negative parts of a function.
5.13 CFG fragment for matrices.
5.14 CFG fragment for intervals.
5.15 CFG fragment to capture the semantics of intervals.
5.16 Grammar for summation.
5.17 Grammar for summation.
6.1 Components involved in dynamic authoring for multimodality.
6.2 CFG for equality of strings of characters.
6.3 CFG for representation of schemes.
6.4 Grammar fragments illustrating grammar dependencies.
6.5 Basic grammar for addition.
6.6 Operatorless grammar linking expr and term nonterminals.
6.7 Operatorless grammar linking term and num nonterminals.
6.8 Primitive grammar setting nonterminal num to terminal NUMBER.
6.9 Derived grammar for addition.
6.10 Resulting grammar for expressions involving addition.
6.11 Basic grammar for multiplication.
6.12 Operatorless grammar linking term and factor nonterminals.
6.13 Operatorless grammar linking factor and num nonterminals.
6.14 Derived grammar for multiplication.
6.15 Resulting grammar for expressions involving addition and multiplication.
6.16 Grammar to support the use of both the composition and extension operators.
6.17 CFG for the binding control mechanism.
6.18 Production rules for the meta-grammar.
6.19 Attributed grammar to support the capturing of simple summations.
7.1 Default grammars.
7.2 Grammar fragments created by editing.
7.3 Grammars in domain directory G® that have been created by grammar operations.
7.4 Grammars in domain directory that have been created by grammar operations.
7.5 Grammars in domain directory G \ created by editing.
7.6 Grammars in domain directory G \ that have been created by grammar operations.
7.7 Grammar in domain directory G 3 created by editing.
7.8 Grammar in domain directory G 3 created by grammar operations.
7.9 Grammars in domain directory G° created by editing.
7.10 Grammars in domain directory G® created by grammar operations.
7.11 Grammars in domain directory Gj created by editing.
7.12 Grammars in domain directory G° created by editing.
7.13 Grammars in domain directory G? created by grammar operations.
7.14 Grammars in domain directory G® created by grammar operations.


List of Figures

3.1 Gregory Abowd's framework for interactive systems
3.2 The Proposed Framework for Interactive Systems
4.1 Framework for document authoring environments
5.1 Many-to-many relationship between mathematical concepts and their representation
6.1 Structure to support dynamic authoring and multimodality processing
6.2 A sketch of the dynamics of the authoring/rendering process



Chapter 1

Introduction

Rather than require that users change, system designers could adapt their systems to key aspects of the users' work practice [33] . . .

Reading and writing mathematics are activities that involve distinct characteristics of the notation used. Reading requires a stable meaning-to-syntax mapping where concepts may always be identified by an expected syntax. On the other hand, writing mathematics demands the possibility of introducing meaning-to-syntax mappings that, according to the author of the document, best identify the information to be communicated. The fact that readers benefit from a standard notation while writers require the flexibility to define new meaning-to-syntax mappings is viewed, in this thesis, as a pair of characteristics that are in tension.

Approaching the specification of the mathematical notation for electronic documents by providing a standard will, of course, benefit readers. This also implies that users of computerized systems that support the standard will be forced to adapt to the details provided by the specific notation in order to manipulate the concepts there represented. One may argue that learning any notation provided by a system may not be a major concern since adequate human-computer interfaces may be provided to support this activity. This is true for the case when the underlying mathematical notation is stable and fixed, that is, when the relation between syntax and semantics does not change and new concepts are not allowed to be added to the set covered by the notation. It is undeniable that notations that are both stable and fixed could be enforced for users of computer algebra systems, for instance. It is also intuitive to see that the addition of adequate Graphical User Interfaces (GUIs) would help minimize the effort required to use any system that initially supports only text-based interfaces for the manipulation of mathematical notation. An example of this is MathType [70], which uses a GUI as a way of helping the user to produce the correct TeX syntax.

As new concepts are introduced, encodings are needed to support their manipulation. Consequently the mathematical notation evolves by extending the relationship between concept and syntactical representation. From an author's point of view the relationship between mathematical concepts and their representation may be expressed in two possible ways: authors may choose to use an already existing syntax, or they may provide a new syntactical encoding for the concept.

Regardless of whether they use new or already available notation, and whether they use a GUI or any other type of user interface, computerized systems that support mathematical notation need to be based on an authoring model. The fundamental characteristics of such a model are the constraints and facilities the author will experience during the complete process of generating mathematical notation for electronic documents.

Although it is reasonable to enforce a specific mathematical notation for readers, it does not make sense to restrict the authoring process to any standard notation vocabulary and writing style. This does not indicate that a standard notation is not necessary. It just supports the intuitive notion that authors should have the freedom to modify the set of mappings between symbols and meaning provided by a standard. The modifications required during the authoring activity may result either from the author's need to communicate concepts not supported by the standard or from a necessity to redefine some elements of the set of mappings. Another characteristic of this process is that authors do not usually supply their notational conventions at one specific part of the document. They, instead, introduce notation wherever they feel it is necessary.

In essence a standard notation for the representation of mathematical concepts is therefore necessary for the communication of information among computer systems. Examples of such notations are the ones proposed by OpenMath [23] and MathML [14]. However, such standards are not desirable for supporting the flexibility required by the authoring process during the creation of documents containing mathematics. This is because user requirements regarding the notation are determined during the authoring activity. For this scenario a dynamic notation is needed.

In order to be capable of handling unforeseen meaning-to-syntax relationships, a notational system must be organized around the possibility of describing the construction of the rules instead of providing the rules themselves. This allows authors to create the notation that, according to them, best fits the purpose of the document. This process characterizes a meta-system, and instances of it will consequently be notational systems.
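The distinction between a notational system and a meta-system can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `NotationalSystem`, `make_notational_system`, `bind` and `render`, and the use of format strings as "syntax", do not come from this thesis, which develops the idea with attributed context-free grammars instead.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NotationalSystem:
    """Hypothetical notational system: a set of meaning-to-syntax rules."""

    rules: Dict[str, str] = field(default_factory=dict)

    def bind(self, concept: str, syntax: str) -> None:
        # The author attaches (or re-attaches) a syntax template to a concept.
        self.rules[concept] = syntax

    def render(self, concept: str, *operands: str) -> str:
        # Fill the syntax template currently bound to the concept.
        return self.rules[concept].format(*operands)


def make_notational_system(defaults: Dict[str, str]) -> NotationalSystem:
    """Hypothetical meta-system: it describes how a system is built,
    starting from default rules, without fixing the rules themselves."""
    return NotationalSystem(dict(defaults))


# A shared "standard" and two systems instantiated from it.
standard = {"inner_product": "<{0}, {1}>"}
reader_view = make_notational_system(standard)
author_view = make_notational_system(standard)

author_view.bind("inner_product", "({0} | {1})")  # redefine an existing mapping
author_view.bind("norm", "||{0}||")               # add a concept the standard lacks

print(reader_view.render("inner_product", "u", "v"))
print(author_view.render("inner_product", "u", "v"))
print(author_view.render("norm", "u"))
```

In this sketch the author's changes stay local to `author_view`, while `reader_view` keeps the standard mapping, which mirrors the tension between stable notation for readers and flexible notation for writers.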

Central to the design of any computer-based application are the user's characteristics and the contexts in which the application will be used. The need for multiple modes of communication and multimedia has been acknowledged by [73, 80, 15], and many others, as a promising approach to improve computer access by visually impaired users. In particular, the development of multimedia documents supported by user interfaces which can be configured to adapt to users with print disabilities has been addressed by [80]. The importance of multimodalities and multimedia to support the computer-based communication of mathematics has been emphasized by [42].

This research was originally motivated by the goal of making documents accessible to blind people. Fundamental requirements associated with these users' limitations had therefore to be considered. These concerns included the following two possibilities¹:

1. to allow input and output to be performed through the various senses of the human perceptual system and,

2. to optimize the use of each modality in order to adapt to the users' cognitive abilities².

The above mentioned characteristics required the document representation to be independent of the modality/media used for communication.

¹ These requirements as well as other characteristics related to the design of multimodal user interfaces are presented in [91].

² The communication of digital logic diagrams to visually impaired users, for instance, may be improved when a tactile display is used in combination with speech [59].

1.1 The Problem: Capturing Semantics by Means of User-Defined Syntax

I am concerned with the design of computer-based interactive systems for processing both the capturing and rendering of mathematical concepts. In this thesis I focus on the capturing part of the problem. In order to approach this I consider the following issues:

1. The notation used for the encoding of the mathematical concepts is not fixed. It may be modified at the document author's discretion. This means the author is free to attach any syntax to any given concept.

2. The meaning of mathematical concepts can be captured by means of a text-based document structure.

3. The structure of any document involving only mathematics is the only provider of meaning to the concepts there included.

4. The user interface used for computer-assisted document authoring is independent of the structure of the document. It is viewed as a component that communicates with the document structure.

1.2 Related Work

A discussion of some related efforts which have treated the problem of the representation of the semantics of mathematical concepts is presented in this section. Due to the importance of processing electronic documents that contain mathematics, a new interdisciplinary field, Mathematical Knowledge Management (MKM), has emerged [13, 12, 45]. This field deals with the intersection between mathematics and computer science and aims to develop better ways to articulate, organize, disseminate and provide access to mathematical knowledge. ASTER [86], OpenMath [23] and MathML [14] are important research projects in this field. Prior to the discussion of the three approaches mentioned, a brief introduction to the notions of data model and data representation has been included, because I believe they are fundamental concepts for the definition of document specification structures. The strategy proposed by SGML [57] to structure documents is also introduced. The end of this section addresses the principle of compositionality of meaning [98, 58, 101].



1.2.1 Data Model and Data Representation

According to [43] the concept of data model in a database relates to the idea of hiding data storage details by means of data abstractions. The structure provided by the data abstractions usually includes support for data type definitions, data relationships and constraints which the data should satisfy. Apart from providing a data structure for representing information, a data model includes operations on the data structure [90]. These operations are the means by which data are accessed, retrieved and updated.

In addition to a set of operators, an efficient implementation of the data update concept requires both the identification and control of redundant data. It also involves the notions of equivalence, functional dependencies and normal forms. An example of a data model which addresses these issues is the relational data model [32].

A data model is basically a data encoding and a set of operators which manipulate the data, whereas a data representation does not include the operators. A discussion involving the differences between data model and data representation is provided by [90]. The importance of the notion of update in data models may be expressed by the relations between the notions of update and equivalence. As emphasized by [90], an efficient use of update should involve some mechanism to control redundancy, which requires the notion of equivalence.
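The difference between a data representation and a data model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a symbol mapped to a concept name and an arity), the class `SymbolModel`, and the choice of whitespace stripping as the equivalence relation are all hypothetical and are not taken from [90] or from this thesis.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[str, int]  # (concept name, arity); an assumed record layout

# A data representation: an encoding only, with no operators attached.
representation: Dict[str, Entry] = {
    "+": ("addition", 2),
    "*": ("multiplication", 2),
}


class SymbolModel:
    """Toy data model: the same encoding plus retrieve and update operators."""

    def __init__(self) -> None:
        self._data: Dict[str, Entry] = {}

    @staticmethod
    def _normal_form(symbol: str) -> str:
        # Assumed equivalence: spellings that differ only in surrounding
        # whitespace denote the same symbol.
        return symbol.strip()

    def retrieve(self, symbol: str) -> Optional[Entry]:
        return self._data.get(self._normal_form(symbol))

    def update(self, symbol: str, concept: str, arity: int) -> bool:
        """Store an entry; return False when the update is redundant,
        i.e. an equivalent entry is already present."""
        key = self._normal_form(symbol)
        entry = (concept, arity)
        if self._data.get(key) == entry:
            return False
        self._data[key] = entry
        return True


model = SymbolModel()
first = model.update("+", "addition", 2)        # new information
second = model.update(" + ", "addition", 2)     # equivalent, hence redundant
third = model.update("+", "concatenation", 2)   # a genuine change of meaning
print(first, second, third, model.retrieve("+"))
```

The redundancy check in `update` only works because a notion of equivalence (here, the normal form) is defined, which is the dependency between update and equivalence described above.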

1.2.2 SGML and XML

The Standard Generalized Markup Language (SGML) [57] is a document representation language which standardizes the application of generic coding and generalized markup concepts. One of its important characteristics is that it allows documents to be treated in a way similar to databases [90, 89]. As a meta-language, SGML defines a standard process for the specification of the syntax of descriptive markup languages. This characteristic is based on the notion of document representation schemes, or Document Type Definitions (DTDs) in SGML terms.

It is by means of DTDs that SGML provides the necessary constructs to support the representation of the logical structure of documents. Three fundamental concepts are involved in this activity: entities, elements and attributes.



As stated in the International Standard ISO 8879 [57], an SGML entity is defined as a collection of characters that can be referenced as a unit. An entity has no structural properties. Its application is restricted to the replacement of a string of characters by an identifier.

Structured documents are composed of a collection of components. These components are characterized by their context, scope and type. The relationship a component has with other components is its context. The boundaries determining the beginning and end of a component define its scope. Document components may contain other components or just data. Consequently the type of a given component will be determined either by the data or by the composition of the types of the components which contribute to its definition. In SGML these components are represented by elements. An SGML element may contain attributes. The purpose of the attributes is to describe some properties of the element.

SGML provides no operations for updating DTDs. It relies on editing for accomplishing any possible modification on any of its derived languages. Therefore it represents descriptions of static data. This characteristic is considered a limitation when applied to the representation of dynamic data sets. Although entities and the attribute pair ID/IDREF may be used as a way of eliminating redundant data, they cannot be applied to control it since both are controlled by the author of the document [90]. Also, as pointed out by [90], there is no system support to indicate whether the use of ID/IDREF attributes refers to redundant information.

According to [68], the Extensible Markup Language (XML) [20] is a simplified subset of SGML that has capabilities for supporting its use over the Internet. Related to this fact is a relevant distinction between XML and SGML. As pointed out by [68, 62], XML does not require a DTD to be delivered with its associated document. Instead it requires documents to be well-formed. This characteristic relates to the proper nesting of the start and end tags used for markup.

Validity constraints on the content of instances that are not expressible through XML DTDs are not effectively verified [17]. This is because the structure of XML leaf nodes is usually either plain text or empty. This means rigorous type checking is not supported. Checking whether the information provided is a date, a telephone number or a ZIP code, for instance, is not supported.
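The gap between well-formedness and type checking can be sketched in a few lines of Python. This is a minimal illustration, not part of the thesis: the element names `card` and `zip` and the helper `looks_like_zip` are hypothetical, and the five-digit rule is only an assumed application-level convention.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True when the start and end tags are properly nested."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def looks_like_zip(text: str) -> bool:
    """Hypothetical application-level check: exactly five ASCII digits."""
    return len(text) == 5 and text.isascii() and text.isdigit()


good_nesting = "<card><zip>not a zip</zip></card>"
bad_nesting = "<card><zip>12345</card></zip>"

# The parser accepts the first document although its content is not a ZIP code.
nesting_ok = is_well_formed(good_nesting)
content_ok = looks_like_zip(ET.fromstring(good_nesting).findtext("zip"))
# The second document is rejected purely on nesting grounds.
nesting_bad = is_well_formed(bad_nesting)
```

The parser only reports on tag nesting; whether the text of a leaf node is a valid ZIP code has to be decided by code outside the markup layer.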


1.2.3 XML and RELAX NG

XML Schema provides an alternative to DTDs. It allows much more rigorous control and supports data types. In this thesis the RELAX NG [27, 28] schema language is considered because it has been adopted by OpenMath [23] as the major formalism for its encoding.

According to [90], RELAX NG is a data model since it includes both support for data encoding and operations on the data. Most operations proposed by RELAX NG are based on the operations used by DTDs to express data constraints. Some of these are, for example, choice, optional and zeroOrMore, which correspond to the DTD operators |, ? and * respectively.

Among the data operations RELAX NG proposes, the replace definition mechanism is not supported by XML DTDs. Its implementation involves the ref, include and define operations. No specific operator is provided for this operation. Its semantics is provided by an example [29]. The semantics of this operation is similar to the context-free grammar extension operation [36] I proposed in 1998. The following example illustrates this operation:

<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card"><ref name="cardContent"/></element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name"><text/></element>
    <element name="email"><text/></element>
  </define>
</grammar>

Assuming the above syntax is available as the file addressBook.rng, a define element containing the syntax to be replaced is placed inside an include element. The syntax that follows replaces the contents of the card element.

<grammar>
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name"><text/></element>
      <element name="emailAddress"><text/></element>
    </define>
  </include>
</grammar>

As a result, the grammar defined in the file addressBook.rng has the contents of its card element replaced by the information provided through the include element.
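The effect of the replacement can be sketched in Python. This is an illustrative approximation, not a RELAX NG processor: the two grammars are inlined as strings instead of being resolved through href, the RELAX NG namespace is omitted as in the listings above, and the function name `apply_include` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Contents assumed for addressBook.rng, copied from the first listing.
BASE = """
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card"><ref name="cardContent"/></element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name"><text/></element>
    <element name="email"><text/></element>
  </define>
</grammar>
"""

# The including grammar, copied from the second listing.
INCLUDING = """
<grammar>
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name"><text/></element>
      <element name="emailAddress"><text/></element>
    </define>
  </include>
</grammar>
"""


def apply_include(base_xml: str, including_xml: str) -> ET.Element:
    """Replace each define of the base grammar that the include element overrides."""
    base = ET.fromstring(base_xml)
    include = ET.fromstring(including_xml).find("include")
    overrides = {d.get("name"): d for d in include.findall("define")}
    for position, child in enumerate(list(base)):
        if child.tag == "define" and child.get("name") in overrides:
            base.remove(child)
            base.insert(position, overrides[child.get("name")])
    return base


merged = apply_include(BASE, INCLUDING)
card_fields = [e.get("name") for e in merged.find("define").findall("element")]
```

After the merge, the start pattern is untouched and still refers to cardContent, but the definition it resolves to now lists name and emailAddress.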

1.2.4 ASTER

Audio System For Technical Readings (ASTER) [86] is an audio previewer for electronic documents written in the TeX family of markup languages. ASTER's processing environment maps the logical structure of the TeX-based document into its internal representation, a tree data structure. Therefore browsing through a mathematical expression corresponds to visiting nodes of the tree. A representation of the document in audio format is obtained by the application of a set of commands written in a language called AFL, which stands for Audio Formatting Language. One facility this language provides is variable substitution. This means an AFL rule may replace a portion of an expression by a label. This allows the user to obtain an overview of the expression before being exposed to all its details.
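The idea of replacing a portion of an expression tree by a label can be sketched as follows. This is not AFL syntax and not ASTER's implementation; the tuple encoding of the tree, the function name `abbreviate` and the label names X1, X2 are assumptions made only for illustration.

```python
def abbreviate(tree, max_depth):
    """Replace every compound subtree at depth >= max_depth by a fresh label.

    A tree is either a string (a leaf) or a tuple (operator, child, ...).
    Returns the abbreviated tree and a dictionary mapping labels to subtrees.
    """
    labels = {}

    def walk(node, depth):
        if isinstance(node, str):
            return node
        if depth >= max_depth:
            label = f"X{len(labels) + 1}"
            labels[label] = node
            return label
        return (node[0],) + tuple(walk(child, depth + 1) for child in node[1:])

    return walk(tree, 0), labels


# (a + b * c) / (d - e), written as a prefix tree.
expression = ("frac", ("plus", "a", ("times", "b", "c")), ("minus", "d", "e"))
overview, details = abbreviate(expression, max_depth=1)
```

The overview reads as "X1 over X2"; the listener can then ask for the subtree stored under each label.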



1.2.5 OpenMath

Intended to become a major standard to support the exchange of mathematical information, OpenMath concentrates on the dissemination of scientific knowledge through electronic means and on the distributed processing of mathematical information [23]. By specifying the semantic contents of mathematical data, OpenMath aims at providing inter-operability between the diverse systems capable of processing mathematical information [23].

The main focus of OpenMath is on the unambiguous communication of mathematical concepts [108]. This characteristic is achieved by representing the mathematical concepts as OpenMath objects. These objects have the property of incorporating both the semantics and structural information of a mathematical concept. Attributes may be attached to OpenMath objects and they can be applied to provide additional information not related to the semantics of the object, such as typesetting details or the URI of a given CD.

OpenMath objects are structured as basic, compound and derived. Informally an OpenMath object is viewed as a tree [23]. Basic objects are the leaf nodes of the tree. The non-leaf nodes of the tree are made up of its compound objects. This choice of organization determines the LISP style OpenMath uses for the encoding of its compound objects. This means OpenMath builds expressions by using prefix operators. OpenMath basic objects are integers, symbols, variables, floating-point numbers, character strings, and bytearrays. Derived objects are non-OpenMath objects that are imported by means of the attribution construct. Compound objects are created by the application, binding, attribution and error constructs.

The fact that OpenMath aims at the communication of mathematics among computing systems is expressed by the way its objects are encoded. A binary and an XML form of encoding are defined for its objects. Although the standard states that the XML encoding is readable and writable by humans, [37, 108] claim the encodings provided are neither meant to be read by humans nor to be created by editing procedures where humans directly supply all the necessary syntax. Of the two standard encodings available, the XML encoding is used to define the meaning of the objects to be transmitted.

Application and binding are OpenMath constructors. An application constructs an OpenMath object from a sequence of one or more OpenMath objects. The following XML encoding illustrates the use of the application object to capture the semantics of the variable $x_{i+1}$ [23].

<OMV name="x"><OMA>
  <OMS cdbase="http://www.openmath.org/cd" cd="arith1" name="plus"/>
  <OMV name="i"/><OMI>1</OMI>
</OMA></OMV>

A binding is composed of three objects: a binder, which is the first, followed by an optional set of arguments, which are the variables to be bound, followed by a body. The following example, taken from the arith1 CD [23], captures the meaning of the mathematical expression $\sum_{x=1}^{10} 1/x$ by means of the binding object.

<OMOBJ><OMA>
  <OMS cd="arith1" name="sum"/>
  <OMA><OMS cd="interval1" name="integer_interval"/><OMI> 1 </OMI><OMI> 10 </OMI></OMA>
  <OMBIND>
    <OMS cd="fns1" name="lambda"/>
    <OMBVAR><OMV name="x"/></OMBVAR>
    <OMA><OMS cd="arith1" name="divide"/><OMI> 1 </OMI><OMV name="x"/></OMA>
  </OMBIND>
</OMA></OMOBJ>
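To make the tree reading of application and binding concrete, the following Python sketch walks an OpenMath XML object and evaluates it. It is a toy interpreter written for this illustration only: it assumes the object is the sum over the integer interval 1 to 10 of 1/x from the arith1 CD, it covers just the symbols listed in the `SYMBOLS` table (whose Python meanings are assumptions, not definitions taken from the CDs), and it ignores cdbase, attribution and error objects.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

SUM_XML = """
<OMOBJ><OMA>
  <OMS cd="arith1" name="sum"/>
  <OMA><OMS cd="interval1" name="integer_interval"/><OMI> 1 </OMI><OMI> 10 </OMI></OMA>
  <OMBIND>
    <OMS cd="fns1" name="lambda"/>
    <OMBVAR><OMV name="x"/></OMBVAR>
    <OMA><OMS cd="arith1" name="divide"/><OMI> 1 </OMI><OMV name="x"/></OMA>
  </OMBIND>
</OMA></OMOBJ>
"""

# Assumed Python meanings for the few symbols this example needs.
SYMBOLS = {
    ("arith1", "divide"): lambda a, b: a / b,
    ("interval1", "integer_interval"): lambda lo, hi: range(int(lo), int(hi) + 1),
    ("arith1", "sum"): lambda interval, f: sum(
        (f(Fraction(k)) for k in interval), Fraction(0)
    ),
}


def evaluate(node, env):
    """Evaluate an OpenMath element; env maps variable names to values."""
    if node.tag == "OMOBJ":
        return evaluate(node[0], env)
    if node.tag == "OMI":
        return Fraction(int(node.text.strip()))
    if node.tag == "OMV":
        return env[node.get("name")]
    if node.tag == "OMS":
        return SYMBOLS[(node.get("cd"), node.get("name"))]
    if node.tag == "OMBIND":
        binder, bound, body = node[0], node[1], node[2]
        if (binder.get("cd"), binder.get("name")) != ("fns1", "lambda"):
            raise ValueError("only the lambda binder is handled in this sketch")
        names = [v.get("name") for v in bound]
        return lambda *args: evaluate(body, {**env, **dict(zip(names, args))})
    if node.tag == "OMA":
        # Prefix form: the first child is the operator, the rest are operands.
        head = evaluate(node[0], env)
        return head(*[evaluate(child, env) for child in node[1:]])
    raise ValueError(f"unsupported element: {node.tag}")


result = evaluate(ET.fromstring(SUM_XML), {})
print(result)  # 7381/2520
```

The binder, the bound variable and the body appear as the three children of OMBIND, and every application is evaluated operator first, which is the prefix organization described above.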

An attribution decorates an object with a sequence of one or more pairs composed of an OpenMath symbol, the attribute, and an associated object, the value of the attribute. According to [23], attribution may be used either as an adornment or as a semantic annotation, depending on the role associated with the attribute. The standard states that when the attribute has role semantic-attribution the attributed object is modified by the attribution. For this reason attribution is also considered a constructor. Although this characteristic is referred to as an important feature, the attribution examples included in the standard only involve adornment annotations. The following code illustrates the use of the attribution object to associate non-OpenMath data with an OpenMath object by means of the foreign element.

<OMATTR>
  <OMATP>
    <OMS cd="presentation" name="mathml"/>
    <OMFOREIGN>
      <math xmlns="http://www.w3.org/1998/Math/MathML"><mi> sin </mi><mfenced><mi> x </mi></mfenced></math>
    </OMFOREIGN>
  </OMATP>
  <OMA>
    <OMS cdbase="http://www.openmath.org/cd" cd="transc1" name="sin"/>
    <OMV name="x"/>
  </OMA>
</OMATTR>

The error object is not considered because it has no direct mathematical meaning. Its use is to report problems related to the communication of OpenMath objects.

The OpenMath structure used for grouping OpenMath objects is a Content Dictionary, or CD for short. The definition of a CD usually includes other CDs. An exception to this is the META-CD, which contains the definition of the structure of a CD. CDs may be grouped as a mechanism to define collections or groups, and both CDs and CD groups are XML documents.

The data provided by a CD may be structured according to the type of information that is addressed. Information included in a CD either

• belongs to the whole CD or

• is about the mathematical concepts there represented.

Represented by the element OMS, an OpenMath symbol is the mechanism the standard uses to refer to symbols from a Content Dictionary. It is by means of its three attributes, cd, name and cdbase, that the element OMS determines where the semantics of a name is defined. A restriction regarding the location at which a symbol may appear in an OpenMath object is provided by a characteristic called the role of the symbol.

Information related to the definition of an OpenMath symbol is organized as mandatory and optional data. The name and the description of the symbol are mandatory. Optional information includes examples, formal mathematical properties (FMP), commented mathematical properties (CMP) and the role.

The optional characteristic of FMPs indicates that there exists no consistent way of expressing the semantics of mathematical concepts. The definition of the sum object as provided in the arith1 CD is presented by means of a text description followed by an example. Even when formal properties are provided it is difficult to determine the set of properties that best characterizes a concept.

Although the role is one of the fields of information that define an OpenMath symbol, its definition is provided as a CD element. It is not clear from the description provided by the standard why a symbol characteristic is defined in a CD.

OpenMath extensibility is based on the notion of CDs. This means that for each mathematical concept not supported by the standard, a CD must be provided with the definition of the concept structured according to the OpenMath objects. Although the latest version of the standard [23] relies on RELAX NG's mechanisms to support the automatic generation of CDs, the definition of the OpenMath objects included in the CD depends on the same editing tools used for the manipulation of text. For this reason CDs are static descriptions of data. OpenMath resolves ambiguous definitions by means of the cdbase attribute of OMS.



1.2.6 MathML

The Mathematical Markup Language, or MathML [14], is a World Wide Web Consortium (W3C) recommendation for describing mathematical notation. MathML is an XML application which focuses on the provision of mathematics on the World Wide Web. MathML approaches the markup of mathematical concepts by means of two sets of elements and attributes. It is by means of this property that MathML encodes the layout as well as the semantics of mathematical expressions. Presentation MathML and Content MathML are the two languages provided to support this characteristic.

In much the same way as TeX approaches the typesetting of mathematical text, Presentation MathML determines the control over the display of mathematics. Content MathML is meant to supply more meaning to the description of mathematical concepts. One limitation of this form of markup is the limited range of mathematical concepts it covers. This is because Content MathML has been designed to support the encoding of mathematical concepts that are used from kindergarten to the end of high school and the first two years of college. Like OpenMath, MathML is a system-oriented approach. This property has been emphasized by [79]:

while MathML is human-readable, it is anticipated that, in all but the simplest cases, authors will use equation editors, conversion programs, and other specialized software tools to generate MathML.

Content MathML consists of about 120 elements accepting about a dozen attributes. The representation of concepts not covered by these elements may be obtained by referring to external definitions. The MathML csymbol element, or content symbol, is provided to address this limitation. This element is the constructor MathML has to refer to a symbol the meaning of which is not provided by MathML's core content elements. It is by its two attributes, definitionURL and encoding, that csymbol determines the characteristics of the external element. The definitionURL attribute specifies the Uniform Resource Identifier (URI) that provides the definition for the new symbol. The encoding attribute determines the syntax of the target that has been referred to by the definitionURL attribute. The content of a csymbol is either PCDATA or a presentation construct. The following example illustrates the characteristics of this form of extension:



<csymbol definitionURL="www.example.com/ContDiffFuncs.htm" encoding="text">
  <msup>
    <mi> C </mi>
    <mn> 2 </mn>
  </msup>
</csymbol>

The above definition encodes a symbol that semantically represents the space of twice-differentiable continuous functions and has its syntax encoded as $C^2$.

1.2.7 Some Limitations of Both OpenMath and MathML

1. Mathematical expressions in both OpenMath and MathML are built by using prefix operators. Therefore the order of entry is counter-intuitive [96], since the mental model imposed by both approaches requires the user to input notation from the innermost nested expression outward, instead of from left to right.

2. Although both standards support multimodality of output, they have not been designed to support multimodality of input. This is because their structure involves complex syntax.

3. Both standards are system-oriented. Consequently their constructs are not easily readable and writable by humans.
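The contrast between prefix construction and left-to-right notation in the first limitation can be seen on a small expression tree. The sketch below is only an illustration: the tuple encoding, the operator names plus and times, and the two printing functions are assumptions and do not correspond to any OpenMath or MathML API.

```python
# Assumed mapping from operator names to infix signs (binary operators only).
INFIX_SIGN = {"plus": "+", "times": "*"}


def to_prefix(expr):
    """Render a tree in the LISP-like prefix form used by both standards."""
    if isinstance(expr, str):
        return expr
    operator, *operands = expr
    return "(" + " ".join([operator] + [to_prefix(o) for o in operands]) + ")"


def to_infix(expr):
    """Render the same tree in conventional left-to-right notation."""
    if isinstance(expr, str):
        return expr
    operator, left, right = expr
    return f"({to_infix(left)} {INFIX_SIGN[operator]} {to_infix(right)})"


# (a + b) * c: the inner sum must exist before the product can be built.
tree = ("times", ("plus", "a", "b"), "c")
prefix_text = to_prefix(tree)
infix_text = to_infix(tree)
```

In the prefix rendering the outer operator times is written first although the author thinks of a + b first, which is the ordering mismatch noted in [96].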

1.2.8 Compositionality

Regardless of the notation used to express the meaning of a mathematical concept, one property which needs to be considered is the semantic structure of the concept. Semantic structure denotes the parts which comprise the concept, their ordering, grouping and the relations among these parts. One challenge introduced by this characteristic is to ensure the correctness of a chosen semantic structure for the representation of a mathematical concept.

The principle of compositionality of meaning has been proposed by [98] as a requirement to be considered in the design of knowledge representation languages. This concept is covered in detail in a chapter titled Compositionality [58] in the Handbook of Logic and Language [101]. The key idea of the compositionality principle is that the meaning of a sentence can be composed from the meaning of its parts. In a more precise form this principle is stated as

The meaning of a compound expression is a function of the meaning of its parts and the syntactic rule by which they are combined [58].
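One common way to write this principle down is as a homomorphism condition. The notation below is an assumption introduced here for illustration and is not quoted from [58]: $M$ stands for the meaning assignment, $F_r$ for the syntactic rule $r$ that builds a compound expression from the parts $e_1, \ldots, e_n$, and $G_r$ for the semantic operation associated with that rule.

```latex
\[
  M\bigl(F_r(e_1, \ldots, e_n)\bigr) \;=\; G_r\bigl(M(e_1), \ldots, M(e_n)\bigr)
\]
```

Read from left to right, the meaning of the compound depends only on the meanings of its parts and on which rule $r$ combined them.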

A language is considered compositional if it satisfies the compositionality principle. This involves the decision on what the basic semantic and syntactical components are and how they are combined [58]. Therefore a design that is not compositional indicates that its parts and/or the syntactic rules which bind them have not been selected properly. Although achieving compositionality of meaning might seem to be an impossible task, [58] claims that

... compositionality becomes possible if semantic considerations influence the design of the syntactic rules.

The above indicates that one can always find a syntax that allows the assignment of the intended meaning in a compositional form. This property is supported by Theorem 9.4 in [58], which claims that any possible meaning can be assigned to any possible language in a compositional form. For languages characterized by a fixed (static) syntax, compositionality of meaning is a design decision since it can be achieved by the choice of a suitable grammar. Theorem 9.3 in [58] supports this characteristic. It proves that if a language can be generated by any algorithm, it is possible to generate this language by a compositional grammar. According to [98], OpenMath is compositional and MathML is not.

1.3 Motivation

The work of this thesis was originally motivated by the necessity of having a TeX-to-Braille translation system [10, 11]. As characterized in [10, 11, 44], both TeX and standard Braille representations emphasize the syntactical structure of the concepts involved. For this reason a semantics-preserving translation from TeX input to Braille was not achieved. The recommendations provided by [10, 11] regarding this translation included the necessity of a semantics-based markup. This has, of course, been noted by many others in the field [86, 14, 23].

The automatic translation of TeX input into standard Braille output was approached by Arrabito [10]. The impossibility of this translation was reported as a consequence of the semantic ambiguities of some frequently used mathematical constructs in the TeX definition, and the lack of meta-rules in the Braille standard to cope with the macro expansion characteristic of TeX.

The experience reported by ASTER and by Arrabito's experiment provided some valuable insight into the rendering of mathematics. Since the two approaches were based on input provided from TeX files, they both had to deal with all the consequences a text formatter could impose when used as a source for the representation of mathematical semantics. ASTER assumes all its source inputs are well-written³ LaTeX documents. This implies that any macro definition, including the ones provided by the author, must reflect the logical structure of the concepts involved in the definition. Another interpretation of this requirement is that a restriction is necessary in order to limit the excess of power provided by TeX to the user.

The fact that LaTeX is characterized by a procedural markup⁴ approach, obtained by means of macro calls, can be viewed as both an advantage and a constraint. Macro definitions provide the ability to support the natural instability of conventional mathematical notation. On the other hand, the same macro definitions pose a major difficulty to the processing environment with respect to their use. If expanded, the semantic contents they provide are lost. If not defined properly, they may not carry the needed semantics.

As text formatters, systems based on TeX were designed around the necessity of having a structured data representation. The main motivation for this approach is that a standardization of representation paves the way for its interchange. By preserving the way information is represented, the possibility of having to re-process data whenever a new system is introduced, or as the result of an upgrade in the current system, is no longer a concern.

³ ASTER's structure is based on the assumption that distinct mathematical concepts that share the same syntactic encoding must be described by distinct macro definitions.

⁴ Procedural markup consists of commands that determine how text should be formatted [34].



Although document structures based on standardization of representation favor document portability, they are not adequate for rendering documents in ways that require different human senses for the understanding of information. This has been observed by both Raman [86] and Arrabito [10] while working on mapping LaTeX into speech and TeX into Braille respectively. The necessity of having a document structure that would allow mathematical concepts to be communicated regardless of the media used or the human senses involved motivated the research reported in this thesis. The section that follows outlines a semantics-based solution to the specification of mathematical concepts.

1.4 A Solution: Dynamical Document Structure

I propose that the meaning of mathematical concepts can be captured in a user-oriented⁵ way by means of an appropriate grammar formalism which satisfies the following criteria. The grammar formalism must

1. model the dynamics of authoring mathematics,

2. describe the structure of the rules by which syntax is created,

3. provide operations on the rules that define syntax and

4. support the definition of syntax by the application of the operations on these rules.

In my thesis I introduce a text-based document structure (Document Description Model) which satisfies the above four criteria, and is therefore capable of capturing the semantics of mathematical concepts in a user-oriented way. The proposed model has the following characteristics:

1. it supports both the extensibility and ambiguity characteristics of conventional mathematical notation and

2. it allows the author of a document to introduce his/her own syntax for the encoding of the mathematical concepts.

⁵ In the context of this work user-oriented refers to a design approach focused on the needs of the end user.



I claim the following:

• The meaning of mathematical concepts can be captured by attributed context-free grammars.

• Extensibility can be supported by operations on the attributed grammars.

• Ambiguity generated by symbol overloading can be resolved by a scope mechanism where the meaning of concepts is uniquely defined.

1.5 Approach Taken

This thesis presents a grammar-based approach to the specification of mathematical notation. The method introduced is based on a meta-structure that uses attributed context-free grammars for capturing the meaning of mathematical concepts. This structure supports the creation of multi-purpose documents and allows the specification of mathematical notation in a dynamical way [36]. In the context of this thesis, the term multi-purpose documents refers to documents that may be rendered or used in different ways, some of which might not be known at the time the document is created. By dynamical it is understood that the meaning associated with syntax is allowed to be modified.

The proposal described in this thesis is based on an authoring model which addresses the user needs as a fundamental requirement. This characteristic is structured around a scope mechanism that allows the mapping between meaning and syntax to be modified at any time during the creation of the document. This process supports the dynamic characteristics of the meaning-to-syntax binding necessary during the authoring of mathematical concepts. The multi-purpose property is supported by a semantics-based capturing mechanism [49, 11, 21] that provides the possibility for the represented concepts to be processed according to the specific requirements of applications. Modular grammar fragments characterized by a one-to-one mapping between mathematical concept and grammar representation provide the adequate support for the definition of the various scopes. An incremental update process is defined as a way to modify the necessary grammar fragments to support the changes proposed during the authoring process.
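The general idea of rebinding syntax to meaning inside a scope can be illustrated with a small Python sketch. It is not the mechanism defined later in the thesis, which is grammar-based; the class `BindingScopes`, its method names and the sample bindings are hypothetical and only show how an overloaded symbol can have exactly one meaning at any point of a document.

```python
from collections import ChainMap


class BindingScopes:
    """A stack of syntax-to-meaning bindings; inner scopes shadow outer ones."""

    def __init__(self, initial):
        self._chain = ChainMap(dict(initial))

    def enter(self, overrides):
        """Open a scope in which some syntax is bound to a different meaning."""
        self._chain = self._chain.new_child(dict(overrides))

    def leave(self):
        """Close the innermost scope and restore the previous bindings."""
        self._chain = self._chain.parents

    def meaning(self, syntax):
        """Return the single meaning currently bound to the given syntax."""
        return self._chain[syntax]


scopes = BindingScopes({"x": "multiplication", "+": "addition"})
outside = scopes.meaning("x")

scopes.enter({"x": "cross product"})  # an assumed vector-algebra passage
inside = scopes.meaning("x")
unchanged = scopes.meaning("+")

scopes.leave()
restored = scopes.meaning("x")
```

Within each scope every symbol resolves to one meaning, and leaving the scope restores the earlier binding without editing it.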



1.6 Thesis Overview

In order to provide document authors with the freedom of communicating mathematical concepts by means of the syntax that, according to the authors, best represents the concepts involved, adequate document structures need to be available. This thesis addresses this problem by introducing a systematic approach that allows an author to capture the meaning of each mathematical concept according to the syntax he/she feels best describes it.

The approach presented here is based on a meta-structure which has been designed with the support of attributed context-free grammars. Chapter 2 introduces the reader to the notation and the fundamental definitions. Some of the definitions provided may be found in books covering the theory of computation; however, they have been included to make the thesis notationally self-contained.

A framework for interactive systems is proposed in Chapter 3. The proposed framework is based on the model developed by Abowd [4] and introduces an additional translation in order to support the consultation of the system's state by the user. The framework is refined in Chapter 4 by the decomposition of its core component into two subcomponents, the Operating System and the Document Structure. This organization is also used in that chapter to support the claim that document authoring is an interactive activity that requires an environment for its fulfillment. Defined as a pair (Document Structure, User Interface), the notion of Authoring Environment separates user interface components from the structure of the document. Chapter 4 also provides the basic concepts needed for the definition of requirements for the evaluation of authoring environments. For this purpose a set of properties is provided.

In Chapter 5 the possibility of capturing the meaning of mathematical concepts by means of context-free grammar fragments is introduced. This illustrates that although these grammars can be used for the capturing activity, they do not provide the necessary support for both the extensibility and ambiguity characteristics of conventional mathematical notation. The major limitation of this approach is that context-free grammars only support static descriptions of semantics. This restriction is addressed in Chapter 6, where the dynamics of document authoring is considered. The approach developed in that chapter proposes the document structure component for computer-based authoring environments. This structure is composed of two components: a sequence of sets of grammars called the Semantic Structure and a grammar called the Binding Control mechanism. The semantic structure is based on attributed context-free grammars and it addresses extensibility by combining grammar definitions. Two grammar operations are defined for this purpose. These operations assume the grammars involved have been defined according to the restrictions specified by a normal form proposed in the chapter. The ambiguity characteristic is approached by a context switch which allows the replacement of one semantic structure by another. Chapter 7 provides a set of examples. These examples are used to illustrate the characteristics of the approach introduced in Chapter 6. A structure for processing the document organization presented in Chapter 6 is proposed in Chapter 8. The language processing model introduced is defined as a deterministic finite automaton that has its states characterized as sets of grammars and its transitions by the meaning-to-syntax bindings established during authoring. Chapter 9 contains a discussion of the approach proposed by this thesis, conclusions and suggestions for future work.



Chapter 2

Basic Notions and Notation

This chapter presents the notation to be used throughout this thesis and includes the necessary basic definitions. The specification of grammars may be approached by listing their production rules whenever a complete specification is not necessary. All grammars in this thesis will be displayed in table form. The grammar's name will always appear in the far left column and each row of the table will contain a production rule written with spaces as symbol delimiters. Both nonterminal and terminal symbols are represented by strings of characters, possibly linked by the underscore character. Lower case strings of letters are used to represent nonterminals, and upper case letters and other characters are used to represent terminal symbols¹. The symbol | is sometimes applied to group together rules associated with the same nonterminal. The nonterminal on the left of the production rule in the first row is the start symbol. The arrow → is replaced by a colon in all grammars except the one for the meta-structure. For attributed grammars, one additional column is included at the right edge of the table to represent the attributes associated with the rules. Strings of arbitrary characters are used to represent attributes.

2.1 Basic Definitions

The main definitions are included here in order to establish the notation that is used throughout the thesis. For further information see [55] as a standard reference.

¹ The choice of representation for both terminals and nonterminals is consistent with the approach used by compiler tools such as lex and yacc.



An alphabet is a finite non-empty set. Elements of an alphabet are called symbols.

Let $X$ be an alphabet. Then $X^*$ is the set of all words over $X$ including the empty word $\varepsilon$.

Definition 1 A context-free grammar (CFG) is denoted $G = (N, T, P, S)$ where $N$ is an alphabet of nonterminal symbols, $T$ is an alphabet of terminal symbols such that $N \cap T = \emptyset$, $P$ is a finite set of (production) rules of the form $A \to w$ with $A \in N$ and $w \in (N \cup T)^*$, and $S \in N$ is the start symbol.

Let $G = (N, T, P, S)$ be a context-free grammar, let $V = N \cup T$, and let $u, v \in V^*$. The word $v$ is derived from $u$ in one step if there is a rule $A \to w \in P$ and there are words $u_1, u_2 \in V^*$ such that $u = u_1 A u_2$ and $v = u_1 w u_2$. The fact that $v$ is derived from $u$ in one step is denoted by $u \Rightarrow v$. We write $u \Rightarrow^* v$ to denote the fact that there is a non-negative integer $n$ and there are words $u_0, u_1, \ldots, u_n \in V^*$ such that $u = u_0$, $v = u_n$, and $u_{i-1} \Rightarrow u_i$ for $i = 1, \ldots, n$. In this case we say that $v$ is derived from $u$, the integer $n$ is the number of derivation steps, and the sequence $u_0, u_1, \ldots, u_n \in V^*$ is a derivation of $v$ from $u$. The set

\[ L(G) = \{\, u \mid u \in T^* \text{ and } S \Rightarrow^* u \,\} \]

is the language generated by $G$.
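The one-step relation and its closure can be made concrete with a short sketch. The Python below is illustrative only and is not part of the thesis: the function names `derive_one_step` and `derives`, the encoding of words as tuples of symbols, and the toy grammar S → a S b | ε are all assumptions. Because the set of derivable words may be infinite, the search is bounded by a maximum number of steps.

```python
from collections import deque

# A rule A -> w is encoded as (A, w), where w is a tuple of symbols.
# Toy grammar (an assumption for illustration): S -> a S b | epsilon.
RULES = [("S", ("a", "S", "b")), ("S", ())]


def derive_one_step(word, rules):
    """Return every v with word => v, i.e. word = u1 A u2 and v = u1 w u2."""
    results = set()
    for i, symbol in enumerate(word):
        for lhs, rhs in rules:
            if symbol == lhs:
                results.add(word[:i] + rhs + word[i + 1:])
    return results


def derives(start, target, rules, max_steps=10):
    """Bounded breadth-first check of start =>* target.

    Returns the number of steps n of a shortest derivation found,
    or None if no derivation of at most max_steps steps exists.
    """
    start, target = tuple(start), tuple(target)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, n = queue.popleft()
        if word == target:
            return n
        if n == max_steps:
            continue
        for nxt in derive_one_step(word, rules):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, n + 1))
    return None
```

For the toy grammar, `derives(("S",), ("a", "a", "b", "b"), RULES)` follows the derivation S ⇒ aSb ⇒ aaSbb ⇒ aabb. A word outside the language yields `None` once the step bound is exhausted.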

Definition 2 Let $G = (N, T, P, S)$ be a context-free grammar. For all rules $A \to w \in P$, $A \in N$, $w \in (N \cup T)^*$, $A$ is called the left (hand) side, or lhs, of the rule, and $w$ is the right (hand) side, or rhs, of the rule. For $p = A \to w$, $\mathrm{lhs}_p = A$ and $\mathrm{rhs}_p = w$. The set of nonterminal symbols of $p$ is $N_p = L_p \cup R_p$ where $L_p = \{\mathrm{lhs}_p\}$ and $R_p = \{\, M \mid M \in N \text{ and } \mathrm{rhs}_p = w_1 M w_2,\ w_1 \text{ and } w_2 \in V^* \,\}$. The set of terminal symbols of $p$ is $O_p = \{\, x \mid x \in T \text{ and } \mathrm{rhs}_p = w_1 x w_2,\ w_1 \text{ and } w_2 \in V^* \,\}$.
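The sets in Definition 2 amount to collecting the symbols that occur in a rule. The following sketch assumes a rule is encoded as a pair (lhs, rhs) with rhs a tuple of symbols; the function name `rule_symbol_sets` and the sample rule are hypothetical and serve only as illustration.

```python
def rule_symbol_sets(rule, nonterminals, terminals):
    """Return (N_p, O_p) for a rule p = (lhs, rhs), following Definition 2."""
    lhs, rhs = rule
    left = {lhs}                                    # L_p = {lhs_p}
    right = {m for m in rhs if m in nonterminals}   # R_p: nonterminals occurring in rhs_p
    n_p = left | right                              # N_p = L_p union R_p
    o_p = {x for x in rhs if x in terminals}        # O_p: terminals occurring in rhs_p
    return n_p, o_p


# Hypothetical rule "expr : expr PLUS term", written with lower case
# nonterminals and upper case terminals as in the thesis tables.
N = {"expr", "term"}
T = {"PLUS"}
p = ("expr", ("expr", "PLUS", "term"))
```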

Definition 3 An attributed grammar is a sextuple $G = (N, T, P, S, A, a)$ with the following properties:

• The quadruple $G = (N, T, P, S)$ is a context-free grammar, the underlying grammar.

• $A$ is a language over some finite alphabet, the attribute language.

• $a$ is a mapping of $P$ into $A$, the attribute assignment.

Any word in $A$ is called an attribute. For a rule $p \in P$, the word $a(p)$ is the attribute of $p$.

Definition 4 A deterministic finite automaton (DFA), $M$, is a quintuple $(Q, \Sigma, \delta, s, F)$, where

• $Q$ is an alphabet of state symbols,

• $\Sigma$ is an alphabet of input symbols,

• $s \in Q$ is the start state,

• $F \subseteq Q$ is the set of accepting states, and

• $\delta : Q \times \Sigma \to Q$ is the transition function.
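A DFA in the sense of Definition 4 can be run on a word by repeatedly applying the transition function. The sketch below is an illustration under stated assumptions: the quintuple is encoded as a Python tuple, the transition function as a dictionary keyed by (state, symbol), and the example automaton (an even number of 1s) is a toy chosen here rather than one taken from the thesis.

```python
def accepts(dfa, word):
    """Run the DFA (Q, Sigma, delta, s, F) on word; True if it ends in F."""
    states, alphabet, delta, start, accepting = dfa
    state = start
    for symbol in word:
        if symbol not in alphabet:
            return False  # symbol outside the input alphabet
        state = delta[(state, symbol)]  # delta is total on Q x Sigma
    return state in accepting


# Toy automaton (an assumption for illustration): accepts the words over
# {0, 1} that contain an even number of 1s.
EVEN_ONES = (
    {"even", "odd"},
    {"0", "1"},
    {
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    "even",
    {"even"},
)
```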



Chapter 3

A Framework for Interactive Systems

In this chapter a framework for interactive systems is proposed. The framework introduced here is based on the model defined by Abowd [4]. It differs from his approach by the introduction of an additional translation which connects the user and the output component of the system's interface.

3.1 Basic Notions

Computer-based systems have been designed to support a wide variety of human activities. Human communication is one field which has been expanding through support from computer technology. In this section some aspects of human-computer communication are discussed.

3.1.1 Electronic and Paper Documents

It seems the static world of paper documents is gradually being replaced by the dynamic environment of digital information. In electronic form, documents need to be structured in order to be processed by computing systems.

A key element of electronic document processing is the possibility of easy manipulation of a document's atomic elements by means of digital devices. This idea introduced the need to view documents not only as printed output generated by a digital machine, but also as data to be stored in a way that provides full portability to other computer environments. This means the structure of documents needed to be preserved.

This way of viewing documents suggests they are composed of a logical structure, a set of abstract components, and contents where the actual contents of the documents can be found. The logical structuring of documents is based on the decomposition of documents into parts. Each part in the structure has a particular meaning and may, recursively, be subdivided into other parts. In this way the whole document can be represented as a collection of hierarchically-related components. An abstract component, a given paragraph of a document, for example, may be expressed over one or more two-dimensional page spaces, in various different ways, depending on specifications of font, hyphenation, line length and other concrete variables. The same logical component may be mapped into different concrete variables and then made available in different media by means of a tactile display, a Braille printer or audio, for instance. In this thesis the process of translating abstract document components into concrete ones is defined as rendering. The production of hardcopy, images, speech or any other possible presentation structures from concrete document components to output devices is defined as viewing.

According to Levy [67] documents have been created in response to a human necessity to provide stabilities in a constantly changing world. The notion of fixing the form of a document as a means of fixing its contents is viewed as a property documents have which he defines as invariance.

It is intuitive to relate this notion of invariance to paper documents since they are the result of a process by which surfaces of paper sheets are usually marked in a stable way. On the other hand electronic documents usually require rendering in order to be manipulated by humans. The fact that one given abstract document component may be mapped into different concrete ones indicates the existence of a one-to-many relationship between them. This relationship is an important property of electronic documents because it allows various media to be used to deliver the information provided by the abstract document component. The idea of using different media to communicate is discussed in the subsection which follows.



3.1.2 Communication, Media and Modalities

Information is shared among humans by a communication process. This process may always be described in terms of three fundamental components: a sender, a receiver and a communication channel or medium. Information carriers such as computer input and output devices and physical carriers such as sound waves and photon distributions are media. A medium is therefore the physical channel used for information encoding. Sensory modality is a human mechanism of perception where vision, hearing, touch, smell, taste, and balance are used for the processing of incoming information. Representation modality is the way information is encoded in some medium.

Communication through a given set of modalities is only possible when provided by adequate information carriers. The following scenario illustrates this relation. Consider, for instance, the directions given by one person to another to find a place in a city. The necessary directions may, for example, be given by voice in combination with gestures. In this case the sensory modalities used are hearing and vision. The sound waves and photon distributions are information carriers. Both the spoken language and the set of gestures are representation modalities.

Sensory modalities are physical characteristics of the human body, therefore their number is fixed. On the other hand the number of information carriers varies. In the scenario illustrated by the above example the information carriers were chosen to characterize a face-to-face or human-to-human communication activity. In this thesis, this form of information exchange is characterized by the absence of computer-based systems and by the fact that both sender and receiver are humans sharing place and time. Humans also exchange information with the aid of computer-based systems. This form of information exchange is referred to here as computer-assisted communication. The concept of computer-assisted communication is, in this thesis, used in a broad sense. Its meaning includes the notion of both human-computer interaction and computer-mediated human-to-human interaction. Also in the context of this work, interaction is used to refer to the communication between user and system.

Humans usually make use of available media to communicate ideas and feelings. Although the increase of information carriers does not necessarily improve the communication it is, most of the time, expected that the information to be shared is available to the receivers through all possible modalities. According to Bunt [22], people communicate with each other, most of the time, according to what he calls the Multimax Principle. He defines this principle as follows:

In natural communication, the participants use all the modalities and media that are available in the communicative situation.

The multimax characteristic is present even in situations where one of the parties involved in the communication is not capable of communicating in all the modalities by which information is made available. As an example of this consider the face-to-face communication between sighted and blind people. If it is assumed the sighted person communicates with the blind person using voice and gestures, for example, it is clear that the information provided by a set of gestures will not be processed by the blind person. Although it is known that the exchange of information with blind people is not improved when gestures are used, sighted people do not usually avoid this representation modality when communicating with blind people.

It is intuitive to think about computer-assisted communication in terms of face-to-face communication having all available media and modalities as characterized by the multimax property. One challenge to this approach is the definition of adequate structures for both software and hardware to support this characteristic. The remainder of this chapter discusses some aspects of the software needed in terms of a framework to support the interaction between the user and the computer.

3.2 User Interface Basic Components

User interface design for computer applications is an interactive process where sets of objects are manipulated. These objects can be structured according to the role they play in the interaction. They can be of input, output or both input and output types. They may also be of direct use in case the physical object is manipulated, or they can be of indirect access if no physical interaction is permitted.

The component which connects input and output objects is generally referred to as a system. Therefore the user accesses the system by manipulating the interface objects. Systems differ by their intrinsic characteristics. These qualities are viewed as statements of a language which can be used to represent the system. This will be referred to as the core language. Users can be described in terms of psychological and physical characteristics relevant to the communication with the system. Users have goals which may be realized by the system. These goals are structured as activities which the user may realize by communicating with the system. These properties may also be expressed as language statements which we call the task language.

The system's state is reported in forms defined by the output objects. The attributes which establish the way the state of the system is rendered characterize the language used by output objects to communicate. In a similar way, user requests are sent to the system by configuring input attributes according to the required behavior defined by the task to be performed. The attributes involved in this type of request represent the features of the language the user has to use to interact with the system.

[Figure 3.1: Gregory Abowd's framework for interactive systems. The diagram shows the components USER, INPUT, SYSTEM and OUTPUT with their languages task, input, core and output.]

3.3 An Existing Model

The interaction framework proposed by Abowd [4] describes the communication between user and computer by a model composed of four components and four translations. The components represent the stages the interaction goes through. Each component has its own language by which its internal characteristics are defined. The translations are used to map knowledge between the components. Figure 3.1 illustrates this framework. In this figure components are represented as nodes and translations are the arrows linking the nodes. Component names are typeset in upper case letters and both the names for the languages and translations are in lower case. The languages are task, input, core and output.

As shown in Figure 3.1, articulation connects the USER to INPUT. It is therefore used to represent the user's intentions in terms of the structure provided for data entry by the system. Performance is responsible for the translation of information collected during the input stage into core data. The state of the system is made available to output devices by presentation. Observation is the user's ability to perceive the state of the system.

3.3.1 A Structuring Problem

It is intuitive to decompose the interaction between user and computer in terms of execution and evaluation semicycles [39]. During this process the user's intentions, represented as statements of the task language, are mapped as input commands which, after execution by the system, are observed and evaluated by the user. If the user's intentions cannot be completed in a single cycle of interaction, other related cycles are introduced. The additional cycles are viewed as refinements of the intended task to be realized. The framework proposed in [4] relates articulation and performance to the execution semicycle and presentation and observation to the evaluation semicycle. As defined by this approach, the interactive cycle begins with the USER by the formulation of a goal, and a task to accomplish the goal. This approach is also based on the assumption that the only way the user can manipulate the machine is through the INPUT. For this reason, the task must be articulated within the input language. Although Abowd's framework assumes that execution and evaluation are not always alternating semicycles, the model does not indicate the procedure to be followed when the user's goals first require the knowledge of the system's state as provided by the output devices¹. As illustrated in Figure 3.1 Abowd's framework establishes that the evaluation semicycle always precedes the execution semicycle. Therefore following the path as defined by the arrows connecting the USER and the OUTPUT components, articulation, performance and presentation are identified as nonactive translations for activities when input devices are not involved.

¹ A typical scenario for this is a user interfacing with a display-based system which first prompts the user for input.

The notion that the interaction cycle must start with the user by the formulation of a goal and a task is accepted in this thesis. However the user is free to either manipulate the system by means of its input devices or consult the system's state as supplied by the output. The following section proposes an additional translation to Abowd's framework as a way of approaching this nondeterministic behavior.

3.4 A Different Structure for Interactive Systems

This section introduces an additional translation to the framework proposed in [4]. The information made available to the user by the system's output devices is now structured as a process composed of two phases, consultation and observation. By consulting the output provided by the system, the user obtains the available requirements to continue his/her activity. These requirements are viewed as conditions from the system's perspective and as possible modifications to the task to be performed from the user's point of view. The modifications may be as simple as the addition of an extra interaction cycle or as complex as requiring the complete task to be restructured. As an example of this, consider the scenario where a client of a bank tries to withdraw cash from his/her account by means of an automatic teller machine. If the system is in a state which displays an out of order message, the client has to modify his/her goal/task pair because his/her intentions could not be expressed by the system's interface at that particular instance. Consultation is therefore viewed as a translation which maps the user's expectations to the system's state as supplied by the output devices.

3.4.1 A New Framework

A new framework for interactive systems based on the work developed in [4] is introduced here. The proposed framework differs from the model of [4] by the introduction of an additional translation which supports the consultation of the system's state by the user through the output devices.

The notion of interactive cycles is understood as sequences of components connected by translations. The sequences represent the derivation of words of a language defined by all possible tasks which can be realized through the system by means of the interface. The results obtained by the derivation procedure represent the user's tasks that have been completely realized by the resources available. These characteristics are represented by the right-linear grammar

\[ G = (N, T, P, B) \]

with

\[ N = \{U, I, S, O\} \]

\[ T = \{c, a, p, v, o\} \]

\[ P = \{\, U \to cO \mid aI \mid \varepsilon,\ I \to pS,\ S \to vO,\ O \to oU \,\} \]

and

\[ B = U \]

where $U$, $I$, $S$ and $O$ are short forms for USER, INPUT, SYSTEM and OUTPUT respectively, and $c$, $a$, $p$, $v$ and $o$ are representations for consultation, articulation, performance, presentation and observation respectively.
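Because every rule of G has at most one nonterminal, placed at the right end, a word can be checked against G by reading it left to right while tracking which nonterminal remains to be rewritten. The sketch below is an illustration, not part of the thesis: the dictionary encoding of P, the use of `(None, None)` for the ε rule and the function name `generates` are assumptions.

```python
# Production rules of G: nonterminal -> list of (terminal, next nonterminal).
# The rule U -> epsilon is encoded by the pair (None, None).
RULES = {
    "U": [("c", "O"), ("a", "I"), (None, None)],
    "I": [("p", "S")],
    "S": [("v", "O")],
    "O": [("o", "U")],
}
START = "U"


def generates(word, rules=RULES, start=START):
    """Return True if the right-linear grammar derives word from start."""
    current = {start}  # nonterminals that may still have to be rewritten
    for symbol in word:
        current = {
            nxt
            for nonterminal in current
            for terminal, nxt in rules[nonterminal]
            if terminal == symbol
        }
        if not current:
            return False
    # A derivation can only terminate at a nonterminal with an epsilon rule.
    return any((None, None) in rules[nt] for nt in current)
```

Under this encoding, `generates("coapvo")` holds, since U ⇒ cO ⇒ coU ⇒ coaI ⇒ coapS ⇒ coapvO ⇒ coapvoU ⇒ coapvo, whereas `generates("apv")` does not, because the derivation would stop at O, which has no ε rule.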

Grammar G is nondeterministic. This characteristic relates to the need the user may have to analyze the output in order to decide the next action to be taken. During the analysis process the user may refine or even redefine the mental model he/she has developed. Although regular languages can be graphically represented by the standard state transition diagrams, statecharts [52, 53] will be used. The reason for this choice is that hierarchical structures are better visualized when represented by these diagrams. The dynamics of the proposed model is captured by the statechart in Figure 3.2. The statechart in this figure has depth two since it structures the states in two layers or levels of abstraction. The higher level has SYSTEM, INTERFACE and USER as states. The lower level is a refinement of the INTERFACE state and is composed of only two states, OUTPUT and INPUT. As can be seen, lower case letters have been used to typeset both the names for languages and translations. Each language has been placed inside the box where its related state name is located.



[Figure 3.2: The Proposed Framework for Interactive Systems. Statechart FRAMEWORK with the states SYSTEM, INTERFACE (refined into OUTPUT and INPUT) and USER.]

3.5 Example

Consider the scenario where a client of a bank fails to withdraw cash from an Automatic Teller Machine (ATM) because he/she has forgotten the required bank card. The client/ATM interaction, for this case, may be described by the following tasks:

• Consult state of ATM by reading information provided by its display, and

• Interpret information from display.

It is during the Interpret information from display task that the client realizes the adequate bank card must be supplied. Not having the needed card, the client stops the cash withdrawal activity and consequently the client/ATM interaction terminates. This activity may be expressed in the framework proposed in this chapter by the regular expression $(co)^*$. The Kleene star (reflexive and transitive closure) is used in this case to indicate the client's necessity to cycle through consultation/observation zero or as many times as he/she feels it is necessary.

The regular expression $((co) + (apvo))^*$ represents all possible interactions the user may have with the system. The term $(apvo)$ represents interactions that involve both execution and evaluation semicycles. This characteristic is present in both the framework proposed here and in Abowd's framework. The term $(co)$ involves only interactions that include the evaluation semicycle. This characteristic is not included in Abowd's framework.
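The two kinds of cycle can be told apart mechanically in a recorded trace of translations. The sketch below uses Python's standard `re` module and is an illustration only: the one-letter trace encoding follows the symbols c, a, p, v and o used above, while the function name `split_cycles` and the labels it returns are assumptions. The thesis writes union as "+", which `re` writes as "|".

```python
import re

# ((co) + (apvo))* in the thesis notation, written with "|" for union.
ALL_INTERACTIONS = re.compile(r"(?:co|apvo)*")
ONE_CYCLE = re.compile(r"co|apvo")

LABELS = {
    "co": "evaluation only",            # consultation, observation
    "apvo": "execution and evaluation", # articulation ... observation
}


def split_cycles(trace):
    """Split a trace into its interaction cycles and label each one.

    Raises ValueError if the trace is not a word of ((co) + (apvo))*.
    """
    if ALL_INTERACTIONS.fullmatch(trace) is None:
        raise ValueError(f"not a valid interaction trace: {trace!r}")
    return [(cycle, LABELS[cycle]) for cycle in ONE_CYCLE.findall(trace)]
```

In the forgotten-card scenario the trace contains only "co" cycles, so every label returned is "evaluation only".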



3.6 Summary

A framework for interactive systems which is based on the model defined by Abowd [4] is introduced in this chapter. The proposed approach uses an additional translation as a way to support the necessary user analysis of the system's state as supplied by the output devices. The complete cycle of interaction is modeled as a regular language. A graphical representation of this organization is provided in a statechart format.



Chapter 4

Authoring Environments

4.1 Introduction

The purpose of this chapter is to provide an understanding of the document authoring process and to establish a context for the discussion of the quality of environments used in the authoring of documents containing mathematics. For this reason a set of characteristics is considered in order to assess the quality of the environments. An ideal environment is proposed and design approaches which may be used in order to achieve it are presented.

4.2 Interaction Objects and Authoring Environments

This thesis considers computer-based document authoring as an interactive process. During this process the author manipulates documents by means of interaction objects as defined in Chapter 3. These objects can be manipulated directly or indirectly by the user. A document authoring environment is a combination of interaction objects and is structured according to the form of control the author has over the interaction objects involved.

Consider a pen/paper document authoring environment for instance. In this organization, the author uses the pen to record information on the paper. This environment is characterized by the fact that all objects involved are directly manipulated by the author. The interaction is completely under the author's control because all information printed on paper results from direct actions performed by the author on the interface objects. To illustrate the notion of a document authoring environment consider, for instance, a document such as a research report written in English. Table 4.1 provides a description of the pen/paper environment according to the interaction framework proposed in Chapter 3.

USER          : Author
task          : Produce a handwritten draft of a research report in English
articulation  : Hand movements associated with handwriting
INPUT         : Pen/paper
input         : Pen strokes
performance   : Cursive writing
SYSTEM        : Pen/paper text authoring
core          : Written text
presentation  : Rendering of cursive written text on paper
OUTPUT        : Paper
output        : Sets of handwritten cursive characters printed on paper
observation   : Sets of handwritten text according to the formatting style defined
consultation  : Interpretation of the cursive sets of characters based on the English syntactical and semantic definitions

Table 4.1: Pen/paper authoring environment.

Although in computer-based authoring environments the author directly interacts with physical objects such as keyboard and mouse, the expected result, when available, is also dependent on objects of an indirect form of control. Software and hardware components not available for manipulation by the system's users are considered here as indirect objects. Table 4.2 presents the description of a TeX-based environment for the research report authoring task.

Document authoring environments which make use of no object under an indirect form of control are referred to here as direct or ideal environments. All other environments are considered indirect.



USER          : Author
task          : Produce a draft version of a research report in English using plain TeX macros
articulation  : Hand movements associated with typing
INPUT         : Keyboard
input         : Key strokes
performance   : Key decoding
SYSTEM        : All related hardware not directly accessed by the author
core          : Plain TeX macro package plus operating system used
presentation  : TeX compiler plus dvi viewer
OUTPUT        : Video display
output        : Sets of characters rendered on the display
observation   : Reading the displayed text according to the characteristics of the dvi viewer
consultation  : Interpretation of the sets of characters based on the English syntactical and semantic definitions

Table 4.2: TeX-based authoring environment.

4.3 Cognitive Distances

The characteristics of the authoring environments as presented in Tables 4.1 and 4.2 show that there exists a body of knowledge that the author is required to know in order to accomplish any established task successfully. An important characteristic related to the communication between USER and SYSTEM is the difficulty the USER may have in mapping intentions into physical commands of the input language. This difficulty is referred to as the gulf of execution [76, 56, 39]. Another relevant characteristic of interactive systems is the difficulty the user has in interpreting the available output. This difficulty is called the gulf of evaluation [76, 56, 39]. Both the gulf of execution and the gulf of evaluation are results of design decisions that are usually related to restrictions imposed by the specification of the interactive system. These gulfs are viewed, by the user, as distances to be bridged in order to realize tasks successfully through the provided interface.

For ideal environments such as the pen/paper one, the knowledge needed to bridge both gulfs is not relevant if we assume the author already knows how to read and write. Computer-based environments usually require additional knowledge. To author documents using a TeX-based system, for instance, a user should, at least, know the basics of both TeX and the underlying operating system in addition to typing and reading from a display screen. The cognitive distance to be bridged in this scenario depends not only on the manipulation of the physical objects which compose the interface, but also on the user's knowledge of the tool used for typesetting.

As stated in [4, 76, 56], semantic distance relates to the translation between the user's intentions and the meaning of the interface language. This distance is a function of both the expressiveness and the conciseness of the input language. Expressiveness relates to the scope or semantic coverage of a language. Ideally, highly expressive languages provide support for the representation of all concepts in the domain in which the language is intended to be applied. Conciseness relates to the mapping the language provides to link tasks to the input syntax. Highly concise languages are structured in a way to capture the semantics of tasks, in the language's domain, by syntactically simple statements. The TeX macro package, for instance, is highly expressive but it is not concise.

4.4 Rendering Information

Information exchanged in human-to-human communication is usually inaccurate and unclear. For this reason different forms of information exchange are usually necessary. In natural language communication, for instance, we often use gestures and vocal sounds not related to the language in an attempt to improve the transfer of information. Likewise, in user-computer interaction the acknowledgment of a mouse click may be reported by both a display change and a sound signal. In this case the user-computer interaction is enhanced by the provision of feedback to the click action in two distinct modes. As another example consider the flight boarding announcements that are usually made in most airports through video terminals and speech. In the described scenarios the additional modality may be viewed as a form of redundancy that enhances the quality of the information transfer process.

Central to the use of multimodality as a form of communication enhancement is the notion of semantics-based information organization. This form of structuring data is understood as fundamental in designing systems to be used for broadcasting information. It establishes that the data to be supplied to the modality renderers must be free of ambiguities. If the document to be processed includes mathematical concepts, the ambiguity-free requirement does not allow the representation of different concepts by syntactically overloaded symbols.

4.5 Encoding Mathematical Concepts

Mathematical concepts need to be encoded in some form in order to be manipulated. The conventional mathematical notation is, most of the time, the first encoded form of these concepts we are exposed to. Although this general-purpose notation has been the primary tool used for the teaching of mathematics, it is not an adequate notation to support the electronic communication of the concepts.

As a visual system, the conventional mathematical notation relies not only on a set of symbols as a way of representing concepts, but it also makes use of spatial arrangements, variations of both font size and type, and other visual markers to aid the representation of information. These visual markers provide an efficient way to represent a complex set of constructs by means of a limited set of symbols. This characteristic is illustrated by the following two examples:

Example 1: The convolution of two functions could be defined as follows:

If

\[ \mathcal{L}[f(t)] = F(s) \]

and

\[ \mathcal{L}[g(t)] = G(s) \]

then the inverse transform of the product $F(s)G(s)$ can be obtained in terms of $f(t)$ and $g(t)$ by the expression

\[ \mathcal{L}^{-1}[F(s)G(s)] = \int_{0}^{t} f(x)\, g(t - x)\, dx \]

In the example above the change from lower case to upper case letters has been used to indicate the domain change from $t$ to $s$. The syntax used reinforces the fact that $F(s)$ is just a different interpretation of function $f(t)$. The Laplace transformation as well as its inverse are represented by the character $\mathcal{L}$, which is the character L typeset in a different way. The integral has its upper limit $t$ placed above its lower limit to inform the reader where the operation starts and ends.
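As a quick check of the convolution expression in Example 1, the following worked instance takes $f(t) = 1$ and $g(t) = t$. These two functions are chosen here purely for illustration and do not appear in the thesis. Both sides of the expression reduce to $t^{2}/2$.

```latex
\[
F(s) = \mathcal{L}[1] = \frac{1}{s}, \qquad
G(s) = \mathcal{L}[t] = \frac{1}{s^{2}}, \qquad
F(s)G(s) = \frac{1}{s^{3}}
\]
\[
\mathcal{L}^{-1}\!\left[\frac{1}{s^{3}}\right] = \frac{t^{2}}{2}
\qquad\text{and}\qquad
\int_{0}^{t} f(x)\, g(t - x)\, dx = \int_{0}^{t} (t - x)\, dx = \frac{t^{2}}{2}
\]
```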

Example 2: The matrix equation

\[ \mathbf{L}\mathbf{x} = \mathbf{m} \]

represents a system of linear equations and has

\[ \mathbf{x} = \mathbf{L}^{-1}\mathbf{m} \]

as a solution. The linear equation

\[ l x = m \]

with $x$ and $m$ as real numbers, has

\[ x = l^{-1} m \]

as a solution.

Although both solutions are obtained by means of taking the inverse of the object that prefixes the variable we want to solve for, and then multiplying this result by the object on the right side of the equality, the semantics attached to these operations is not the same. This fact is represented by the use of upper case and bold face type in the matrix equation.

The necessity of representing mathematics by means of encodings that support electronic communication of the concepts has motivated the creation of other notations. Perhaps the most intuitive approach is to map all dimensions involved in the standard representation of the concepts into a single dimension. Although conceptually trivial, this linearization procedure allows the complete domain to be input into computer systems. One relevant aspect of this approach is the structure used for capturing the meaning of the mathematical concepts. Such a structure should supply the author with the necessary means to encode not only all existing concepts, but also concepts proposed by the author.



4.6 Environment Modifications

The fact that the author relies on the interface to obtain the behaviour defined by the core, as proposed in Chapter 3, may be used to represent document authoring environments by the following pair

\[ V = (S, I) \tag{4.1} \]

where $V$, $S$ and $I$ are the document authoring environment, the document instance structure and the system's interface respectively. This representation may be viewed as a refinement of the framework proposed in Section 3.4 to address the details involved in the SYSTEM component. For computer-based document authoring environments, this state needs to be further decomposed in order to isolate the operating system's services from the behaviour provided by the document structure. Figure 4.1 illustrates the framework proposed in Chapter 3 where the SYSTEM component has been modified to support the proposed refinement. In this case core has been replaced by two lower level states¹, the operating system and the document structure.

[Figure 4.1: Framework for document authoring environments. Statechart FRAMEWORK in which the SYSTEM state contains the OPERATING SYSTEM state; the labels output, USER, task, articulation and input are also legible.]

Consider, for instance, a computer-based authoring environment $V_0 = (S_0, I_0)$ such as the one defined in Table 4.2. In this case $S_0$ represents the plain TeX macro package and $I_0$ is the complete interface part of the environment. The replacement of the keyboard device by a mouse/display pair, for instance, would require articulation, input and performance to be redefined. Although the author may acknowledge a significant amount of change due to the mouse pointing and clicking actions that replaced the typing form of manipulation, the basis of the document structure has not been modified. The resulting environment can be represented as $V_1 = (S_0, I_1)$ where $I_1$ is the modified interface. Replacing the plain TeX macro package by LaTeX, for instance, will not have any effect on other parts of the environment besides the document structure. This means the author will use the keyboard for input, but is now required to have knowledge of LaTeX to express his/her ideas. This environment is represented by $V_2 = (S_1, I_0)$ where $S_1$ is a document structure based on the LaTeX macro definitions.

¹ Details of the communication between these two states which are irrelevant to the present discussion have been omitted.

Different document authoring environments may therefore be obtained by the following three approaches. One can either:

1. maintain the document structure and modify the system's interface, or

2. maintain the system's interface and modify the document structure, or

3. modify both.
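The three approaches above can be sketched as operations on the pair of equation (4.1). This is only an illustration: the class name, field names and the string labels for structures and interfaces are assumptions made for the example.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Environment:
    """A document authoring environment as the pair (S, I) of equation (4.1)."""
    structure: str   # S: document instance structure
    interface: str   # I: system's interface

# Illustrative starting point: plain TeX edited through keyboard and display.
v0 = Environment(structure="plain TeX", interface="keyboard/display")

# Approach 1: keep the document structure, modify the interface.
v1 = replace(v0, interface="mouse/display")

# Approach 2: keep the interface, modify the document structure.
v2 = replace(v0, structure="LaTeX")

# Approach 3: modify both.
v3 = replace(v0, structure="LaTeX", interface="mouse/display")
```

Making the pair immutable keeps each derived environment a distinct value, so the original environment remains available for comparison.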

4.7 Changes in the Interface

Environment modifications as discussed in the previous sections do not include the reasons why the changes were considered. This problem is approached, in this section, by examining what motivates changes in the system's interface.

User-computer interfaces can be viewed as facilitators which provide services to users. These services are structured according to the characteristics of the information obtained by users as the result of commands executed on the interface objects. The services may include information which is directly available through the functionality provided by the operating system. They may also involve concepts defined by the structure of the application, in which case the user interacts with the application and the operating system is viewed as a mediator. In both scenarios, the dialogue between user and computer may be structured according to the way interaction² resources are organized.

² According to [74] interaction styles are key-modal, direct-manipulation and linguistic.



In cases where the operating system is a mediator, it is possible to represent the services provided by the application as interaction objects. Although the application's functionality may be preserved completely by this procedure, the user may not be able to access the structure of the application. This side effect is sometimes intentional, since hiding the internal structure may improve the use of the application for inexperienced users. As an example, consider an authoring environment which has plain TeX as document structure and uses a keyboard/display arrangement as interaction device. The author in this case is forced to directly manipulate the objects defined by the TeX macro package. The replacement of this type of interaction by one based on a mouse/graphical display combination with the necessary macro package objects structured as sets of icons, for instance, would allow the use of the package with no other knowledge besides the manipulation of the interaction devices. Modifications to the system's interface such as this are usually performed as an attempt to improve the usability of the document authoring environment.

4.8 Recommendations

In the previous sections the basic characteristics which document environments should have in order to support the authoring of mathematical concepts have been discussed. These qualities are presented in terms of properties and indicate possible software design approaches that may be considered in order to achieve them. Ideal document authoring environments are viewed as software systems which support the properties listed in Table 4.3.

4.9 Summary

The framework for interactive systems proposed in Chapter 3 was extended through a refinement of the SYSTEM state. The modification introduced a lower level of abstraction composed of two states. This approach aims at a separation of functionality between the operating system and the document structure. A set of properties which may be used to assess the quality of document authoring environments designed to support the representation of mathematical concepts has also been introduced.



PROPERTY                            DESIGN APPROACH

High Conciseness                    - Layer/processor addition to existing document structure definitions.
                                    - Improve interaction style.

High Expressiveness                 - Scope enhancement by the use of meta-structures and extensibility operations.

Ambiguity-freeness / Multimodality  - Enforcement of syntactically unique representations by the creation of domains.

Extensibility                       - Introduction of operations to update the document structure.

Table 4.3: Document authoring environment characteristics and software design approaches to help achieve them.



Chapter 5

Mathematical Constructs and their Representation

Document authoring is an incremental activity in which a set of intermediate (draft) versions of a document are produced by the author prior to the creation of the final one. Any given version of a document, except the first one, may therefore be viewed as the result of an update of the previous version of the document.

Authoring documents that contain mathematics, or authoring mathematics for short, is both incremental and dynamical. It is during this activity that the author makes explicit the syntax that will represent the mathematical concepts included in a given version of a document. The design of document structures to support these characteristics must therefore include mechanisms to manage both the update and the meaning-to-syntax bindings determined during authoring.

This chapter introduces the notion of using CFGs as a major formalism to support the dynamics of authoring mathematics. It discusses the use of CFGs as a tool to capture the semantics of mathematical concepts by means of user-defined syntax that can be proposed during authoring. The limitations CFGs have in supporting document structures that allow update are also addressed and an overview of the solution proposed by this thesis to approach these limitations is presented. A set of examples illustrating the possibility of using CFGs to capture the semantics of mathematical concepts is provided.



5.1 Notational Systems as Languages

A notational system uses a set of symbols to describe quantities and ideas, and it serves as a supporting mechanism for their expression. A programming language is a special notational system designed to solve problems in a particular domain. This characteristic often establishes the set of basic constructs that will provide the language with the necessary power to approach the tasks in the specified domain. Language constructs are generally structured around statements, and these programming statements are, most of the time, characterized as block statements, flow control statements, expressions, and declarations.

This way of structuring the design of a programming language leads to the idea that the language can be defined as a set of basic modules that can be combined to generate other modules. The task of a module design may be accomplished through the use of a Context-Free Grammar, which will thereafter be referred to as CFG in this thesis.

CFGs have been used as a major tool for the specification of programming languages. The implementation independence of this approach provides the designer with the flexibility to work on the development of a language without the need to be concerned with implementation details. Programming languages often need to be mapped into other domains in order to better respond to the user processing requests. Compilers are well known tools that support the translation of language definitions into other forms.

CFGs are, in this thesis, viewed as abstract type definitions, and sentences belonging to the grammar as variables of that type. This idea is supported by the fact that, given a set of basic type definitions or a set of CFGs, other definitions can easily be produced by the manipulation of the rules already defined. The parsing process of a compiler can therefore be interpreted as a type checker which only verifies whether a given variable (a sentence) belongs to the set provided by the type definition (the grammar). This analogy can be further extended to include abstractions such as the possibility of reuse of well defined grammars in the design of other programming languages.

Although some notational systems are not designed to support programming, they can be structured in a way similar to programming languages. The standard mathematical notation system is one example of such systems.



5.2 Standard Mathematical Notation Characteristics

The representation of mathematics by a finite set of symbols imposes restrictions on the notation used. In the following section, the implications of this limitation are addressed, and the need for a form of representation based on semantics is discussed.

The field of mathematics is composed of a collection of subfields or domains. The various branches of science often make use of these subfields as supporting tools to express their ideas. For instance, the formal presentation of some electrical engineering concepts is supported through the use of calculus.

To develop an understanding of the mathematical notation, classes of mathematical concepts can be defined, and a trivial one-to-one mapping between these classes and the subfields can be established. Abstract mathematical constructs are mapped onto concrete symbols in order to provide humans with the representations necessary to communicate mathematical ideas as well as concepts. The mathematical notation can therefore be viewed as a language used to describe the abstract concepts.

Despite the fact that humans depend on concrete objects for sharing their knowledge, all mathematical computations rely on the ability to manipulate the abstract concepts involved. Like natural languages, the mathematical notation has its basis in a dynamic process where an abstract idea can be represented by different language constructs, and the information conveyed by a particular language construct may relate to different abstract concepts. This many-to-many mapping between abstract concepts and language constructs characterizes this dynamic process as ambiguous and incomplete. Therefore any particular abstract mathematical concept is said to be represented by a notation construct if the parties involved in the information exchange have previously agreed on the notation defined for the concept. This leads to the conclusion that this representation process not only is unstable, but also imposes the characteristic of being locally redefined. The derivative of $v$ with respect to $t$, for instance, is a good example of the representation ambiguity of a mathematical concept. Either $\dot{v}$, $v'$ or $\frac{dv}{dt}$ could be chosen to illustrate the concept¹.

¹ The form of attachment where one concept is accessible by more than one reference is here denoted as aliasing.

The representation of various mathematical concepts is usually accomplished by overloading meaningful symbols. The arithmetic mean, the conjugate of a complex number as well as the complement of a boolean expression are well known concepts that are often represented by placing a horizontal bar over a variable name. For instance, variable $\bar{v}$ could be chosen to represent all three concepts. It is clear that context has to be included in any attempt to communicate mathematical concepts. It is during authoring that the relationship between mathematical concept and concept representation is available for modification. Selecting a particular syntax representation may therefore not only determine the meaning of a concept but may also indicate the domain where the concept is defined. Figure 5.1 illustrates the many-to-many relationship between mathematical concepts and their representations. The representation flexibility illustrated by the examples presented restricts the manipulation and understanding of the mathematical notation to users who share a common understanding of the terminology applied.

Figure 5.1: Many-to-many relationship between mathematical concepts and their representation. (Diagram labels: Representation; derivative, conjugate, complement.)
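The many-to-many relationship described above can be sketched as a small table of bindings indexed in both directions. The plain-string spellings of the notations (`"dot v"`, `"bar v"` and so on) and the function names are assumptions made for this illustration.

```python
from collections import defaultdict

# (notation, concept) pairs based on the examples in the text;
# the string spellings of the notations are illustrative.
BINDINGS = [
    ("dot v", "derivative"),
    ("v'", "derivative"),
    ("dv/dt", "derivative"),
    ("bar v", "arithmetic mean"),
    ("bar v", "complex conjugate"),
    ("bar v", "boolean complement"),
]

def index(pairs):
    """Index the bindings by notation and by concept."""
    by_notation, by_concept = defaultdict(set), defaultdict(set)
    for notation, concept in pairs:
        by_notation[notation].add(concept)
        by_concept[concept].add(notation)
    return by_notation, by_concept

by_notation, by_concept = index(BINDINGS)

def is_overloaded(notation):
    """A notation is overloaded when it is bound to more than one concept."""
    return len(by_notation[notation]) > 1

def has_aliases(concept):
    """A concept is aliased when more than one notation refers to it."""
    return len(by_concept[concept]) > 1
```

In this sketch the bar notation is overloaded while the derivative is aliased, so neither direction of the mapping is a function, and context is needed to resolve either lookup.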

As science progresses, the new ideas proposed, as well as the necessary supporting assumptions, need to be fully described. This condition places the extensibility requirement on the notation used to express the results obtained. Mathematical symbols will need to be provided in order to precisely describe the new concepts, and new syntax may therefore need to be introduced as a way of avoiding ambiguities. Extensibility is frequently used in mathematical notation to either locally define symbols or to represent new concepts. The following scenario illustrates this characteristic. Consider the scaling of a plane defined by the following statement:

Assume $(x, y, z)$ are given Cartesian coordinates. We now let $(\bar{x}, \bar{y}, \bar{z})$ be new coordinates where $\bar{x} = \lambda x$, $\bar{y} = \lambda y$, $\bar{z} = \lambda z$ and $\lambda$ is a positive scalar constant.

In the context described, $\bar{x}$, $\bar{y}$ and $\bar{z}$ are neither complex conjugates, nor complements of boolean expressions, nor means. A new interpretation has locally been provided to the variables. The extensibility characteristic of the mathematical notation increases the level of complexity involved in capturing the semantics of the concepts presented.
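A local reinterpretation of this kind behaves much like a scoped binding. The sketch below assumes a default reading of the overbar and uses Python's `collections.ChainMap` to layer a local meaning over it; the symbol key and the meaning strings are hypothetical.

```python
from collections import ChainMap

# Default reading of the overbar; chosen only for illustration.
DEFAULT_CONTEXT = {"overbar": "complex conjugate"}

def interpret(symbol, context):
    """Return the meaning currently bound to a symbol."""
    return context[symbol]

# The author locally rebinds the overbar for the scaling example.
scaling_context = ChainMap({"overbar": "scaled coordinate"}, DEFAULT_CONTEXT)

inside = interpret("overbar", scaling_context)            # local meaning
outside = interpret("overbar", scaling_context.parents)   # default meaning
```

The default binding is left untouched, so leaving the local scope restores the earlier interpretation.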

The representation of mathematical notation can be achieved by either a presentational approach, in which the visual characteristics of the symbols used in the notation are emphasized, or by a semantic approach, where abstract concepts are used as a basis for the representation. The presentational approach was introduced during the early stages of computers. Typesetting systems like nroff/troff as well as TeX are examples of such systems. Although both systems provide stable data representations, they lack the necessary features to be used as a basis for the representation of data in forms other than text. In contrast, as argued in [11], a notational approach based on the meaning of symbols, that is, based on the semantics of the concepts, is needed.

One of the difficulties presented by the representation of mathematical expressions by their contents is to capture the meaning of the concepts. Another way of expressing this characteristic is to capture the meaning which has been associated with a given set of symbols in case the concepts have already been encoded as these symbols for communication. For this reason the representation of mathematical concepts by the semantic approach has not yet been fully implemented.

5.3 Capturing the Semantics of Mathematical Concepts

The use of CFGs as a formalism to support the capturing of mathematical concepts is discussed in this section. Both its advantages and its limitations are addressed.



5.3.1 Mathematics and Document Authoring

The lifetime of the representation of a mathematical construct in a document may be characterized as a variable that denotes a locally pre-established relationship between the abstract concepts involved and a user-defined interpretation. Syntactic constructs may temporarily be bound to specific meanings as the result of a process led by the author of the document in order to communicate his/her knowledge. Therefore this context-dependent binding process is the mechanism the author has to express information by means of a finite set of symbols. By fixing an interpretation for a given syntax for a period of time, the author expresses his/her knowledge at the possible cost of introducing symbol overloading² and syntax ambiguity. This process may be interpreted as context switching, where the author has the power to assign different interpretations to the set of symbols used for the representation of the mathematical concepts. Therefore ambiguities introduced by the editing procedure have the author's approval and control. They are part of the document because they express the result of an already accepted form of representation.

The fact that the author is allowed to attach different meanings to syntactical structures adds a complex component to the problem of capturing the semantics of mathematical concepts. This characteristic introduces the idea of real-time document structure update. This means the structure of the document is modified during authoring to include adequate syntax to capture the semantics of mathematical concepts. For this reason a semantics-based document authoring model is necessary. This form of authoring documents is formally defined in Chapter 6 and it will thereafter be referred to as dynamic authoring.

Modeling structures to support dynamic authoring requires mechanisms to support the semantics capturing of the mathematical concepts. This means that not only data representation issues must be addressed, but also the context in which mathematical concepts are represented.

The document update notion, imposed by the dynamic authoring model, establishes the necessity of well-defined mechanisms for both accessing and modifying the structural base upon which the document's syntax and semantics are represented. In the case of having a grammar as the supporting structure for capturing the semantics of mathematical concepts, a modification of either the syntax used for the representation of concepts³, or the introduction of a new construct, will require an update process in which the related grammar definitions will need to be adapted according to the modifications proposed.

² In this context, symbol overloading is viewed as part of an incremental updating process where existing connections between mathematical concepts and syntactical constructs are modified. The modification process either establishes or keeps a many-to-one relation between mathematical concept and syntactical representation.

It is during the authoring activity that syntax is bound to concepts. In the event that ambiguities are introduced by symbol overloading, authoring mechanisms can be provided to resolve all context-dependent representations which, according to the author, need to be included in the document.

5.3.2 CFGs and Data Types

The type of mapping between mathematical concept and grammar representation determines the degree of dependence between the two domains. If this dependence is established by a one-to-one mapping (every mathematical concept is captured by an isolated grammar definition), then modifications proposed by the author will be reflected only in the production rules involved in the definition of the concepts manipulated. This organization is supported by the software engineering principle of separation of concerns, which approaches a complex problem by concentrating on each individual aspect of the problem one at a time [48]. The modularity necessary for the application of this principle is obtained by assigning the set of production rules that define a given mathematical concept a unique context-free grammar. In this thesis these structures are referred to as grammar fragments, modules or simply fragments.

Although a module is viewed as a syntactic concept which only affects the way in which software text is partitioned [72], semantic restrictions on the associated text may be used as criteria for modularization. For instance, for mathematical concepts sharing the same syntactical structure when presented according to the conventional mathematical notation, their semantic content is the only characteristic which may be used to identify them. Therefore mathematical concepts with different semantics and the same syntax are understood as distinct objects. For this reason they should be treated separately by means of their own grammars.

³ This may seem contradictory since the semantic characteristics of a concept are not affected by the form in which it is rendered. The semantic information attached to the standard visual presentation of a concept will, most of the time, be included during the associated capturing procedure. A typical illustration of this characteristic is the juxtaposed multiplication in polynomials which is discussed later in this chapter.

One advantage of having CFGs as the fundamental structure to capture the meaning of mathematical concepts is the flexibility this mechanism provides in supporting both the design and recognition phases of the capturing activity. The design phase is characterized by the assignment of grammar fragments to mathematical concepts. During recognition, the input provided by the author is submitted to the analysis component of the associated language processor. At this stage, the input is encoded as tokens and its syntactical structure is matched against the related set of production rules that has been provided during the design.

Another way the recognition phase may be viewed is as the execution of a membership verification performed by the analysis component. For this interpretation a CFG is equivalent to a data type or just type and each valid input is an instance of the type. This association is consistent with the notion of type provided by [72]. The data type, in this case, is represented by the start symbol of the CFG.

The organization proposed in this section merges the notions of module and type by using CFGs as static structures to support the semantics capturing requirement. One benefit of organizing mathematical concepts as sets of grammar fragments or modules is the possibility of using both decomposition and composition as aids to the structuring process.

Although it is possible to capture the meaning of mathematical concepts by means of static structures such as CFGs, this approach presents limitations. One important limitation is that CFGs only support the definition of document interchange formats. This means CFGs do not support the fundamental requirement that authoring mathematics is a dynamic activity in which the bindings between meaning and syntax are established by the author while manipulating the document. A discussion involving this characteristic is presented as follows.

5.3.3 CFG Limitation to Support Authoring Mathematics

This subsection illustrates the limitation CFGs have in supporting the semantics capturing of mathematical concepts. For this purpose consider, for instance, authoring a document which includes an expression involving the addition of integers such as

$1 + 0 = 1$ (5.1)

The meaning of expression (5.1) may be captured by the CFG rules in Table 5.1. This means + is the addition of integers, 1 and 0 are integers and = is equality.

add
equality   → left_expr = right_expr
left_expr  → left_expr + right_expr
left_expr  → integer
right_expr → integer
integer    → 1 | 0

Table 5.1: CFG rules for addition of integers 0 and 1.
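The reading of a CFG as a type, with recognition as a membership check, can be sketched for the rules of Table 5.1. The recognizer below is a small memoized span matcher written for this illustration, not the analysis component described in the thesis; it assumes whitespace-separated tokens, no empty right-hand sides and no cycles of unit rules. The names `TABLE_5_1` and `belongs_to` are hypothetical.

```python
from functools import lru_cache

# Production rules of Table 5.1: nonterminal -> alternative right-hand sides.
TABLE_5_1 = {
    "equality": [("left_expr", "=", "right_expr")],
    "left_expr": [("left_expr", "+", "right_expr"), ("integer",)],
    "right_expr": [("integer",)],
    "integer": [("1",), ("0",)],
}

def belongs_to(type_symbol, sentence, rules=TABLE_5_1):
    """Check whether a sentence is an instance of the type named by type_symbol."""
    tokens = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def match(symbol, i, j):
        # A terminal matches exactly one equal token.
        if symbol not in rules:
            return j == i + 1 and tokens[i] == symbol
        return any(match_seq(rhs, i, j) for rhs in rules[symbol])

    @lru_cache(maxsize=None)
    def match_seq(rhs, i, j):
        if not rhs:
            return i == j
        rest = rhs[1:]
        # Every symbol covers at least one token, so left recursion
        # always continues on a strictly shorter span.
        return any(match(rhs[0], i, k) and match_seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return match(type_symbol, 0, len(tokens))
```

Under these assumptions `belongs_to("equality", "1 + 0 = 1")` accepts expression (5.1), while a sentence using a symbol outside the grammar is rejected. The check says nothing about what + means, only that the sentence has the shape the type prescribes.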

Assume the author decides to update the current version of the document by including another expression. This expression contains the Boolean OR operation, which is represented by the + symbol, and TRUE and FALSE values represented by integers 1 and 0 respectively. An example of such an expression is

$1 + 0 = 1$ (5.2)

The syntax of expression (5.2) can be captured by the grammar in Table 5.1. However, its semantics cannot. This is because the author has determined that the context in which this syntax is valid has changed. Operations on integers have been replaced by operations on Booleans: 1 means TRUE and 0 means FALSE.

CFGs provide no means of updating their production rules. Therefore a document structure based on this formalism has to include a mechanism to support the ability to respond to authoring requests aimed at the creation of context-dependent meaning-to-syntax bindings.
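The difference between expressions (5.1) and (5.2) can be made concrete by evaluating the same written form under two contexts. This is a minimal sketch: the context names `"integer"` and `"boolean"` and the function `holds` are assumptions for the illustration, and the input is assumed to have the shape operand + ... + operand = operand.

```python
def holds(expression, context):
    """Evaluate 'a + b + ... = c' with + read according to the given context."""
    left, right = expression.split("=")
    operands = [int(token) for token in left.split("+")]
    if context == "integer":
        value = sum(operands)          # + is addition of integers
    elif context == "boolean":
        value = int(any(operands))     # + is Boolean OR, 1 is TRUE, 0 is FALSE
    else:
        raise ValueError(f"unknown context: {context}")
    return value == int(right)
```

For `"1 + 0 = 1"` both readings agree, which is why the syntax alone cannot tell them apart. For `"1 + 1 = 1"` the two contexts disagree: the statement holds for Boolean OR and fails for integer addition.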

5.3.4 Updating CFGs

This thesis approaches the semantics capturing problem by means of an organization that is based on sets of modules. A standard library is used as a storage facility where a basic or default set of modules is placed. The modules required by a given document, at any instant, may be obtained from the set of default ones or from the result of a composition procedure that may either include only modules taken from the library or, if necessary, completely new ones. In this work the ability to manipulate the set of modules in order to update the available types is introduced as the means to produce document structures that support extensibility.

As has been shown previously, document structures based on static organizations such as CFGs do not support the dynamic extensibility required during the authoring process. It is during this stage that the author is free to select the syntactical arrangement necessary to represent each mathematical concept that takes part in the document. Mechanisms that support run-time update are therefore needed when considering the design of structures to handle the dynamic authoring of mathematics. For this reason the following mechanisms need to be included when considering the design of document structures to manage the dynamical binding of mathematical concepts to syntactical constructs:

1. incremental update, and

2. module reuse.
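The two mechanisms listed above can be sketched over a small library of grammar modules. The library contents, the module names and the functions `compose` and `update` are assumptions made for this illustration; the thesis defines its own update operations elsewhere.

```python
# Hypothetical standard library of grammar modules (fragments).
# Each module maps a nonterminal to its alternative right-hand sides.
LIBRARY = {
    "integer": {"integer": [("1",), ("2",)]},
    "character": {"character": [("a",), ("b",)]},
    "list": {"expr": [("expr", "+", "term"), ("term",)]},
}

def compose(*modules):
    """Module reuse: merge fragments, keeping a single copy of each rule."""
    merged = {}
    for module in modules:
        for lhs, alternatives in module.items():
            known = merged.setdefault(lhs, [])
            for rhs in alternatives:
                if rhs not in known:
                    known.append(rhs)
    return merged

def update(grammar, lhs, rhs):
    """Incremental update: return a new grammar with one rule added."""
    return compose(grammar, {lhs: [tuple(rhs)]})

# A grammar for sums of integers built only from library modules
# plus one new rule that binds term to integer.
integer_sum = compose(LIBRARY["list"], {"term": [("integer",)]}, LIBRARY["integer"])

# The author later allows characters as terms as well.
mixed = compose(update(integer_sum, "term", ("character",)), LIBRARY["character"])
```

Both operations return new grammars instead of changing their inputs, so earlier versions of the document structure stay available after an update.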

Central to the effective use of update as a process to modify data are the notions of identity, redundancy control and normal forms [43]. For CFGs, the notion of normalization as a process to eliminate update anomalies requires the identity verification of grammar rules. This includes verification of both the syntax and the semantics of the rules. This requirement is necessary because rules that have identical syntactical structure do not express the purpose for which they are intended. The meaning attached to nonterminals on the rhs of a rule depends on the grammar rules that define these nonterminals. Therefore the semantics of a CFG rule is a concept that involves the notion of rule dependency. For this reason identical syntactical structure in CFG rules does not guarantee identical semantics.

Rules that have a single terminal and no nonterminals on their rhs are an exception to this characteristic. These rules have both their syntax and semantics determined by themselves. Syntactical identity, in this case, determines semantical identity.

A discussion regarding the involvement of identity, redundancy control and normal forms in document structures that use CFGs for the semantics capturing of mathematical concepts is presented in the remainder of this section. The following subsection provides examples to illustrate the problems introduced when grammar rules sharing the same syntax are used to capture different semantics.

5.3.4.1 Identical Syntax and Rule Semantics

Objects are identical if they are indistinguishable. This suggests that indistinguishable grammar rules should be viewed as identical. This characteristic is addressed as follows by an example in which a set of rules is shared by three different grammars.

Consider, for instance, CFG rules containing nonterminals in their rhs. This indicates that such rules depend on other rules in which the definitions of these nonterminals are provided. For this reason it is sometimes not possible to determine the semantics of grammar rules before all nonterminals have been replaced by terminals. To illustrate this characteristic consider, for instance, the CFGs defined by the production rules in Tables 5.2 and 5.3.

catexpr
1  expr    → expr + term
2  expr    → term
3  term    → integer
4  integer → 1 | 2

Table 5.2: Grammar for addition of integers 1 and 2

addexpr
1  expr      → expr + term
2  expr      → term
3  term      → character
4  character → a | b

Table 5.3: Grammar for concatenation of characters a and b

The fact that rules 1 and 2 from both grammars are identical means that both grammars define lists of terms separated by the + symbol. Although they share this characteristic, the semantics of both rules 1 and 2 depend on the information provided by rules 3 and 4. The derivation of a word such as 1 + 2, for example, as provided in Table 5.4, illustrates that rule 1 from the grammar in Table 5.2 defines lists of integers 1 and 2 that are separated by the + symbol. The fact that integers are being separated by the + symbol suggests that rule 1, from this grammar, captures the addition of integers concept.

expr => expr + term
     => term + term
     => integer + term
     => 1 + term
     => 1 + integer
     => 1 + 2

Table 5.4: Derivation of word 1 + 2.

expr => expr + term
     => term + term
     => character + term
     => a + term
     => a + character
     => a + b

Table 5.5: Derivation of word a + b.

In a similar way, the derivation of a word such as a + b, as provided in Table 5.5, for example, shows that rule 1 from the grammar in Table 5.3 defines lists of characters a and b separated by the + symbol. For this case it can be stated that rule 1, from this grammar, captures the concatenation of characters concept.
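The claim that the semantics of a rule depends on the rules defining its nonterminals can be sketched by computing, for a rule, the set of rules it depends on. The dictionary encoding of Tables 5.2 and 5.3 and the function `dependencies` are assumptions made for this illustration.

```python
ADDITION = {            # rules of Table 5.2
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("integer",)],
    "integer": [("1",), ("2",)],
}
CONCATENATION = {       # rules of Table 5.3
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("character",)],
    "character": [("a",), ("b",)],
}

def dependencies(grammar, lhs, rhs):
    """All rules a rule depends on, directly or indirectly, itself included."""
    seen = set()
    pending = [(lhs, tuple(rhs))]
    while pending:
        rule = pending.pop()
        if rule in seen:
            continue
        seen.add(rule)
        for symbol in rule[1]:
            # Terminals have no defining rules and contribute nothing.
            for alternative in grammar.get(symbol, []):
                pending.append((symbol, alternative))
    return frozenset(seen)

rule_1 = ("expr", ("expr", "+", "term"))
same_syntax = rule_1[1] in ADDITION["expr"] and rule_1[1] in CONCATENATION["expr"]
same_dependencies = (dependencies(ADDITION, *rule_1)
                     == dependencies(CONCATENATION, *rule_1))
```

Rule 1 is written identically in both grammars, yet its dependency sets differ because rules 3 and 4 differ. A rule with a single terminal on its rhs, such as integer → 1, depends only on itself.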

catexpr
1  expr      → expr + term
2  expr      → term
3  term      → integer | character
4  integer   → 1 | 2
5  character → a | b

Table 5.6: Grammar for operations on integers and characters.

The CFG defined by the rules in Table 5.6 combines rules 3 and 4 from both grammars defined in Tables 5.2 and 5.3. Table 5.7 illustrates that the derivation of a word a + 2, for example, determines that rule 1 from the grammar in Table 5.6 defines lists of characters a and b and/or integers 1 and 2 separated by the + symbol. The interpretation for the + symbol is not determined because the semantics attached to this symbol cannot be expressed by the grammar in Table 5.6.

expr => expr + term
     => term + term
     => character + term
     => a + term
     => a + integer
     => a + 2

Table 5.7: Derivation of word a + 2.
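The derivation in Table 5.7 can be replayed step by step. The sketch below encodes the rules of Table 5.6 as a dictionary and rewrites the leftmost nonterminal at each step; the names `TABLE_5_6` and `leftmost_derivation` are hypothetical.

```python
TABLE_5_6 = {
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("integer",), ("character",)],
    "integer": [("1",), ("2",)],
    "character": [("a",), ("b",)],
}

def leftmost_derivation(start, choices, rules=TABLE_5_6):
    """Apply each chosen right-hand side to the leftmost nonterminal."""
    form = [start]
    forms = [" ".join(form)]
    for rhs in choices:
        position = next(i for i, symbol in enumerate(form) if symbol in rules)
        if tuple(rhs) not in rules[form[position]]:
            raise ValueError(f"{form[position]} has no alternative {rhs}")
        form[position:position + 1] = rhs
        forms.append(" ".join(form))
    return forms

steps = leftmost_derivation("expr", [
    ("expr", "+", "term"),   # rule 1
    ("term",),               # rule 2
    ("character",),          # rule 3, second alternative
    ("a",),                  # rule 5
    ("integer",),            # rule 3, first alternative
    ("2",),                  # rule 4
])
```

The replay succeeds, so a + 2 belongs to the language, but nothing in the rules records whether + should add or concatenate once integers and characters are mixed.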

The complete interpretation for the + symbol, in this case, is not provided by the grammar rules as it had been for the previous two scenarios. The reason is that the semantics attached to this symbol cannot directly be expressed by the grammar in Table 5.6. Additional information is necessary in order to specify how integers and characters are to be processed by the + operator.
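The point can be illustrated with a minimal sketch. The Python fragment below recognizes words of the grammar in Table 5.6 and then consults a separate table to decide what + means. The names `term_kinds`, `PLUS_SEMANTICS` and `interpret_plus` are hypothetical and introduced only for this illustration; the table of meanings stands for the additional information the grammar itself cannot carry.

```python
# Terminals of the grammar in Table 5.6, grouped by the nonterminal deriving them.
INTEGERS = {"1", "2"}
CHARACTERS = {"a", "b"}


def term_kinds(word):
    """Return the kind of each term in a word such as 'a + 2', or None if
    the word is not derivable from expr (rules 1 to 5 of Table 5.6)."""
    tokens = word.split()
    # expr -> expr + term | term: terms at even positions, '+' at odd ones.
    if len(tokens) % 2 == 0:
        return None
    kinds = []
    for position, token in enumerate(tokens):
        if position % 2 == 1:
            if token != "+":
                return None
        elif token in INTEGERS:
            kinds.append("integer")
        elif token in CHARACTERS:
            kinds.append("character")
        else:
            return None
    return kinds


# Assumption of this sketch: the meaning of + for each pair of operand kinds
# is supplied from outside the grammar.
PLUS_SEMANTICS = {
    ("integer", "integer"): "addition",
    ("character", "character"): "concatenation",
}


def interpret_plus(word):
    """Name the meaning of + in a word, or report that it is undetermined."""
    kinds = term_kinds(word)
    if kinds is None:
        return "not a word of the grammar"
    meanings = {PLUS_SEMANTICS.get(pair, "undetermined")
                for pair in zip(kinds, kinds[1:])}
    if not meanings:
        return "no + symbol"
    return meanings.pop() if len(meanings) == 1 else "undetermined"
```

With these definitions, 1 + 2 and a + b receive a meaning, whereas a + 2 is accepted by the recognizer but its + remains undetermined.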

The grammars presented in this subsection illustrate that one CFG rule may be used to express several different semantics. It is also possible to have the semantics of a single concept captured by different CFG rules. This characteristic is discussed in the following subsection.

5.3.4.2 Redundancy, Syntax Equivalence and Normal Forms

In relational databases the idea of redundancy is related to the notions of identity, functional dependency and normal form [43]. As a property of the semantics of the attributes, functional dependency expresses relationships among attributes. Therefore it depends on a value-based notion of identity. Functional dependency is used in determining the presence of redundancies in database schemas.

A normal form is a schema which has desirable update properties and does not contain certain types of redundancies. Normalization is a process that breaks down unsatisfactory relation schemas according to normal form criteria. The schemas generated through normalization are therefore said to be normalized.


As static structures, CFGs do not support dynamic authoring. For this reason external mechanisms need to be designed in order to update the set of grammars involved in the modifications proposed during authoring. Effective updates of these structures require both the identification and control of redundant definitions.

CFG redundancy is, in the context of this work, defined in terms of grammar rules. The examples presented in the previous subsection illustrated that CFG rules that have identical syntax may be used to express different semantics. In the context of this thesis, such rules are considered redundant. This form of redundancy will hereafter be referred to as redundancy by syntax identity.

This type of redundancy can be detected by string comparison. It can be controlled by a grammar update procedure that

1. creates a new grammar for the redundant rule, and
2. eliminates this rule from the grammar where it was identified.
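The detection and update steps above can be sketched as follows. This is only an illustration under stated assumptions: rules are written as plain `lhs -> rhs` strings, the two example grammars are inferred from the description of Tables 5.2 and 5.3 rather than copied from them, and the function names are hypothetical.

```python
def normalize(rule):
    """Collapse whitespace so that spacing differences do not hide identity."""
    return " ".join(rule.split())


def find_identical_rules(grammars):
    """Map each rule that occurs in more than one grammar to the sorted
    names of the grammars containing it (detection by string comparison)."""
    owners = {}
    for name, rules in grammars.items():
        for rule in rules:
            owners.setdefault(normalize(rule), set()).add(name)
    return {rule: sorted(names)
            for rule, names in owners.items() if len(names) > 1}


def factor_out(grammars, rule, new_name):
    """Two-step update: (1) create a new grammar holding the redundant rule,
    (2) remove the rule from every grammar where it was identified."""
    key = normalize(rule)
    updated = {
        name: [r for r in rules if normalize(r) != key]
        for name, rules in grammars.items()
    }
    updated[new_name] = [key]
    return updated


# Assumed contents of the integer and character grammars discussed earlier.
grammars = {
    "integers": ["expr -> expr + term", "expr -> term",
                 "term -> integer", "integer -> 1 | 2"],
    "characters": ["expr -> expr + term", "expr -> term",
                   "term -> character", "character -> a | b"],
}
```

Calling `find_identical_rules(grammars)` reports the two shared rules, and `factor_out(grammars, "expr -> expr + term", "plus_rule")` moves the first of them into a grammar of its own.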

Another form of redundancy occurs when the same semantics is expressed by different CFGs. Grammars in this case differ due to nonterminal renaming (isomorphic grammars). This form of redundancy will hereafter be referred to as redundancy by syntax equivalence^4.

The fact that isomorphic grammars have different nonterminal sets implies that their sets of production rules are also different. Since the start symbol of a grammar is interpreted as a type, these grammars introduce the possibility of attaching different names to a single type definition. A careful analysis, in this case, is necessary to identify the scenarios where different types need to be defined. For this situation a domain specification needs to be provided in order to ensure that the type definitions are unique.
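To make the notion of nonterminal renaming concrete, the sketch below checks whether one small grammar fragment can be turned into another purely by renaming nonterminals. It is a brute-force illustration for tiny fragments only, and it answers a narrower question than whether two arbitrary grammars generate the same language. The grammar encoding (`{lhs: {rhs tuples}}`) and the example fragments are assumptions of this sketch.

```python
from itertools import permutations


def rename(grammar, mapping):
    """Rename the nonterminals of a grammar given as {lhs: {rhs tuple, ...}}."""
    return {
        mapping[lhs]: {tuple(mapping.get(symbol, symbol) for symbol in rhs)
                       for rhs in alternatives}
        for lhs, alternatives in grammar.items()
    }


def find_renaming(g1, g2):
    """Return a nonterminal renaming that turns g1 into g2, or None.

    All bijections are tried, so this is only suitable for small fragments."""
    names1, names2 = sorted(g1), sorted(g2)
    if len(names1) != len(names2):
        return None
    for candidate in permutations(names2):
        mapping = dict(zip(names1, candidate))
        if rename(g1, mapping) == g2:
            return mapping
    return None


# Two hypothetical fragments that differ only in their nonterminal names.
g_sum = {"expr": {("expr", "+", "term"), ("term",)},
         "term": {("1",), ("2",)}}
g_total = {"total": {("total", "+", "item"), ("item",)},
           "item": {("1",), ("2",)}}
```

Here `find_renaming(g_sum, g_total)` yields the mapping from `expr` to `total` and from `term` to `item`, which is the situation in which two names are attached to a single type definition.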

For arbitrary CFGs $G_1$ and $G_2$, it is undecidable [55] whether $L(G_1) = L(G_2)$. Therefore there is no effective approach to identify redundancies by syntax equivalence. In other words, this form of grammar redundancy cannot effectively be eliminated by operations performed on the structure that supports the semantics capturing of the mathematical concepts. For this reason the idea of normalization as a process to remove update anomalies is not considered.

^4 Isomorphic grammars produce equivalent abstract syntax trees for all words in the language they generate, therefore the same semantics is always expressed by them. For this reason redundancy by syntax equivalence is only identified if the grammars involved are isomorphic.

When capturing the semantics of mathematical concepts by means of CFGs, the terminals of the grammar are associated with the names of the concepts. Hence the set of dependencies that exist among the terminals of the grammar describes concept dependencies which have been expressed by the rules of the grammar. Chapter 6 introduces the notion of grammatical dependency, which is a form of expressing relationships among grammar rules. This characteristic is applied to express the dependencies which exist among the terminals of a grammar. The knowledge of these dependencies identifies sets of grammars which may contain redundancies.

As has already been illustrated in Subsection 5.3.4.2, the syntax and portions of the semantics of mathematical concepts can be captured by CFG rules. This approach will lead to the creation of a set of grammars which will be used to support the representation of the concepts that will take part in a given instance of a document. Grammars may be created either by means of editing procedures or as the result of operations that involve other grammars. For both scenarios, the creation process will be simplified if an adequate grammar format is imposed. This format is proposed in Chapter 6 as a normal form for CFGs.

Two different implementation aspects will benefit from this normal form. They are:

• the semantics capturing of mathematical concepts and
• the composition of grammar fragments.

A normal form for the grammatical structure used in the semantics capturing process is needed to avoid definitions where the nonterminal arrangement on the right hand side of the production rules hides the meaning of the concept to be captured. This problem is solved by the adoption of a set of templates that enforce the construction of the production rules in a particular way in which the meaning of the mathematical concepts can correctly be captured. These templates are the smallest structural components that are allowed in the capturing of mathematical concepts by CFG rules. As restrictions on grammar rules they establish that the capturing approach may need to decompose the abstract concepts. This is necessary to ensure that concept components are captured by grammar rules that follow the format defined by the templates.

The composition process, by which grammar fragments may be combined to produce other definitions, should be free of any information that is not necessary for the successful completion of the desired grammar arrangement. This means the composition process should not introduce definitions that carry redundant information.

The following sections discuss the possibility of capturing the meaning of abstract mathematical concepts by means of CFGs.

5.4 Representing Polynomials

The idea of expressing mathematical concepts as language fragments is used here as an aid to capture the semantics of mathematical concepts. With this technique, the definition of mathematical concepts which contribute to the definition of other concepts can be isolated and approached by grammar fragments. A composition process will later combine all necessary grammar fragments as a way of representing complex mathematical concepts.

As it is composed of abstract concepts, mathematics needs to be encoded in order to be communicated. The encoding proposed by the conventional mathematical notation is a representation format that is usually used for communicating mathematics. Although this notation is used to support the discussions on the capturing procedure this thesis proposes, it is important to emphasize that mathematics is composed of abstract concepts. For this reason encoding strategies are needed to support the manipulation of these concepts. For instance, a discussion involving a polynomial is simplified when this abstract concept is encoded according to the standard mathematical notation. Consider, for example, the following identity expression, which displays the elements of a polynomial as its right hand side term.

$$k = abc + a^{2}b^{2}c^{2} + \cdots + a^{n}b^{n}c^{n} \tag{5.3}$$

In order to capture the meaning of equation (5.3) by CFGs, the meaning of each concept that is included in this equation can be expressed by a grammar fragment. This indicates that grammar fragments for equality, juxtaposed multiplication, addition, power and addition_ellipsis operations need to be supplied.

It is important to observe that expression (5.3) used the ellipsis (continuation dots) operation to express the repetitive addition of polynomial terms. This and_so_on abstract concept means that the addition pattern that started with the first term of the polynomial continues and stops when the last term is reached. For this reason this operation is captured as an addition_ellipsis^5 binary operation.

g_1   term        -> first juxtaposed other
      other       -> second juxtaposed third
      juxtaposed  -> ε
      first       -> a power
      second      -> b power
      third       -> c power
      power       -> superscript identifier

Table 5.8: CFG fragment for expressing words from G.

Consider the right hand side of expression (5.3) where the polynomial is defined. One possible way to express this as grammar fragments is to consider each term of the polynomial as a word from the language $G = \{a^{k}b^{k}c^{k} \mid 2 \le k \le n\} \cup \{abc\}$. Table 5.8 illustrates one possible grammar fragment that recognizes the words defined by G. The grammar displayed there captures both the juxtaposed multiplication and the power concepts. The addition_ellipsis and addition operations may be captured by the grammar in Table 5.9.

g_2   polynomial  -> polyexpr addition_ellipsis term
      polyexpr    -> polyexpr addition term
      polyexpr    -> term

Table 5.9: CFG fragment for expressing addition_ellipsis and addition operations.

In order to completely express equation (5.3) by CFGs, the equality concept needs to be considered. Table 5.10 provides a grammar fragment that captures this concept. The composition of the three grammars displayed in Tables 5.8 to 5.10 produces a CFG that recognizes equation (5.3).

g_3   expression  -> leftside equals rightside
      leftside    -> IDENTIFIER
      rightside   -> polynomial

^5 The expression 1 < 2 < 3 < ... < 5001 states that each integer from 1 to 5000 is less than its successor. The and_so_on concept for this scenario abstracts the notion that the logical condition less than that applies to the first pair of integers continues until the last pair is reached. For this situation the and_so_on operation would be captured as a less_than_ellipsis binary operation.

Table 5.10: CFG fragment for expressing equality operation.
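A rough sketch of what composing the three fragments could look like is given below. Composition is modelled here simply as the union of rule sets, which is an assumption of this illustration and not the composition operation of Chapter 6. The fragments are transcribed from Tables 5.8 to 5.10 as read here, with the empty right hand side for `juxtaposed` being one reading of the table.

```python
def compose(*fragments):
    """Union of grammar fragments given as {lhs: [rhs tuples]}; rules with
    the same left hand side are merged and exact duplicates are dropped."""
    combined = {}
    for fragment in fragments:
        for lhs, alternatives in fragment.items():
            rules = combined.setdefault(lhs, [])
            for rhs in alternatives:
                if rhs not in rules:
                    rules.append(rhs)
    return combined


def undefined_symbols(grammar):
    """Symbols used on a right hand side that no rule defines. In a
    complete grammar these are exactly its terminals."""
    used = {symbol for alternatives in grammar.values()
            for rhs in alternatives for symbol in rhs}
    return used - set(grammar)


g1 = {"term": [("first", "juxtaposed", "other")],
      "other": [("second", "juxtaposed", "third")],
      "juxtaposed": [()],
      "first": [("a", "power")],
      "second": [("b", "power")],
      "third": [("c", "power")],
      "power": [("superscript", "identifier")]}
g2 = {"polynomial": [("polyexpr", "addition_ellipsis", "term")],
      "polyexpr": [("polyexpr", "addition", "term"), ("term",)]}
g3 = {"expression": [("leftside", "equals", "rightside")],
      "leftside": [("IDENTIFIER",)],
      "rightside": [("polynomial",)]}

full = compose(g1, g2, g3)
```

Taken alone, `g2` leaves `term` undefined; after composition every nonterminal has a definition and only the terminal symbols remain in `undefined_symbols(full)`.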

Although G has been used to list all terms of the polynomial equation, its words may also be applied to represent other mathematical concepts. Consider, for instance, the field of formal languages. In this case $a^{k}b^{k}c^{k}$ is viewed as a string of characters. The semantics capturing process therefore should be based on the notion of considering literal strings of characters as the syntactical structure to be processed. A string such as $a^{2}$ is therefore interpreted as the concatenation of a with itself, which generates aa. To capture the meaning of the words in G for this interpretation, a mechanism to represent the concatenation of $a^{k}$ with $b^{k}$ concatenated with $c^{k}$ needs to be provided. For this reason, each of $a^{k}$, $b^{k}$ and $c^{k}$ is to be recognized by a structure that accepts the concatenation of a character with itself k times.
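The two readings of the same superscript syntax can be contrasted in a few lines. The sketch below is illustrative only: the domain labels and the function names are hypothetical, chosen to show that the interpretation has to be selected by information outside the syntax.

```python
def interpret_power(base, exponent, domain):
    """Interpret the syntax base^exponent in a given domain.

    The domain names are labels chosen for this sketch only."""
    if domain == "formal languages":
        # a^k is the concatenation of the string a with itself k times.
        return str(base) * exponent
    if domain == "algebra":
        # a^k is repeated multiplication of the number a.
        return base ** exponent
    raise ValueError(f"no interpretation registered for domain {domain!r}")


def word_of_G(k):
    """The word a^k b^k c^k of G, read as a literal string of characters."""
    return "".join(interpret_power(letter, k, "formal languages")
                   for letter in "abc")
```

For example, `interpret_power("a", 2, "formal languages")` produces the string aa, while `interpret_power(3, 2, "algebra")` produces a number.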

As illustrated by the two different interpretations associated with the syntax defined by the words in G, the context in which concepts are expressed must also be taken into consideration. As different interpretations may be attached to any given syntax, a strategy to resolve the syntactical ambiguities introduced needs to be defined. The needed mechanism should be capable of supporting the capture of all possible interpretations associated with the syntax considered.

Exponents and indexing are concepts used in various fields of mathematics. These two concepts are usually described with the support of superscripts and subscripts. The section that follows discusses a grammar approach to capture both concepts.

5.5 Representing Subscripts and Superscripts

Subscripts and superscripts attached to literal strings of characters can be viewed as modifiers that carry additional meaning of a symbol. The semantics of both subscripts and superscripts may be captured by considering them as binary mathematical concepts whose arguments are the base and the sub/superscript. They have right associativity and the highest precedence among the other concepts.

g_b   words  -> words SUB word
      words  -> word
      word   -> word SUP index
      word   -> index
      index  -> NUMBER
      index  -> IDENTIFIER
      index  -> ( words )

Table 5.11: CFG fragment for subscripts and superscripts.

One possible grammar fragment to represent both subscripts and superscripts is provided in Table 5.11. The production rules associated with superscripts follow the rules for subscripts in order to ensure the correct precedence for both operators.

Consider, for instance, the representation of $S_{i_j^{k}}$, where S is identified by means of an index i which itself has both a superscript and a subscript. The following expression represents the variable S in terms of the subscript and superscript concepts.

S sub (i sub j) sup k

A more complex example is provided as follows to illustrate the precedence characteristics of both superscripts and subscripts. The symbol = is used to represent the equivalence of the two forms of representation.

$$\left(S_{i_j^{k}}\right)^{z_p^{q}} = \text{(S sub (i sub j) sup k) sup ((z sub p) sup q)} \tag{5.4}$$
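As an illustration of how the sub/sup notation can be processed, the sketch below parses it into nested tuples and prints a LaTeX form. It follows the left-recursive rules of Table 5.11 as printed, rewritten as loops; the tuple encoding, the lower-case `sub` and `sup` keywords and the function names are choices made for this sketch.

```python
import re


def tokenize(text):
    # Parentheses are separate tokens; everything else is split on blanks.
    return re.findall(r"[()]|[^\s()]+", text)


def parse(text):
    """Parse the sub/sup notation into nested tuples (operator, base, script)."""
    tokens = tokenize(text)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[position]
        position += 1
        return token

    def words():
        # words -> words SUB word | word, written as a loop.
        node = word()
        while peek() == "sub":
            advance()
            node = ("sub", node, word())
        return node

    def word():
        # word -> word SUP index | index, written as a loop.
        node = index()
        while peek() == "sup":
            advance()
            node = ("sup", node, index())
        return node

    def index():
        # index -> NUMBER | IDENTIFIER | ( words )
        token = advance()
        if token == "(":
            node = words()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in (")", "sub", "sup"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token

    tree = words()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def to_latex(node):
    """Render a parsed tree with explicit braces around every operand."""
    if isinstance(node, str):
        return node
    operator, base, script = node
    mark = "_" if operator == "sub" else "^"
    return "{" + to_latex(base) + "}" + mark + "{" + to_latex(script) + "}"
```

Parsing `S sub (i sub j) sup k` groups the superscript k with the parenthesized index, so the whole of (i sub j) sup k becomes the subscript of S.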

The use of CFGs to capture both subscript and superscript concepts does not express the context in which these definitions are considered. One way to approach this requirement is to introduce a scope mechanism to delimit the context in which concepts are expressed by unique syntax. Chapter 6 proposes a scope structure to solve the syntax ambiguity problem created by the overloading of the symbols used for the representation of the concepts.



5.5.1 Overloading Subscripts

The need for structuring mathematical notation around a set of domains is emphasized here. Consider the recurrence relation $a_{n+1} = 3a_n$, $n \ge 0$, $a_0 = 5$, which has $a_n = 5(3^{n})$, $n \ge 0$, as its general solution. Also consider the two-dimensional matrix A defined in (5.5).

The 12th term of the recurrence relation, $a_{12} = 5(3^{12})$, has the same syntactical representation as the element of matrix A located on the first row and second column. Although these concepts share the same visual form, different interpretations are expected depending on the context in which they are presented. This context is interpreted as a domain or subfield and may be as general as, say, Discrete Mathematics or Linear Algebra. It may also be specific depending on the characteristics of the concepts involved. By letting $a_{12}$ be part of a domain, the additional information necessary to determine its meaning uniquely is supplied. This form of structuring mathematics, by grouping knowledge into domains, will be used as a mechanism to resolve ambiguities in this thesis.

In the linear algebra domain, for instance, the syntax $a_{ij}$ represents the operation that establishes the link which is used to locate elements in the matrix A. The need for an operator to represent the dimensional link expressed by the subscript used for the location of matrix elements can be illustrated by the fact that $b_{100}$ is the 100th element of a one-dimensional matrix. The interpretation associated with $b_{123}$ is not unique. If matrix B is three-dimensional, for instance, then $b_{123}$ refers to one particular element in the structure. A one-to-one mapping between syntactical representation and element location is not possible if matrix B is two-dimensional. Two interpretations are associated with the syntax $b_{123}$ in this case: either an element located in row 1, column 23 or in row 12, column 3. This ambiguity could be resolved by the introduction of an operator to determine where the link between the dimensions of the structure is to take place. For instance b sub (12, 3) could be used to reference the element located in row 12, column 3 in matrix B.
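The ambiguity can be enumerated mechanically. The following sketch lists every way of reading a run of subscript digits as one index per dimension; the function name is hypothetical, and the treatment of digits as plain decimal indices is an assumption of the sketch.

```python
from itertools import combinations


def subscript_readings(digits, dimensions):
    """All ways of reading a run of subscript digits as one index per
    dimension, e.g. '123' as (1, 23) or (12, 3) for two dimensions."""
    readings = []
    # Choose the cut points between consecutive digits.
    for cuts in combinations(range(1, len(digits)), dimensions - 1):
        bounds = (0, *cuts, len(digits))
        parts = [digits[start:end] for start, end in zip(bounds, bounds[1:])]
        readings.append(tuple(int(part) for part in parts))
    return readings
```

For the digits 123 this gives a single reading for one or three dimensions and two readings for two dimensions, which is exactly the case that an explicit operator such as b sub (12, 3) is meant to settle.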

A = (5.5)



5.5.2 Overloading Superscripted Symbols

Consider the overloading of both the + and − symbols provided by the expressions below.

$$|f| = f^{+} + f^{-} \tag{5.6}$$

and

$$f = f^{+} - f^{-} \tag{5.7}$$

g_f   function_parts  -> positive_part − negative_part
      positive_part   -> function_id sup +
      negative_part   -> function_id sup −
      function_id     -> IDENTIFIER

Table 5.12: CFG representation of the positive and negative parts of a function.

In the above expressions f is a function, $f^{+}$ represents the positive part of f, and $f^{-}$ the negative part of it [93]. The semantics attached to the + symbol in equation (5.6) indicates that this symbol is used to represent two operations and each instance of it aims at the representation of a different concept. The superscripted instance characterizes the unary postfix operation of taking the positive part of a function, whereas the binary infix instance represents addition. The definition presented in Table 5.12 illustrates a possible grammar fragment to represent both the positive and negative parts of a function.

5.6 Representing Matrices

The representation of matrices is usually done by means of a combination of upper and lower case letters. An upper case letter is used to denote the matrix itself and the corresponding lower case letter combined with lower case subscripts defines both its elements and their location in the matrix. Structural concepts such as vectors and matrices depend on the representation of lists since both vectors and matrices characterize a collection of elements organized in a particular way. The grammar fragment illustrated in Table 5.13 presents the necessary rules for the definition of matrices.

g_c   matrixrule  -> MATRIX { dimlist ( elist ) }
      dimlist     -> dimlist : size
      dimlist     -> size
      elist       -> elist , el
      elist       -> el
      el          -> IDENTIFIER
      size        -> NUMBER

Table 5.13: CFG fragment for matrices.

The following system of linear equations represented in matrix format is used to illustrate the syntax defined by the rules presented in Table 5.13.

$$\begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 4 \\ -5 \end{pmatrix}$$

The syntax enforced by the rules provided by Table 5.13 is presented below:

Matrix{2 : 2 (3, 1, 0, 3)} • Matrix{2 : 1 (x1, x2)} = Matrix{2 : 1 (4, -5)}

where the • symbol denotes the matrix multiplication operation. The operator , is introduced as a way of representing the matrix elements as nodes of a hierarchical relation between the entries of a matrix.
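A small reader for this syntax can make the role of the dimension list explicit. The sketch below is an illustration only: it uses a regular expression instead of the grammar of Table 5.13, keeps the elements as strings, and assumes that elements are listed row by row, which the text does not state.

```python
import re

MATRIX_PATTERN = re.compile(r"\s*Matrix\s*\{([^()]*)\(([^()]*)\)\s*\}\s*$")


def parse_matrix(text):
    """Parse 'Matrix{d1 : d2 (e1, e2, ...)}' into (dimensions, rows).

    Only one- and two-dimensional matrices are arranged into rows here;
    for more dimensions the flat element list is returned."""
    match = MATRIX_PATTERN.match(text)
    if match is None:
        raise ValueError("not in the Matrix{dimlist (elist)} form")
    dimensions = [int(size) for size in match.group(1).split(":")]
    elements = [element.strip() for element in match.group(2).split(",")]
    expected = 1
    for size in dimensions:
        expected *= size
    if len(elements) != expected:
        raise ValueError("element count does not match the dimension list")
    if len(dimensions) > 2:
        return dimensions, elements
    # Assumption: elements are given in row-major order.
    columns = dimensions[-1] if len(dimensions) == 2 else len(elements)
    rows = [elements[i:i + columns] for i in range(0, len(elements), columns)]
    return dimensions, rows
```

Applied to `Matrix{2 : 2 (3, 1, 0, 3)}`, the dimension list 2 : 2 determines that the four elements form two rows of two entries each.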

The representation of the power, inverse and transpose of a matrix by superscripted symbols does not carry the necessary semantics of each individual operation. For this reason, it is necessary to represent each one of these concepts by means of its own semantics. Although the representation of the power of a matrix is characterized by a binary operation, both inverse and transpose are unary operations and are usually recognized by syntactical structures in the postfix form.

Matrices with only one row or column can also be considered as vectors. The syntactical representation of these concepts is usually obtained by means of single lower case letters typeset in boldface font. In this compact form of representation, the operator is identified not by means of symbols attached to the operands, but by the type of visual representation adopted for displaying the selected symbol.



5.7 Representing Sets of Numbers

The representation of sets of numbers as intervals is frequently used in algebra. For example, $(a, b) = \{x \mid a < x < b\}$ and $[a, b] = \{x \mid a \le x \le b\}$. In this form of representing numbers, the delimiters do not always match. We illustrate this by the two expressions that follow.

$$[a, b) = \{x \mid a \le x < b\}$$

$$(a, b] = \{x \mid a < x \le b\}$$

g_d   1   interval_var     -> left_delimiter values right_delimiter
      2   values           -> left_value , right_value
      3   left_value       -> IDENTIFIER
      4   right_value      -> IDENTIFIER
      5   left_delimiter   -> [
      6   left_delimiter   -> (
      7   right_delimiter  -> ]
      8   right_delimiter  -> )

Table 5.14: CFG fragment for intervals.

A possible grammar fragment for the representation of the four types of intervals is given in Table 5.14. Although the grammar presented in Table 5.14 can be used to represent number intervals, it is not useful for capturing the semantics of the concepts involved. The fact that the nonterminals left_delimiter and right_delimiter are not uniquely defined suggests that production rule 1 reduces to four different sentences. Another problem with this definition is that it requires a parser with lookahead greater than three.

The grammar fragment shown in Table 5.15 captures the semantics of the structure. This grammar is designed in a way that each pair of interval delimiters is uniquely represented by a production rule.

g_e   interval          -> open_intervals
      interval          -> closed_intervals
      open_intervals    -> open_part RIGHT_OPEN_PAR
      open_intervals    -> open_part RIGHT_CLOSED_DEL
      closed_intervals  -> closed_part RIGHT_CLOSED_DEL
      closed_intervals  -> closed_part RIGHT_CLOSED_DEL
      open_part         -> LEFT_OPEN_PAR body
      closed_part       -> LEFT_CLOSED_DEL body
      body              -> left_value COMMA right_value
      left_value        -> IDENTIFIER
      right_value       -> IDENTIFIER
      COMMA             -> ,
      RIGHT_CLOSED_PAR  -> )
      RIGHT_CLOSED_DEL  -> ]
      LEFT_OPEN_PAR     -> (
      LEFT_OPEN_DEL     -> [

Table 5.15: CFG fragment to capture the semantics of intervals.
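The idea of giving each delimiter pair its own rule can be mirrored in a lookup keyed by the pair, as in the sketch below. The labels, the function names and the membership test are illustrative assumptions; they are meant to show that once the pair is identified as a unit, the semantics of the interval follows directly.

```python
# One entry per delimiter pair, mirroring the idea that each pair is
# represented by its own production rule. The labels are illustrative.
INTERVAL_KINDS = {
    ("(", ")"): "open",
    ("[", "]"): "closed",
    ("[", ")"): "closed on the left, open on the right",
    ("(", "]"): "open on the left, closed on the right",
}


def classify_interval(text):
    """Return (kind, left_value, right_value) for text such as '[a, b)'."""
    text = text.strip()
    if len(text) < 2:
        raise ValueError("not an interval")
    kind = INTERVAL_KINDS.get((text[0], text[-1]))
    if kind is None:
        raise ValueError("unknown delimiter pair")
    values = [value.strip() for value in text[1:-1].split(",")]
    if len(values) != 2 or not all(values):
        raise ValueError("expected two values separated by a comma")
    return kind, values[0], values[1]


def contains(text, x, bindings):
    """Membership test; bindings gives numeric values for the identifiers."""
    _, left, right = classify_interval(text)
    stripped = text.strip()
    low, high = bindings[left], bindings[right]
    left_ok = low <= x if stripped[0] == "[" else low < x
    right_ok = x <= high if stripped[-1] == "]" else x < high
    return left_ok and right_ok
```

With a bound to 1 and b to 2, the value 1 belongs to [a, b) but not to (a, b], which is the distinction the delimiter pair encodes.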

5.8 Representing Sums

The concept of summation is discussed in this section. Both ambiguity and extensibility problems associated with this operation are illustrated by examining its semantic characteristics.

Consider the sum represented by the expression below.

$$21 = \sum_{i=1}^{6} i \tag{5.9}$$

Equation (5.9) illustrates the ambiguity involved in the use of the = symbol, where its meaning can either represent the start of a sequence of attributions to variable i, or the equality between two quantities. Although the syntax most commonly associated with the sum of a sequence of items includes the = symbol as a way of expressing the iteration process, the semantics of the summation construct does not require the equality operator. The concept of summation may be described by an operation on an expression that is evaluated according to a sequence of predefined items. One possible form of expressing this is by means of the syntax

Sum{range_list : expression}    (5.10)

which captures the meaning of the summation concept. In Equation (5.10) Sum is a prefixed binary operator and range_list defines the sequence over which expression is to be computed. The operand range_list captures the meaning of the iteration part of the summation construct.

The fact that it is possible to attach a particular interpretation to the form in which range_list is syntactically represented is a problem to be considered whenever rendering is to take place. It is intuitive to associate the = symbol with the iterative component of the sum construct, as illustrated in Equation (5.9). However the idea of range is more meaningful when this expression is represented as

$$21 = \sum_{1 \le i \le 6} i \tag{5.11}$$

g_i   1   identity_expr  -> sample_expr = sample_expr
      2   sample_expr    -> expr
      3   sample_expr    -> sum
      4   sum            -> SUM { range_list : sample_expr }
      5   range_list     -> start , end
      6   start          -> identifier = expr
      7   identifier     -> ID

Table 5.16: Grammar for summation.

The grammar fragment illustrated in Table 5.16 allows summation constructs, such as the one in Equation (5.9), to be described by the syntax that follows.

Sum{i = 1, 6 : i}    (5.12)
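To show that this syntax carries enough information to be evaluated, the sketch below reads a construct of the form Sum{identifier = start, end : expression} and computes its value. It is a minimal illustration and not the thesis implementation: the colon separator follows rule 4 of Table 5.16, the expression language is limited to integers, names, +, - and *, and all function names are hypothetical.

```python
import ast
import operator
import re

SUM_PATTERN = re.compile(
    r"\s*Sum\s*\{\s*([A-Za-z_]\w*)\s*=\s*(.+?)\s*,\s*(.+?)\s*:\s*(.+?)\s*\}\s*$")

OPERATORS = {ast.Add: operator.add, ast.Sub: operator.sub,
             ast.Mult: operator.mul}


def evaluate(expression, environment):
    """Evaluate an expression built from integers, names, +, - and *."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return environment[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


def evaluate_sum(text, environment=None):
    """Evaluate 'Sum{identifier = start, end : expression}'."""
    match = SUM_PATTERN.match(text)
    if match is None:
        raise ValueError("not in the Sum{range_list : expression} form")
    name, start, end, body = match.groups()
    environment = dict(environment or {})
    first, last = evaluate(start, environment), evaluate(end, environment)
    total = 0
    # The range_list part drives the iteration; the = inside it is an
    # assignment of successive values, not an equality test.
    for value in range(first, last + 1):
        environment[name] = value
        total += evaluate(body, environment)
    return total
```

Evaluating `Sum{i = 1,6 : i}` with this sketch gives 21, in agreement with Equation (5.9).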

There are situations where more complex iteration control is required and sometimes the necessary summation condition is expressed as compound statements. The expression below illustrates this fact.

$$k = \sum_{i=m/2}^{m+n} \; \sum_{\substack{j=0 \\ i+j=n}}^{n} i + j \tag{5.13}$$

In Equation (5.13) the inner summation includes a compound statement. The iteration mechanism is extended to support the composed condition, which makes use of a syntactically hidden conjunction to define the lower limit for the iteration. The meaning associated with the = symbol, in its two occurrences in the conjuncted condition, is not the same. An ambiguity was introduced by the additional semantics attached to the = symbol as the result of an extension procedure.

g_i   1     identity_expr   -> sample_expr EQ sample_expr
      2     sample_expr     -> expr
      3     sample_expr     -> sum
      4     sum             -> SUM { range_list : sample_expr }
      5     range_list      -> start , end
      5.1   start           -> single_start
      5.2   start           -> compound_start
      6     single_start    -> identifier = expr
      6.1   compound_start  -> single_start ' identity_expr
      7     identifier      -> ID

Table 5.17: Grammar for summation.

A possible grammar definition to support the conjuncted condition can be obtained by modifying the fragment associated with the definition of the summation operation, and by the addition of a grammar fragment to represent the composed version of the iteration. Table 5.17 illustrates a possible set of grammars that can be used to capture the summation operations that have compound iteration statements.

Since the grammar proposed in Table 5.17 has been developed with the purpose of extending the recognition power provided by the grammar fragment in Table 5.16, it is expected that some common structural knowledge is shared between the two. This is true since rules 1 to 5 and 7 are the same in both fragments. Also rule 6 is semantically equivalent in the two definitions. The two instances of this rule differ only in their left hand side nonterminals.
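The comparison described above can be carried out rule by rule, as in the following sketch. Several points are assumptions of this illustration: rules are keyed by their printed numbers, EQ is read as the token name for the = symbol (so that rule 1 counts as unchanged, as the text states), and the separator in rule 6.1 is replaced by the placeholder token SEP.

```python
# Rules keyed by number, as (lhs, rhs) pairs transcribed from Table 5.16.
TABLE_5_16 = {
    "1": ("identity_expr", "sample_expr = sample_expr"),
    "2": ("sample_expr", "expr"),
    "3": ("sample_expr", "sum"),
    "4": ("sum", "SUM { range_list : sample_expr }"),
    "5": ("range_list", "start , end"),
    "6": ("start", "identifier = expr"),
    "7": ("identifier", "ID"),
}
# Table 5.17 is expressed as Table 5.16 plus its changed and added rules.
TABLE_5_17 = dict(TABLE_5_16)
TABLE_5_17.update({
    "1": ("identity_expr", "sample_expr EQ sample_expr"),
    "5.1": ("start", "single_start"),
    "5.2": ("start", "compound_start"),
    "6": ("single_start", "identifier = expr"),
    # SEP is a placeholder for the separator symbol of rule 6.1.
    "6.1": ("compound_start", "single_start SEP identity_expr"),
})

# Assumption: EQ is the token name for the = symbol.
ALIASES = {"EQ": "="}


def normalize(rhs):
    return " ".join(ALIASES.get(token, token) for token in rhs.split())


def compare(old, new):
    """Classify the rules of `new` relative to `old` by rule number."""
    report = {"same": [], "lhs_renamed": [], "added": []}
    for number, (lhs, rhs) in new.items():
        if number not in old or normalize(old[number][1]) != normalize(rhs):
            report["added"].append(number)
        elif old[number][0] == lhs:
            report["same"].append(number)
        else:
            report["lhs_renamed"].append(number)
    return report
```

Under these assumptions `compare(TABLE_5_16, TABLE_5_17)` reports rules 1 to 5 and 7 as shared, rule 6 as differing only in its left hand side, and rules 5.1, 5.2 and 6.1 as additions.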



The grammars proposed for capturing the semantics of the summation concept illustrate the need for a composition process to support the extension of already defined constructs by reusing existing grammar fragments. Chapter 6 discusses the grammar extension problem and provides a solution in terms of grammar operations.

5.9 Conclusion

This chapter introduced the notion of using CFGs as the major formalism to capture the semantics of mathematical concepts. It discussed the advantages and limitations of using CFGs to support the dynamics of authoring mathematics.

The syntax of programming languages is usually specified by means of CFGs [95]. Structuring the mathematical notation as a programming language has the advantage of using CFGs for its specification and processing. Specification is supported by the CFG's structuring methods, which include composition, choice, repetition, and recursion [95]. Effective and efficient parsing algorithms and tools are available to support its processing.

Although CFGs have successfully been used for the specification of the syntax of programming languages, this formalism is not adequate for the definition of the semantics of programming languages [100]. Another important limitation this formalism has relates to its static characteristic. This restricts its use to the support of organizations that do not depend on the notion of update.



Chapter 6

Modelling Context Dependent Information

The notion of using CFGs to support the semantics capturing of mathematical concepts was introduced in Chapter 5. This chapter proposes the fundamentals of a document organization that models the dynamics of authoring mathematics. The model supports both the extensibility and ambiguity characteristics of mathematical notation and is capable of capturing the meaning of mathematical concepts by means of syntax defined during authoring.

6.1 Authoring Mathematics and Multimodality

This section presents the basic components of a structure to support the dynamic authoring process as discussed in Section 5.3. As emphasized there, different interpretations may be assigned to a given syntax. This behavior is understood as instabilities in the binding between meaning and syntactical representation. As a major characteristic of authoring mathematics this needs to be addressed in any proposal to model this type of authoring.

The conventional notation which is used for the communication of mathematics is characterized by a context-dependent meaning-to-syntax binding. This dynamical form of attaching meaning to syntax is the mechanism available to the author for expressing knowledge by means of the symbol arrangement he/she believes is the most adequate for the communication of the ideas to be presented through the document. One characteristic this notation has is to leave unspecified the domain which the concepts represented belong to. Although this simplifies the notation used, it imposes limitations on the rendering of concepts for communication in different modalities. The context-dependent quality also requires the author/user to have knowledge of the context where the appropriate meaning is to be associated to a syntactical representation of a concept. One way in which this limitation may be approached is to assign this knowledge requirement to the structure that is used for capturing the semantics of the constructs.

The organization proposed here addresses the dynamic mapping between meaning and representation by means of a meta-system. This structure establishes the necessary methods for capturing the semantics of mathematical concepts, leaving the definition of the desired notation to the author. When new concepts or extensions to constructs already defined are necessary, the author's involvement will be required to configure the system for capturing the constructs that need to be included in the document. The dynamic authoring process may be viewed as a modular organization composed of a Parameterized Notational Structure (PNS), a Hierarchical Intermediate Representation (HIR) and a Rendering Structure (RS).

A parameterized notational structure is an organization defined by a meta-structure, a programming language and a set of grammars. This set contains the necessary grammars to capture the syntax and semantics of the mathematical concepts that have been included in a given document. The meta-structure provides the rules to be used for the creation of the grammars that belong to the set. The programming language manipulates the grammars that have been created according to the meta-structure. The notion of scope, which is applied to resolve syntax ambiguities, is also provided by this language. Statements of this language include the mathematical constructs that have been encoded according to a domain defined by a scope. In summary this language provides a mechanism to aid the authoring of mathematical concepts that are being captured by the grammars from the set. It also provides, by means of the scope, a dynamical form to cope with syntactical ambiguities.

Intermediate representations of documents are generated as a result of the interaction between the author and the PNS. These hierarchical intermediate representations support the provision of the information that the rendering structure will manipulate in order to generate different views of a document. The set of document views produced by the RS will depend on the purpose of the application. For this reason it processes the HIR based on knowledge provided by application experts.

[Diagram of the modules PNS, HIR and RS]

Figure 6.1: Structure to support dynamic authoring and multimodality processing

The interaction between the three modules is illustrated in Figure 6.1. The arrow/function pair is used to represent how information is processed. The interpretation associated with this form of representation is described as follows.

Function $f()$ represents the service provided by PNS to its only client HIR, which involves the creation of an intermediate document representation. The set of functions $h_1(), \ldots, h_k()$ is used to represent the set of services that RS provides. These services are based on the knowledge stored in HIR that is shared with RS through $g()$. They are mechanisms to produce different views of the encoded document obtained from HIR. The views, represented by the boxes labeled $v_k$ in Figure 6.1, are the result of the application of the rendering formats required by the final application. The diagram shown in Figure 6.2 describes the dynamics of the model proposed.
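The flow through $f()$, $g()$ and $h_1(), \ldots, h_k()$ can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the `HIR` class, the toy encoding performed by `f`, and the two renderers `h_print` and `h_voice` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HIR:
    """Hypothetical hierarchical intermediate representation: a tree of concepts."""
    concept: str
    children: List["HIR"] = field(default_factory=list)


def f(statements: List[str]) -> HIR:
    """Service of the PNS: encode authored statements as an HIR (toy encoding)."""
    return HIR("document", [HIR(s) for s in statements])


def g(hir: HIR) -> HIR:
    """Sharing of the HIR with the RS; this sketch simply hands the tree over."""
    return hir


def h_print(hir: HIR) -> str:
    """One rendering service of the RS: a printed view."""
    return " ".join(child.concept for child in hir.children)


def h_voice(hir: HIR) -> str:
    """Another rendering service of the RS: a spoken view."""
    return ", then ".join(child.concept for child in hir.children)


# The services h_1, ..., h_k, indexed by the view they produce.
renderers: Dict[str, Callable[[HIR], str]] = {"print": h_print, "voice": h_voice}

hir = f(["x + y", "sin x"])
views = {name: h(g(hir)) for name, h in renderers.items()}
```

The point of the sketch is only that every view is computed from the same intermediate representation, so the renderers never see the author's statements directly.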

The following discussion makes use of the information provided in Table 6.1 and the diagram shown in Figure 6.2 to present the operational organization of the complete process. According to this diagram every intermediate document $i$ is the result of intentions of the author $a$, coded as language $l$ statements, that will manipulate the concepts defined in the set of grammars $g$. As illustrated in Figure 6.2, knowledge of the meta-structure $m$ is necessary to update the set of grammars. This may be done by means of the programming language $l$, which updates the grammar set based on previously defined grammars. Another possible way to update these grammars is by the direct use of a text editor. Editing mechanisms used for this purpose are represented by $e$ in Figure 6.2. This way of manipulating $g$ is needed whenever the capturing of the meaning of a concept cannot be obtained by the use of $l$. The provision of all grammars necessary for supporting the dynamic authoring process is therefore the result of actions taken by the author that involve the structure defined by the PNS. The various document applications $d$ may be obtained from the intermediate representation $i$ by the application specialist $s$. For each application the knowledge of specific semantics $p$ as well as rendering mechanisms $r$ is necessary.

Modules                       Components
Documents Processing Entity   a   author
                              s   application specialist
PNS                           l   programming language
                              m   meta-language
                              g   set of grammars
HIR                           i   intermediate document representation
RS                            p   application specific semantics
                              r   rendering mechanism
                              d   document application
                              e   editing

Table 6.1: Components involved in dynamic authoring for multimodality.

Figure 6.2: A sketch of the dynamics of the authoring/rendering process.



Usually the author is interested only in providing the semantics for the assumed standard document's usage which, most of the time, involves only a printed view of the document. The semantics for other usages such as voice, for example, could be obtained with the support of an application specialist.

Electronic documents allow for the possibility of rendering the abstract concepts which compose the document's logical structure as different concrete variables. Another characteristic of electronic documents is that the meanings of the included concepts need to be properly encoded to allow their processing by the computer. It is also difficult or sometimes even impossible to predict all potential applications that may be assigned to the abstract concepts that comprise a document. All these characteristics may be supported by the availability of:

1. an adequate semantics-based encoding of the mathematical concepts, and

2. a set of associated rendering mechanisms to convert the encoded concepts to the respective expected formats.

The encoded concepts are represented in Figure 6.2 by the circle named $i$ and the rendering mechanism by the circle named $r$. As illustrated, the intermediate document representation is the only component that is visible to the rendering mechanism. For this reason all ambiguities must be resolved at the PNS during authoring.

The organization proposed supports the multimodal communication of concepts by modeling the human behavior involved during the authoring activity. Since this thesis aims at capturing the semantics of mathematical concepts it only considers the PNS portion of the organization presented. Sections 6.3 to 6.8 describe the fundamental components of the PNS module. Sections 6.3, 6.4, 6.5 and 6.7 discuss the grammar component. Section 6.6 introduces the language and Section 6.8 the meta-language components, respectively. A grammar-based structure to model the dynamics of authoring mathematics is discussed in the remainder of this chapter.

6.2 A Formal Structure for Document Authoring

In this section it is assumed that documents are created according to the authoring model presented in Section 6.1. The model introduced there establishes a set of steps which can be followed during the creation of a single document. This idea is now extended by the notion that a document may be considered as the result of a set of modifications applied to other documents. This property as well as an exception to this notion are discussed next.

The proposed document structure is based on the assumption that any document in its final version is seen as the result of a composition process in which intermediate versions of the document are produced. As the author's ideas evolve and new concepts need to be included, different versions of the document are generated. These versions can be interpreted as blueprints of the author's capacity to communicate ideas and concepts.

Three important stages related to the versions of a document produced during the authoring process are identified here. The first is the one in which the author makes use of any available concept definitions. Documents created during this stage are called default documents. Another stage is the final. At this stage the authoring process is over and the outcome is the final document. In general, many different versions of a document are created before the final one is produced. This leads to the third stage, the intermediate one, where all intermediate versions of a document are created. In a case where only one document version is produced during the complete authoring process, the default, final and intermediate versions are the same.

At any instant during the authoring process, the structure required to support the creation of a particular version of a document is the result of a process involving a set of grammars. Each isolated grammar contributes to the capture and representation of at least one mathematical concept and has been included in the document's supporting structure by means of one of the following three approaches. Each grammar fragment either

1. has been created by standard editing procedures or

2. has already been defined or

3. has resulted from grammar operations.

The structure proposed in this chapter organizes grammars into directories, and the composition of a directory includes definitions which have been created by any of the three above mentioned approaches¹. It is assumed that an author who uses the model proposed here will often have a set of concept definitions available which may be used during the creation of the default document. The definitions should have their location included as part of the referencing process. These locations may vary from a local file system to the World Wide Web.

As proposed in Section 4.6, authoring environments are represented in terms of a document structure and the system's interface. The expression $V = (S, I)$ was used for this purpose. In this section the document structure is defined as an organization to support the dynamic characteristic of document authoring. This characteristic is supported here by means of an adaptable organization. For this reason, the document structure $S$ will be called the document instance structure. The following subsection introduces a grammar-based structure to model the dynamics of authoring mathematics.

6.2.1 Grammars and Dynamic Document Authoring

A document instance structure $S_j$, for $j \geq 0$, is a tuple

\[ S_j = (D_j, c) \tag{6.1} \]

where $c$ is a binding control mechanism and $D_j$ is the semantic structure. The subscripted variable $j$ determines the version of the document structure considered. The binding control $c$ is a grammar the purpose of which is the provision of an environment in which the semantic structure required for the document instance structure is placed. This environment provides support for document authoring behavior. For this reason it must be independent of versions.

The semantic structure $D_j$ is a finite sequence,

\[ D_j = (G_1^j, G_2^j, \ldots, G_{n_j}^j) \tag{6.2} \]

of finite sets of grammars.

A domain² is a grammar in $G_i^j$, $1 \leq i \leq n_j$, such that it contains both the syntax and portions of the semantics needed to capture the meaning of a set of mathematical concepts. The set $G_i^j$ is called domain directory or just directory.

¹ Refer to Figures 6.1 and 6.2 and Table 6.1 for an overall view of the authoring mechanism.

² As discussed in Chapter 5, a CFG is considered equivalent to a data type; therefore domain and data type are also equivalent.

For a given version of a document with document instance structure as defined in Expression 6.1, the collection of all grammars which can be found in that version of the document is called the document dictionary. This collection is defined as

\[ \bigcup_{i=1}^{n_j} G_i^j \tag{6.3} \]

Each $G_i^j \in D_j$ is a union of three sets of types of grammars, that is

\[ G_i^j = N_i^j \cup F_i^j \cup C_i^j \tag{6.4} \]

The grammars in $N_i^j$ are grammars created by standard editing mechanisms. The grammars in $F_i^j$ represent grammars that have already been created. They are ready to be used and satisfy the following condition:

F?= 0 if j = 0, i = 1

c u u u #* i f z> i (6'5)* = 1 k= 1

The third set, $C_i^j$, collects all grammars that are introduced by the two binary operations in the set $B = \{\%, \circ\}$³. The set $C_i^j$ is defined as follows:

\[ C_i^j = \{ h \, P \, h' \mid h, h' \in (F_i^j \cup N_i^j) \wedge P \in B \} \tag{6.6} \]

Chapter 7 presents a set of examples involving the organization introduced in this section. The examples are deferred because the formalisms needed to support the concepts introduced are provided throughout the remainder of this chapter. A normal form for CFGs is proposed in the following section.
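As a rough illustration of Expressions 6.1, 6.2 and 6.4, the following Python sketch models a document instance structure as a sequence of directories plus a binding control, and computes the document dictionary as the union of the directories. The class names, the field names and the use of plain strings as grammar names are assumptions of the sketch, not notation from the thesis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Grammar = str  # assumption: a grammar is identified only by its name here


@dataclass(frozen=True)
class Directory:
    """One domain directory: the union of edited, predefined and combined grammars."""
    edited: FrozenSet[Grammar]      # grammars created by standard editing
    predefined: FrozenSet[Grammar]  # grammars that had already been created
    combined: FrozenSet[Grammar]    # grammars produced by the binary operations

    def grammars(self) -> FrozenSet[Grammar]:
        return self.edited | self.predefined | self.combined


@dataclass(frozen=True)
class DocumentInstance:
    """A document instance structure: semantic structure plus binding control."""
    directories: Tuple[Directory, ...]  # the finite sequence of directories
    binding_control: Grammar            # the version-independent grammar c

    def dictionary(self) -> FrozenSet[Grammar]:
        """Document dictionary: every grammar found in any directory."""
        result: FrozenSet[Grammar] = frozenset()
        for directory in self.directories:
            result |= directory.grammars()
        return result


first = Directory(frozenset({"G2", "G4"}), frozenset(), frozenset({"G24"}))
second = Directory(frozenset({"G6"}), frozenset({"G24"}), frozenset())
document = DocumentInstance((first, second), "c")
dictionary = document.dictionary()
```

In this toy instance the second directory reuses a grammar that the first directory obtained by combining two edited grammars, which is the kind of reuse the three sets are meant to record.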

³ The definition of these two operations requires that their operands be grammars that satisfy the extension normal form criteria introduced in Section 6.3. Both the syntax and the semantics of the two operations will be defined in Section 6.4.

6.3 Structuring with Grammars

The design strategy introduced in Section 5.3 suggests the use of CFG fragments to capture the semantics of the mathematical concepts. The modular structure proposed there is based on the assignment of a unique nonterminal to represent the meaning of a concept. The set of productions used for the definition of a nonterminal, in this scheme, is viewed as the specification of a data type which is represented by the grammar's start symbol⁴. The mathematical constructs recognized by this organization are, at run-time, the instances of the associated types, as defined by the grammar⁵. For this reason they are considered as objects.

As mentioned in Section 5.3 the use of normal forms would benefit both the semantics capturing and the grammar composition activities. The connection between semantics capturing and normal form is approached here by the definition of a set of templates. These templates establish the restrictions which the grammar rules for semantics capturing must follow.

The conventional mathematical notation represents concepts as strings of symbols. The interpretation of any of these strings is based on the arrangement of the symbols and the domain (field) in which definitions are proposed. Although the number of operands that can be attached to an operator is determined by the concept to be represented, the location where an operator is placed inside expressions is usually limited to three possibilities. Operators may usually be arranged according to either infix, prefix or postfix formats depending on their placement relative to their operands in the expression⁶. They are, therefore, considered infixed if they have both left and right operands, prefixed in case only right operands are provided and postfixed in situations where operands are only placed on the left side of the operator. Variations of this scheme are necessary to support situations where the object has no operands. A vector, for instance, illustrates this scenario since, when represented according to the standard notation, it is often encoded as a single lowercase letter in bold face type. The representation of this concept may, for instance, be viewed as either a prefixed or a postfixed expression with no operands.

⁴ It is assumed that mathematical concepts only have the expected meaning if the domain directory in which they are defined is considered.

⁵ Although objects in programming languages are understood as the result of class instantiations, the interpretation attached here to them differs by the fact that the instances are not automatically generated by the language processor. In the proposed model grammars correspond to classes and the expressions are the objects defined during authoring. In this scenario objects are the result of an incremental process during which portions of the object or the complete object are provided by the author.

⁶ An exception to this rule is the representation of juxtaposed multiplication as used in polynomials where no explicit operator is provided.

A normal form for CFGs is, therefore, proposed as a way of structuring grammars to support the expression formats discussed. The terminal symbols of the proposed structure are used for the representation of the operator's name and the nonterminals are used for the representation of the operands and necessary delimiters. This grammar structure also provides the necessary mechanism to support recursive definitions since they are needed to capture the repetitive occurrences of certain types of operators in expressions.

Definition 2 A CFG $G = (N, T, P, S)$ is said to be in the Extension Normal Form (ENF) if, for all $A \in N$, with $a \in T$ and $\alpha \in N^*$, there are only four kinds of productions⁷ in $P$. They are:

(1) A —> aa

(2) A -> aa

(3) A -> A aA

(4) A -> A

Theorem 1 For every CFG $G$ such that $\varepsilon \notin L(G)$, one can construct an equivalent CFG in Extension Normal Form.

Proof: This result follows from the super-normal-form theorem in [71, 106].

Each production rule of the ENF may be interpreted as an atomic grammar fragment. To achieve this, assume each one of the four kinds of rules, as proposed by the ENF, defines a CFG.

In Section 5.3 the correspondence between CFG and type was proposed. This indicates that the definition of a type will be a function of the number and the structure defined by the grammar's rules. For any given CFG rule, the combination of terminals and nonterminals determines the type of the rule. Rules may therefore be organized according to the number of terminals and nonterminals as structured and non-structured. Non-structured rules define primitive types such as integer, real and character, for example. Rules that cannot be associated with any type are also considered non-structured. Structured rules define types, and the meaning of a type may depend on information provided by other rules.

⁷ $G$ being in ENF means that $G$ is an interpretation of a 2-symbol CFG form [106] with rules only of the types listed.

The following definitions impose restrictions on grammars in ENF as a way to classify these grammars according to the criteria of being structured or non-structured. The resulting grammar fragments are the building blocks which will be used for the semantics capturing process.

Definition 3 A CFG $G = (N, T, P, S)$ in ENF is called an operatorless grammar if $N = \{S, B\}$, $T = \{\}$ and $P = \{S \to B\}$. The rule $S \to B$ is called an operatorless⁸ production.

Operatorless grammars are used to introduce specializations. That is, a concept associated with $S$ is specialized to $B$. Any instantiation of $B$ is therefore an instantiation of $S$.

Definition 4 A CFG $G = (N, T, P, S)$ in ENF is called a primitive grammar if $N = \{S\}$, $T = \{a\}$ and $P = \{S \to a\}$. The rule $S \to a$ is called a primitive production.

Primitive grammars introduce atomic types. That is, the type assigned to its nonterminal does not depend on the type associated with any other nonterminal.

Definition 5 A CFG $G = (N, T, P, S)$ in ENF is called a basic grammar if for $\alpha \in (N \cup T)^+$ its set of rules is $P = \{S \to \alpha\}$. The rule $S \to \alpha$ is called a basic production and it is neither a primitive production nor an operatorless production.

Basic grammars are type constructors. They are used to create composite types. In this case the type assigned to its start symbol will depend on the types associated with the other nonterminals that are part of the rule.

Operatorless, primitive and basic grammars are the essential components which will be involved in the semantics capturing activity. For this reason they will be referred to as fundamental grammars.

Definition 6 All CFGs in ENF which are neither operatorless, primitive nor basic are called derived grammars. Derived grammars which have no operatorless productions are called reduced grammars.

⁸ This type of production is often called unit production [55, 107, 69].

The following example illustrates the notion of basic grammar. Consider, for instance, grammars

• $G_1 = (N_1, T_1, P_1, S_1)$ with $N_1 = \{S_1, B, C\}$, $T_1 = \{a\}$, $P_1 = \{S_1 \to a B C\}$,

• $G_2 = (N_1, T_1, P_2, S_1)$ with $P_2 = \{S_1 \to B C a\}$ and

• $G_3 = (N_1, T_1, P_3, S_1)$ with $P_3 = \{S_1 \to a B C a\}$.

$G_1$ and $G_2$ are both basic grammars. $G_3$ is not basic since it is not in ENF.
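The three kinds of fundamental grammars can be told apart by looking at their single production. The Python sketch below follows Definitions 3 to 5 under two stated assumptions: the grammar passed in is already known to be in ENF (the sketch does not check this, so a grammar such as $G_3$ must be rejected beforehand), and it has exactly one production. The function name `classify` and the tuple encoding of productions are hypothetical.

```python
from typing import List, Set, Tuple

Production = Tuple[str, List[str]]  # (left-hand side, right-hand side symbols)


def classify(terminals: Set[str], production: Production) -> str:
    """Classify a single-production grammar that is assumed to be in ENF."""
    _, rhs = production
    if len(rhs) == 1 and rhs[0] not in terminals:
        # S -> B with no terminal: a specialization of S to B.
        return "operatorless"
    if len(rhs) == 1 and rhs[0] in terminals:
        # S -> a: an atomic type that depends on no other nonterminal.
        return "primitive"
    # Any other single ENF production is a type constructor.
    return "basic"


kind_g1 = classify({"a"}, ("S1", ["a", "B", "C"]))   # prefix operator a
kind_g2 = classify({"a"}, ("S1", ["B", "C", "a"]))   # postfix operator a
kind_unit = classify(set(), ("S", ["B"]))
kind_atom = classify({"a"}, ("S", ["a"]))
```

Because ENF membership is a precondition here, the sketch only separates the fundamental kinds from one another; it says nothing about derived grammars, which have more than one production.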

6.3.1 Mathematical Concepts and Grammatical Dependencies

The fact that the definition of a mathematical concept usually depends on other concepts indicates the existence of relationships among them. This characteristic is, in this thesis, interpreted as a dependency relation where one concept is the dependent and a set of others the determinants. In this work this relation is represented by an arrow that starts at the set of determinants and points to the dependent. The two following examples illustrate this notion of dependency.

The concept of absolute value is defined as an operator that returns the unsigned version of the expression supplied as its argument. According to conventional notation, $|\sin x|$ is the encoding for the absolute value of the sine function computed for argument $x$. Assuming $A$ and $S$ represent the absolute value and the sine function respectively, the relation between these two concepts, in the context of $|\sin x|$, is indicated as follows:

\[ A \Leftarrow S \]

The presented dependency relation establishes a hierarchical relationship that has $A$ as the parent of its only child, the $S$ construct. In this case $S$ is the determinant and $A$ the dependent.

The concatenation of strings of characters is often encoded as expressions in the infix format with the + symbol representing the operation on the operands [107]. An alternative encoding is also used when a single string of characters is to be concatenated with itself. In this case the operation may also be encoded as a power expression where the expected result is the original string concatenated with itself the number of times indicated by the exponent. The two encodings are illustrated by the equality expression

a * 3 + b = aaab

where the symbols *, + and = represent the power, concatenation and equality operations respectively.

catEquality
    expr        : expr EQUALS term_cat
    expr        : term_cat
    term_cat    : term_cat CONCATENATION term
    term_cat    : term
    term        : term POWER factor
    term        : term_string
    term_string : STRING
    factor      : INTEGER

Table 6.2: CFG for equality of strings of characters.

The operations and operands involved in this expression are described in the grammar represented by the set of production rules defined in Table 6.2. The hierarchy imposed by the rules of grammar catEquality⁹ establishes the following seven dependency relations:

EQUALS <= CONCATENATION
EQUALS <= POWER
EQUALS <= STRING
CONCATENATION <= STRING
CONCATENATION <= POWER
POWER <= INTEGER
POWER <= STRING

9It has been assumed that STRING is defined by the regular expression [a-j]" and INTEGER is a nonzero positive integer.



The above relations determine the dependencies which exist among the mathematical concepts which have been used for the definition of another concept. In this thesis they will be called terminal dependencies¹⁰ because of the one-to-one association between the name of the mathematical concept and the terminal symbol which represents it in the grammar which captures the meaning of the concept.

scheme -> ID { dexpr }
dexpr  -> dexpr rest
dexpr  -> ID
rest   -> <= det
det    -> ID
det    -> ( dlist ) more
det    -> dexpr
dlist  -> first others
first  -> ID
others -> , object
object -> ID
object -> dlist
more   -> ε
more   -> ; dexpr

Table 6.3: CFG for representation of schemes.

The grammar in Table 6.3 provides the syntax which will be followed, in this thesis, to represent terminal dependency relations. Each word belonging to this grammar is called a representation scheme.

Since representation schemes are always related to grammars, they will be identified by the grammar's name appended with the literal string Scheme. The expression which follows determines that catEqualityScheme is the representation scheme for the grammar defined in Table 6.2.

catEqualityScheme{Equals <= (Concatenation, Power, String);
                  Concatenation <= (String, Power);
                  Power <= (Integer, String)}

¹⁰ Although the formal definition of terminal dependency is provided at the end of this section, these dependencies can easily be identified whenever the related grammar is expressed as a reduced grammar.

Although the ENF determines the possible arrangement of nonterminals and terminals in production rules, this mechanism is not adequate for the description of the relationships which exist among the concepts represented. This limitation is a consequence of the fact that in a CFG the nonterminals are variables and the terminal symbols are constants. This implies that the existing relationships among concepts can only be expressed in terms of their associated nonterminals when represented by CFGs. The notion of grammatical dependency is introduced as a form of describing the restrictions a set of grammars should satisfy whenever their terminal symbols express a dependency relationship.

Prior to the definition of terminal dependency the notion of type decomposition¹¹ needs to be introduced. This means a CFG is decomposed as a set of grammar fragments which can be tested for nonterminal relationships. The existence of relationships among nonterminals in different grammar fragments leads to the notion of grammar dependency. These ideas are formally presented by the following definitions.

Definition 7 Let $L = (N, T, P, S)$ be a reduced grammar. The type decomposition of $L$ is the set

\[ Z_L = \{ K_p = (N_p, T_p, P_p, S_p) \mid P_p = \{p\},\ N_p = (L_p \cup R_p),\ T_p = O_p,\ S_p = L_p,\ p \in P \} \]

Definition 8 Let $G_b = (N_b, T_b, P_b, S_b)$ be a basic grammar and let $G_o = (N_o, T_o, P_o, S_o)$ be either a basic or a primitive grammar such that $G_b \neq G_o$. $G_b$ is grammatically dependent on $G_o$, $G_b \Leftarrow G_o$, if $L_o \cap R_b \neq \emptyset$.

E_1    expr   : expr MINUS term
E_2    expr   : term TIMES factor
E_3    expr   : IDENTIFIER
E_4    term   : term DIVIDEDBY factor
E_5    term   : IDENTIFIER
E_6    factor : IDENTIFIER

Table 6.4: Grammar fragments illustrating grammar dependencies.

¹¹ Since the start symbol of a grammar is interpreted as a type, the decomposition of a grammar as a set of grammar fragments is viewed here as a type decomposition.

To illustrate the grammar dependency concept, consider the grammars defined in Table 6.4 that may be viewed as the result of the type decomposition of some grammar. For this example $Z = \{E_1, E_2, E_3, E_4, E_5, E_6\}$. The following dependencies are obtained by applying Definition 8 to the grammars defined by the productions in Table 6.4.

\[ E_1 \Leftarrow (E_2, E_3, E_4, E_5) \tag{6.7} \]

\[ E_2 \Leftarrow (E_1, E_3, E_4, E_5, E_6) \tag{6.8} \]

\[ E_4 \Leftarrow (E_5, E_6) \tag{6.9} \]

As can be seen, commas and both opening and closing parentheses have been used for the representation of dependencies. These symbols were included in order to group all determinants that are associated with a dependent in a single dependency expression. The grammatical dependency determined by relation (6.7), for instance, states that portions of both the syntax and the semantics described by grammar $E_1$ are supplied by grammars $E_2$, $E_3$, $E_4$ and $E_5$.
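Definitions 7 and 8 can be tried out on the fragments of Table 6.4 with a short Python sketch. It rests on assumptions that are not spelled out in this section: $L_p$ is read as the left-hand-side nonterminal of production $p$, $R_p$ as the nonterminals on its right-hand side and $O_p$ as its terminals, and a symbol counts as a terminal when it is written in upper case. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Production = Tuple[str, List[str]]  # (left-hand side, right-hand side symbols)


def is_terminal(symbol: str) -> bool:
    # Assumption of this sketch: terminals are the upper-case symbols.
    return symbol.isupper()


def decompose(productions: Dict[str, Production]) -> Dict[str, dict]:
    """Type decomposition: one fragment per production, with its L, R and O sets."""
    fragments = {}
    for name, (lhs, rhs) in productions.items():
        fragments[name] = {
            "L": {lhs},
            "R": {s for s in rhs if not is_terminal(s)},
            "O": {s for s in rhs if is_terminal(s)},
            "rhs": list(rhs),
        }
    return fragments


def kind(fragment: dict) -> str:
    rhs = fragment["rhs"]
    if len(rhs) == 1:
        return "primitive" if is_terminal(rhs[0]) else "operatorless"
    return "basic"


def determinants(fragments: Dict[str, dict], name: str) -> List[str]:
    """Fragments on which the basic fragment `name` is grammatically dependent."""
    b = fragments[name]
    if kind(b) != "basic":
        return []  # only basic grammars can be dependents
    return [
        other
        for other, o in fragments.items()
        if other != name and kind(o) != "operatorless" and o["L"] & b["R"]
    ]


table_6_4 = {
    "E1": ("expr", ["expr", "MINUS", "term"]),
    "E2": ("expr", ["term", "TIMES", "factor"]),
    "E3": ("expr", ["IDENTIFIER"]),
    "E4": ("term", ["term", "DIVIDEDBY", "factor"]),
    "E5": ("term", ["IDENTIFIER"]),
    "E6": ("factor", ["IDENTIFIER"]),
}
fragments = decompose(table_6_4)
deps_e1 = determinants(fragments, "E1")
deps_e4 = determinants(fragments, "E4")
```

Under these readings the sketch reproduces relations (6.7) and (6.9) for $E_1$ and $E_4$.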

When capturing the semantics of mathematical concepts by a set of grammar fragments, the names of the concepts are represented by the terminals of the grammars involved. The resulting grammatical dependencies which exist among grammars in this set can be expressed by means of the terminal symbols of these grammars. As stated before, these relationships are called terminal dependencies and they are formally defined in terms of grammatical dependencies as follows.

Definition 9 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_o = (N_o, T_o, P_o, S_o)$ be CFGs. When the grammatical dependency $G_b \Leftarrow G_o$ is satisfied we also say that there is a terminal dependency for each pair of terminals $x$, $y$ such that $x \in T_b$ and $y \in T_o$. The syntax $x \Leftarrow y$ is used to express a terminal dependency between terminals $x$ and $y$. For this case terminal $x$ is called the dependent and terminal $y$ the determinant.

The collection of all terminal dependencies which can be determined from a type decomposition is called a dependency scheme. A representation scheme is the syntactical structure which is used to list all terminal dependencies found in a dependency scheme.
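Definition 9 turns each grammatical dependency into terminal dependencies by pairing the terminals of the dependent with the terminals of the determinant. The Python sketch below applies this to relations (6.7) and (6.9), taking the terminal of each fragment from Table 6.4. The dictionary layout and the function name `terminal_dependencies` are assumptions made for the illustration.

```python
from itertools import product
from typing import Dict, List, Set

# Terminals of the fragments of Table 6.4 that take part in (6.7) and (6.9).
terminals: Dict[str, Set[str]] = {
    "E1": {"MINUS"},
    "E2": {"TIMES"},
    "E3": {"IDENTIFIER"},
    "E4": {"DIVIDEDBY"},
    "E5": {"IDENTIFIER"},
    "E6": {"IDENTIFIER"},
}

# Grammatical dependencies as stated in relations (6.7) and (6.9).
grammatical: Dict[str, List[str]] = {
    "E1": ["E2", "E3", "E4", "E5"],
    "E4": ["E5", "E6"],
}


def terminal_dependencies(
    grammatical: Dict[str, List[str]], terminals: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Map each dependent terminal x to the set of its determinant terminals y."""
    scheme: Dict[str, Set[str]] = {}
    for dependent, determinant_names in grammatical.items():
        for determinant in determinant_names:
            for x, y in product(terminals[dependent], terminals[determinant]):
                scheme.setdefault(x, set()).add(y)
    return scheme


scheme = terminal_dependencies(grammatical, terminals)
```

The resulting mapping is one possible in-memory form of a dependency scheme; writing it out in the syntax of Table 6.3 would give the corresponding representation scheme.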

Dependency schemes can be used as an aid to help with the identification of redundancies by syntax equivalence. Reduced grammars which do not share dependency schemes are free of redundancies by syntax equivalence.

6.4 Grammar Operations and Extensibility

This section introduces two operations. These operations have grammars as both their input parameters and their returned information. Input grammars are seen as providers of both syntax and semantics and they are never modified by the operations. The output produced is the result of the combination of the production rules supplied by the input. Since the application of any of the two proposed operations produces a single grammar, the creation of more complex grammar definitions may be seen as the result of a sequence of operations which would use the information obtained from previous operations. Therefore the creation of the final grammar may be viewed as the result of a process where grammar fragments have been inserted and/or deleted.

Both operations are defined for input grammars in ENF. This requirement guarantees that the output grammar is also in ENF. In this thesis these operations are the means by which grammars are combined in order to support the extensibility of the mathematical notation.

The use of CFGs as a supporting organization to capture the meaning of mathematical concepts, as previously proposed in this work, is restricted to document structures which can only be modified by editing mechanisms. This limitation was discussed in Section 5.3 where the correspondence between CFG and type was presented.

The need to either overload a given symbol by attaching a different meaning to it, or to introduce a new syntactical representation for a mathematical concept may be viewed as modifications to be executed on grammars which have already been defined. Another approach to this need is to generate grammars to support the mentioned requirements by reusing, whenever possible, the available grammars. The notion of grammar reuse as defined by the two operations proposed here is considered one of the fundamental mechanisms¹² which this thesis introduces to approach the semantics capturing necessity. For this reason neither operation modifies grammars which have already been created. Instead they support the semantics capturing activity by allowing grammars to be created by reusing information provided by other grammars. The following definitions introduce these operations:

¹² Another important mechanism is the notion of context switching or scope. This notion is introduced in this chapter to support symbol overloading.

Definition 10 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_o = (N_o, T_o, P_o, S_o)$ be two CFGs in ENF. The composition operation $G_b \circ G_o$ will produce a CFG $G_c = (N_c, T_c, P_c, S_c)$ as follows:

\[ P_c = P_b \cup P_o \]

\[ N_c = N_b \cup N_o \]

\[ T_c = T_b \cup T_o \]

\[ S_c = S_b \]

Definition 11 Let $G_b = (N_b, T_b, P_b, S_b)$ and $G_o = (N_o, T_o, P_o, S_o)$ be two CFGs in ENF. The extension operation $G_b \,\%\, G_o$ will produce a CFG $G_x = (N_x, T_x, P_x, S_x)$, as follows:

\[ P_x = \{ A \to \alpha \mid A \to \alpha \in P_b \wedge A \notin N_o \} \cup P_o \]

\[ N_x = N_b \cup N_o \]

\[ T_x = T_b \cup T_o \]

\[ S_x = S_b \]

While the composition operation is left-associative and commutative, the extension operation is left-associative but not commutative. To illustrate the use of both the composition and extension operators, consider the need to capture the meaning of expressions consisting of the addition of numbers. For this purpose assume that grammar fragments $G_2$, $G_4$, $G_6$ and $G_8$ are available. This means these fragments have already been created by editing procedures and have been stored in some computer-based device.

G_2    expr : expr PLUS term

Table 6.5: Basic grammar for addition.

expr : term

Table 6.6: Operatorless grammar linking expr and term nonterminals.

term : num

Table 6.7: Operatorless grammar linking term and num nonterminals.

Table 6.5 displays the basic grammar $G_2$ which captures the semantics of expressions consisting of two operands connected by the infixed PLUS operator. Table 6.6 and Table 6.7 contain the definitions for operatorless grammars $G_4$ and $G_6$ respectively. Table 6.8 displays the primitive grammar $G_8$. The grammar to capture the semantics of addition expressions involving numbers may be obtained by the combination of these four grammar fragments as provided by the expression

\[ \{\{G_2 \circ G_4\}G_{24} \circ G_6 \circ G_8\}G_{r1}. \]

The notation $\{G_2 \circ G_4\}G_{24}$ is used to express the fact that the result of the composition operation $G_2 \circ G_4$ has the name $G_{24}$. In a similar way $G_{r1}$ is the name that has been assigned to the composition operation $G_{24} \circ G_6 \circ G_8$. The derived grammars $G_{24}$ and $G_{r1}$ are displayed in Table 6.9 and Table 6.10 respectively.
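The construction of $G_{24}$ and $G_{r1}$ can be replayed with a minimal Python sketch of the composition operation of Definition 10. The `Grammar` class, the helper `fragment` and the convention that upper-case symbols are terminals are assumptions of the sketch; only the four productions come from Tables 6.5 to 6.8.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


@dataclass(frozen=True)
class Grammar:
    nonterminals: FrozenSet[str]
    terminals: FrozenSet[str]
    productions: FrozenSet[Production]
    start: str


def fragment(lhs: str, *rhs: str) -> Grammar:
    """Single-production grammar; upper-case symbols are taken as terminals."""
    symbols = (lhs,) + rhs
    return Grammar(
        frozenset(s for s in symbols if not s.isupper()),
        frozenset(s for s in symbols if s.isupper()),
        frozenset({(lhs, rhs)}),
        lhs,
    )


def compose(b: Grammar, o: Grammar) -> Grammar:
    """Composition: unions of all components, start symbol of the left operand."""
    return Grammar(
        b.nonterminals | o.nonterminals,
        b.terminals | o.terminals,
        b.productions | o.productions,
        b.start,
    )


G2 = fragment("expr", "expr", "PLUS", "term")  # Table 6.5
G4 = fragment("expr", "term")                  # Table 6.6
G6 = fragment("term", "num")                   # Table 6.7
G8 = fragment("num", "NUMBER")                 # Table 6.8

G24 = compose(G2, G4)
Gr1 = compose(compose(G24, G6), G8)  # left-associative, as stated in the text
```

Since `compose` builds a new `Grammar` and the class is frozen, the input fragments stay untouched, which matches the requirement that the operations never modify their operands.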

Consider now capturing the meaning of expressions involving both the addition and the multiplication of numbers. One way to approach this problem is to make use of the grammars which have already been defined. Additional grammars necessary to capture the concepts not covered by these grammars may be obtained, for example, by editing.

Assume grammar $G_{r1}$ is already available. Also assume grammars $G_1$, $G_3$ and $G_5$ have been created by editing. Table 6.11 displays the basic grammar $G_1$. The operatorless grammars $G_3$ and $G_5$ are displayed in Table 6.12 and Table 6.13 respectively. The grammar to capture the semantics of expressions involving the multiplication and addition of numbers is therefore obtained by means of the expression

\[ \{G_{r1} \,\%\, \{G_1 \circ G_3\}G_{13} \circ G_5\}G_{r2}. \]

The notation $G_{r1} \,\%\, \{G_1 \circ G_3\}G_{13}$ is used to indicate that $G_{r1}$ is extended by $G_{13}$ which is the result of the composition operation $G_1 \circ G_3$. The name $G_{r2}$ is therefore assigned to the result of the operation $\{G_{r1} \,\%\, G_{13} \circ G_5\}$. The derived grammars $G_{13}$ and $G_{r2}$ are displayed in Table 6.14 and Table 6.15 respectively.

num : NUMBER

Table 6.8: Primitive grammar setting nonterminal num to terminal NUMBER

G_24    expr : expr PLUS term
        expr : term

Table 6.9: Derived grammar for addition.

A simple grammar fragment to deal with the usage of both extension and composition operations is presented in Table 6.16. According to this grammar the result of the binary operation(s) may either be saved as a new grammar or not. This is a consequence of the fact that the nonterminal new_class may be replaced by the terminal IDENTIFIER or by the empty string $\varepsilon$. Therefore whenever variable new_class is replaced by the empty string the result of the binary operation(s) will not be remembered. Although there is no means of reusing the result produced, the procedure does generate a grammar. In this thesis, this grammar is called an implied grammar.

The notion of implied grammar introduces the possibility of defining domains without adding grammars to the domain directory. These types of domains exist only during run-time and are called implied domains.
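
The difference between a named result and an implied grammar can be made concrete with a small evaluator for statements of the form { class } new_class. The following Python sketch is a hedged illustration rather than the thesis implementation. It assumes that both binary operators can be approximated by a duplicate-free union of production lists, that the composition operator is typed as the letter o, and that identifiers consist of letters and digits. The names evaluate and combine are hypothetical.

```python
import re

# ASSUMPTION: both operators are approximated by an order-preserving union of
# production lists; their formal definitions are not modelled here.
def combine(left, right):
    return left + [p for p in right if p not in left]


def evaluate(text, directory):
    """Evaluate one statement '{ class } new_class' against `directory`.

    A result followed by an identifier is stored under that name. A result
    without a name (an implied grammar) is returned but not stored.
    """
    tokens = re.findall(r"[{}%]|\w+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def is_name(token):
        return token is not None and token not in ("{", "}", "%", "o")

    def stmt():                      # stmt -> { class } new_class
        take("{")
        grammar = klass()
        take("}")
        if is_name(peek()):          # new_class -> IDENTIFIER | empty
            directory[take()] = grammar
        return grammar

    def other_class():               # other_class -> stmt | IDENTIFIER
        return stmt() if peek() == "{" else directory[take()]

    def klass():                     # class -> class operator other_class | other_class
        grammar = other_class()
        while peek() in ("%", "o"):
            take()
            grammar = combine(grammar, other_class())
        return grammar

    result = stmt()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return result


directory = {
    "G2": [("expr", ("expr", "PLUS", "term"))],
    "G4": [("expr", ("term",))],
    "G6": [("term", ("num",))],
    "G8": [("num", ("NUMBER",))],
}
named = evaluate("{{G2 o G4}G24 o G6 o G8}Gr1", directory)   # stores G24 and Gr1
implied = evaluate("{G6 o G8}", directory)                   # stored nowhere
```

After the first call the directory contains the two new entries G24 and Gr1. The second call returns a grammar that is not added to the directory, which is the behaviour described for implied grammars.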

6.5 Structuring with Domains and Directories

Section 6.4 presented a structured process to capture the meaning of mathematical concepts. The approach introduced the notion of atomic grammar fragments and the notion of creating and updating CFGs by means of two binary operations. Instead of concentrating the needed knowledge in a monolithic grammatical organization, this process distributes the required information among a set of grammar fragments. These fragments are therefore viewed as decentralized structures which decompose mathematical concepts according to CFGs which are either basic, primitive or operatorless. The distributed fragments may be combined by the binary operations as a way of generating other grammar fragments. The set of grammar fragments is, in this way, updated in an incremental style by module reuse. As described so far, the solution supports extensibility only from a restricted point of view, since it does not consider the multi-domain aspect of mathematics. Instead it assumes that all concepts to be represented belong to a single domain.

$G_{r1}$    expr : expr PLUS term
            expr : term
            term : num
            num : NUMBER

Table 6.10: Resulting grammar for expressions involving addition.

$G_1$    term : term TIMES factor

Table 6.11: Basic grammar for multiplication.

The proposed approach allows grammar fragments to be considered as both open and closed concepts. The fact that they may be used to represent unique information which may be stored as components of a library and used by clients of the library characterizes them as closed concepts. On the other hand, the same fragments may contain information which may be used for the creation of a new grammar fragment by means of the two binary operations. For this reason they may also be considered as open concepts. This interpretation is consistent with the notion of object-oriented class as provided by [72]. According to this interpretation CFGs correspond to classes. Therefore, for a given CFG, say $G = (N, T, P, S)$, the words in $L(G)$ will informally$^{13}$ correspond to instances of the class associated with $G$. Mathematical expressions will consequently be the objects. Operatorless grammars and CFGs which have only operatorless productions are an exception to this because they have no means to express any concrete objects, and therefore cannot generate mathematical expressions.

13 This association is loose because some fundamental characteristics of classes cannot be expressed as grammar operations. Consider, for instance, the notion of subclass. This concept does not always correspond to grammars which result from either the extension or the composition operations.



$G_3$    term : factor

Table 6.12: Operatorless grammar linking term and factor nonterminals.

$G_5$    factor : num

Table 6.13: Operatorless grammar linking factor and num nonterminals.

6.5.1 Domains, Directories and Symbol Overloading

The approach presented in the previous sections does not properly address document organizations containing symbol arrangements which have been used to express concepts which belong to more than a single mathematical field. In order to extend the proposed process, a relation between symbol overloading$^{14}$ and domain/directory needs to be established. The solution proposed in this subsection approaches symbol overloading by means of a real-time update$^{15}$ process. This process is the mechanism by which the structure of a document adapts in order to cope with representation ambiguities introduced as the result of overloading.

For any given directory, the solution determines that the overloading is resolved by means of a dynamic directory change. This implies that the meaning of symbol arrangements is a function of the directory in which they are defined. The dynamic characteristic is required to support the possibility of user-defined syntax being introduced during authoring. A directory therefore defines a scope, and symbol overloading determines the need for a change of scope or context switch$^{16}$. The approach supports this requirement by introducing the notion that any two semantically distinct mathematical concepts which have been assigned the same arrangement of symbols for their syntactical representation are considered here to have an overloaded representation. Therefore the representation of a mathematical concept is considered non-overloaded if there is a one-to-one relationship between the concept and the symbol arrangement used in its syntactical definition.

14 One relevant aspect of the dynamical authoring of mathematics is the fact that the overloading of symbols is at the author's discretion. This characteristic is nondeterministic; therefore it cannot be predicted.

15 In the context of this thesis, real-time update is used to refer to the document modifications done during the document authoring activity.

16 It is important to note that, in this scenario, both the number and contents of domains are under complete control of the author of the document. This indicates that domain is a dynamic concept. This point of view has been formally stated in Section 6.2.

$G_{13}$    term : term TIMES factor
            term : factor

Table 6.14: Derived grammar for multiplication.

$G_{r2}$    expr : expr PLUS term
            expr : term
            term : term TIMES factor
            term : factor
            factor : num
            num : NUMBER

Table 6.15: Resulting grammar for expressions involving addition and multiplication.

This idea is formally stated by the following definitions:

Definition 12 Let $S$ be an alphabet and $C$ be a nonempty finite set of mathematical concepts. The representation of a mathematical concept is a mapping from $C$ to $S^{+}$.

Definition 13 The representation of a mathematical concept is overloaded if the mapping from $C$ to $S^{+}$ is not injective.

By structuring concept definitions into domains and directories it becomes necessary to establish the conditions under which existing grammars could be applied to the construction of a directory. To ensure that a directory is free of ambiguities the restriction proposed by the following definition needs to be observed.

Definition 14 A domain directory is overloaded if the representation of some mathematical concept in the directory is overloaded.
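
Definitions 12 to 14 reduce to an injectivity test, which can be sketched as follows. The Python fragment is only an illustration. It assumes a directory is given as a dictionary from concept names to symbol arrangements, and the concept names, the two sample directories and the function names are all hypothetical.

```python
from collections import defaultdict


def overloaded_arrangements(representation):
    """Return the symbol arrangements shared by more than one concept.

    `representation` maps each concept to its symbol arrangement (a nonempty
    string), so it plays the role of the mapping from C to S+.
    """
    concepts_by_arrangement = defaultdict(list)
    for concept, arrangement in representation.items():
        concepts_by_arrangement[arrangement].append(concept)
    return {a: cs for a, cs in concepts_by_arrangement.items() if len(cs) > 1}


def is_overloaded(representation):
    # Definition 13: the representation is overloaded if it is not injective.
    return bool(overloaded_arrangements(representation))


# Hypothetical directories used only for illustration.
expression = {"addition": "+", "concatenation": "+", "disjunction": "+"}
addition = {"addition": "+", "equality": "="}

print(is_overloaded(expression))   # True: three concepts share "+"
print(is_overloaded(addition))     # False: the mapping is injective
```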

The notion of non-overloaded directories is useful in this context because each overloaded concept representation which needs to be included in a document determines the need for a separate directory. This means the concept representation forces the existence of an organization in which its meaning is uniquely defined. This approach has the advantage of treating the multi-directory characteristic as a supporting mechanism for resolving symbol overloading.

stmt_list    -> stmt_list ; stmt | stmt
stmt         -> { class } new_class
class        -> class operator other_class
class        -> other_class
operator     -> % | o
other_class  -> stmt | IDENTIFIER
new_class    -> IDENTIFIER | ε

Table 6.16: Grammar to support the use of both the composition and extension operators.

Semantics modularity is achieved when the many-to-one mapping between concept and representation is restricted to non-overloaded domain directories. The resulting document, once the complete authoring process is over, will have its contents naturally organized according to the meanings of the concepts involved.

6.6 Languages as Control Structures

The concept of non-overloaded representation establishes the necessity of a directory switch mechanism as a way to adapt to symbol overloading. This introduces additional complexity to the strategy chosen for processing the information provided by directories. For this reason the complexity of language processors designed to address this characteristic increases with the number of domains introduced.

Like grammar fragments, directories are both open and closed concepts. They are open because they are dynamic and allow modification$^{17}$ to take place. As a closed concept they represent unique information. Either as a physical library component or as the result of operations performed on its underlying set of grammars, a directory exists as a single CFG. Since a directory will ultimately be represented by its CFG, there exists a language processor$^{18}$ associated with its grammar.

17 No dynamic modification is ever allowed to a domain/directory as the result of the composition and/or extension operations. However both domains and directories may be modified by editing.

Although a directory which comprises part of the logical structure of a document need not have any direct dependency on the others, they all share a common structure$^{19}$ where all mathematical information of the document is presented. This requirement establishes that a form of synchronization is necessary to guarantee that the next piece of data to be processed will be dealt with by its associated processor.

The arrangement by which the mathematical concepts are organized throughout a document is a user-defined task which takes place during the authoring process. It is during this phase that the author specifies both the syntax and the true meaning of operations by binding concepts to syntax and collecting them into related domains and directories. The structure of the document, at any time during this process, will therefore reflect the way these directories are arranged. There are three possible ways directories may be composed. A document structure is the result of directories arranged in one of the following Directory Composition Forms:

• Pure linear,

• pure hierarchical or

• combined form, a combination of linear and hierarchical.

In a pure linear organization, directories are self-contained. This means there is only a single scope where objects are delimited. Directories organized in this way may be processed in a First-In First-Out (FIFO) fashion. In a pure hierarchical organization, directories are processed in a Last-In First-Out (LIFO) style. The semantics in these types of documents are structured in a nested way such that only the innermost directory has no dependency on the others. The most common structure is the combined one, which is characterized by a random pattern of FIFO and LIFO organizations. This case may be considered general as it contains the previous two. For this arrangement, the possible number of document structure patterns which can be obtained for a given number of directories is provided by the nonlinear recursive expression

$$P_n = \sum_{i=1}^{n} P_{i-1}\,P_{n-i} \quad \text{for } n \geq 1, \qquad P_0 = 1,$$

where $n$ is the total number of distinct directories. This means that for a given document which requires, for example, 6 distinct directories, 132 document structure patterns can be obtained. Therefore a small number of language processors can be rearranged in many different ways as a form of supporting document updates$^{20}$. This indicates that, once the associated language processors have been generated, all document modifications which do not involve the addition of new directories during the authoring process will depend only on the synchronization procedure needed for the generation of the corresponding hierarchical intermediate representation$^{21}$.

18 This characteristic is supported by the fact that for every CFG there exists a Pushdown Automaton that recognizes the language [55, 107, 69].

19 Even though text-based forms of representation are expected to be used in most applications, the ideas presented here also apply to other input formats.
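
The figure of 132 patterns for 6 directories can be checked numerically. The sketch below assumes that the recursive expression is read as $P_n = \sum_{i=1}^{n} P_{i-1} P_{n-i}$ with $P_0 = 1$, which is the recurrence of the Catalan numbers. The function name patterns is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def patterns(n: int) -> int:
    """Number of document structure patterns for n distinct directories.

    ASSUMPTION: P_n = sum_{i=1..n} P_{i-1} * P_{n-i}, with P_0 = 1.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 1
    return sum(patterns(i - 1) * patterns(n - i) for i in range(1, n + 1))


print([patterns(n) for n in range(7)])   # [1, 1, 2, 5, 14, 42, 132]
```

Under this reading the value for $n = 6$ agrees with the 132 patterns quoted in the text.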

The notion of directories as a structure to support the specification of the syntax and portions of the semantics of mathematical concepts has been introduced in the previous sections. It is intuitive that directories must only be involved during the authoring process if there exists at least one concept that needs to be represented. On the other hand, no mathematical construct may be manipulated during the authoring process without a clear indication of where its related structure has been defined. These concerns are summarized by the following definition:

Definition 15 Let $M$ be a finite set of mathematical concepts. The semantic structure of a document $D_j$ involving mathematical concepts $M$ is considered irredundant if for each directory $G_i^j$ in $D_j$ there exists a mathematical concept $m \in M$ such that $m$ is represented in the scope of $G_i^j$.

Two characteristics which relate to the way directories take part in the organization of mathematical information in documents have been presented. Informally, they state that the semantics of a document is defined by means of a set of directories and that each of these directories must contain at least one object. These characteristics constitute the fundamental requirements that need to be considered for the elaboration of a mechanism to determine the way directories are manipulated during the authoring and processing phases. This mechanism must be flexible enough to provide the author with the freedom to configure versions of the document by standard document operations such as insertion and deletion. Each document version is therefore the result of a set of operations which may have changed the document's internal structure, modified the contents of the document, or both. Modifications affecting only the contents of documents, by either including or removing objects which belong to the set of directories currently defined for the structure of the document, have no further importance besides an increase or reduction in the amount of information that is to be supplied to a renderer. On the other hand, any modification which affects the document's logical structure would need to be executed under a stable form of control.

20 It is understood that these updates do not require the addition of new directories.

21 This characteristic is of course subject to storage requirements. The choice of either keeping the language processors in main or secondary storage is an implementation decision.

6.6.1 Directory Composition Example

Small fragments of documents containing simple expressions that overload the + symbol are provided to illustrate the notion of directory composition. The syntactical structure defined by the production rule

directory_scope -> < directory_definition > block_objects < / >

is used to delimit the scope of a directory, where the strings of letters are nonterminals and the symbols <, > and / are terminals. Nonterminals directory_definition and block_objects are grammar variables that have been used to represent a directory and the mathematical constructs included in the block respectively.

D1.0
< Expression >
1 + 1 + 0 = 2
1 + 1 + 1 = 3
1 + 1 + 0 = 110
1 + 1 + 0 = true
< / >

The document fragment D1.0, as illustrated above, is characterized by a monolithic organization where the definitions of all mathematical objects included in the document are placed in the Expression directory. The fact that the + symbol has been used in the above example to represent different mathematical concepts characterizes this one-directory document fragment as overloaded. A voice renderer system, for instance, will not be capable of providing the appropriate meaning that has been attached to the + symbol in each of the four expressions. This is because the representation used assumes that

• only visual-based views are necessary, and

• the reader has the required knowledge to decode the different meanings assigned to the + symbol.

The above problem is approached here by dividing the single directory into three separate ones in order to ensure that no directory is overloaded. A directory-based organization is consequently obtained. The resulting document organization, as shown below, has therefore been structured according to the addition, concatenation and disjunction operations that have been attached to the + symbol.

D1.1
< Addition >
1 + 1 + 0 = 2
1 + 1 + 1 = 3
< / >
< Concatenation >
1 + 1 + 0 = 110
< / >
< Disjunction >
1 + 1 + 0 = true
< / >

The document organization D1.1 differs from organization D1.0 by the fact that the implicit knowledge needed to distinguish syntactically identical operations with different meanings, as provided by D1.0, has been replaced by the three distinct directories. This means the task of decoding to resolve ambiguities, which was previously left to the user, has now been assigned to the author of the document. Therefore, besides outlining the mathematical concepts used in the document, the author is also responsible for specifying the directory in which the syntax and semantics of these concepts are defined.

The document fragment D1.2, as shown below, is an example of a document structure where combined directory composition is used. Although the mathematical objects involved are the same as the ones in the previous two versions, this organization differs from the other two in the way the directories have been arranged.

D1.2
< Addition >
1 + 1 + 0 = 2
< Concatenation >
1 + 1 + 0 = 110
< / >
< Disjunction >
1 + 1 + 0 = true
< / >
1 + 1 + 1 = 3
< / >
< Disjunction >
1 + 1 + 0 = true
< / >

6.6.2 The Control Mechanism

1  directory_scope      -> < directory_definition > block_objects < / >
2  directory_definition -> IDENTIFIER | stmt_list
3  block_objects        -> various_exprs scope_change
4  various_exprs        -> various_exprs ; new_expr | new_expr
5  scope_change         -> directory_scope various_exprs | directory_scope | ε

Table 6.17: CFG for the binding control mechanism.

The document organization provided by the three examples from the previous subsection illustrates that a form of control is necessary in order to ensure the correctness of the directory composition forms. This requirement has been introduced in Section 6.2 as $c$, the binding control. As part of the definition of a document instance structure $S_j = (D_j, c)$, the binding control is a CFG. A possible definition of $c$ to support the directory composition forms is provided in Table 6.17. The nonterminal stmt_list is defined in the grammar fragment described in Table 6.16, and the nonterminal new_expr is only to be defined whenever directories are created. This means any CFG which defines a directory will have new_expr as a start symbol.
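
The effect of the binding control on a combined (FIFO and LIFO) arrangement can be illustrated with a stack of open directories. The Python sketch below is a simplification and not a parser for Table 6.17. It assumes one scope marker or one expression per line, written with the angle-bracket delimiters of Table 6.17, and it accepts any well-nested arrangement. The function name bind_expressions is hypothetical.

```python
def bind_expressions(lines):
    """Pair every expression with the directory whose scope encloses it.

    ASSUMPTION: "< Name >" opens a directory scope and "< / >" closes the
    innermost open one. Everything else is treated as an expression.
    """
    open_directories = []   # innermost scope is last (LIFO)
    bindings = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            name = line[1:-1].strip()
            if name == "/":
                if not open_directories:
                    raise SyntaxError("scope closed but no directory is open")
                open_directories.pop()
            else:
                open_directories.append(name)
        else:
            if not open_directories:
                raise SyntaxError(f"expression outside any directory: {line}")
            bindings.append((open_directories[-1], line))
    if open_directories:
        raise SyntaxError(f"unclosed directory: {open_directories[-1]}")
    return bindings


# The fragment D1.2, transcribed with angle-bracket delimiters.
D1_2 = """
< Addition >
1 + 1 + 0 = 2
< Concatenation >
1 + 1 + 0 = 110
< / >
< Disjunction >
1 + 1 + 0 = true
< / >
1 + 1 + 1 = 3
< / >
< Disjunction >
1 + 1 + 0 = true
< / >
""".splitlines()

for directory_name, expression in bind_expressions(D1_2):
    print(f"{directory_name:14} {expression}")
```

Each expression is handed to the processor of the innermost open directory, so the expression 1 + 1 + 1 = 3 is bound to Addition again once the two nested scopes have been closed.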

6.7 The Role of Compilers

The organization imposed by the dynamic authoring model allows the author of a document to modify both the syntax and semantics of the notation. Therefore modifications proposed at the abstract level by the author must always be supported by the document processing environment. This requires that, if grammar definitions need to be modified, the corresponding language processor will need to be created to process the new version of the language$^{22}$. Therefore different language processors might need to be produced during the authoring process.

To approach the semantics capturing of mathematical concepts as proposed in Section 5.3, the organization of mathematics is viewed as a set of fields. According to this strategy all concepts that belong to a field can be captured by a directory and therefore require the support of an adequate language processor.

In a general scenario, documents often involve information that belongs to more than a single domain. For this reason the notion of directory as a collection of domains was introduced. A processing structure to support this arrangement would demand as many language processors as the number of directories necessary to cover the state of knowledge addressed by the document. Therefore the number of language processors to support the dynamic authoring model will always be greater than two$^{23}$ if the document to be processed involves information defined in more than one domain. Different language processors will take over the processing activity at selected parts of the document. Each processor is viewed as an agent that has knowledge to validate the syntax of its mathematical objects and perform other tasks as determined by the semantics of the objects.

22 It is assumed that one directory may be composed of a set of grammar fragments.

23 The proposed document structure supports the directory swap strategy by means of a CFG, the binding control. For this reason at least two language processors are required. Consequently at least one additional processor will be needed to process the objects included in the document. This organization forces the number of processors to be one greater than the number of directories in the document.

Letting the number of directories in a document be a parameter under the control of the author indicates that an equal number of language processors will need to be provided in order to support each required directory. To meet this requirement, this thesis proposes that language processors be dynamically created by the software used during the authoring activity.

The automatic creation of language processors based on the knowledge provided by CFGs requires information about the position of both terminals and nonterminals. Although the grammar structure imposed by the ENF determines that at most one terminal is permitted in production rules, the number of nonterminals is left unrestricted. One exception to this is the operatorless production, which is always composed of one nonterminal.

Representing the meaning of mathematical concepts by means of CFGs requires that all information which is part of the concept be mapped to the set of production rules. This includes the set of symbols used for the representation of the name of the concept, its attributes and delimiters.

Having the name of the concept as a terminal and both its attributes and delimiters represented as nonterminals introduces the need for an additional mechanism in order to distinguish attributes from delimiters. For this purpose, a set of attributes is added to the grammar rules.

As an extension to the grammar structure already proposed for capturing semantics, these attributes will also be applied to the definition of the terminal symbols. The attachment of attributes to the rules of CFGs was proposed by Knuth [63, 78]. The resulting grammar is called an attributed grammar.

The use of attributed grammars to support the semantics capturing of mathematical concepts does not require any modification to the approach already presented. Both the composition and extension operators can also be applied to attributed grammars. The following definition presents this characteristic:

Definition 16 Let $G_1 = (N_1, T_1, P_1, S_1, \mathcal{A}, \alpha_1)$ and $G_2 = (N_2, T_2, P_2, S_2, \mathcal{A}, \alpha_2)$ be two attributed context-free grammars. The extension operation $G_1 \,\%\, G_2$ will produce an attributed context-free grammar $G_3 = (N_3, T_3, P_3, S_3, \mathcal{A}, \alpha_3)$ defined as follows:

• For the underlying context-free grammars one has $\bar{G}_3 = \bar{G}_1 \,\%\, \bar{G}_2$, where $\bar{G}_i$ denotes $G_i$ without attributes.

• The mapping $\alpha_3$ is given by

[equation not recoverable from the source] (6.10)

for every rule $A \to w \in P_3$.

Definition 17 Let $G_1 = (N_1, T_1, P_1, S_1, \mathcal{A}, \alpha_1)$ and $G_2 = (N_2, T_2, P_2, S_2, \mathcal{A}, \alpha_2)$ be two attributed context-free grammars. The composition operation $G_1 \circ G_2$ will produce an attributed context-free grammar $G_3 = (N_3, T_3, P_3, S_3, \mathcal{A}, \alpha_3)$ defined as follows:

• For the underlying context-free grammars one has $\bar{G}_3 = \bar{G}_1 \circ \bar{G}_2$.

• The mapping $\alpha_3$ is given by

$$\alpha_3(A \to w) = \begin{cases} \alpha_1(A \to w), & \text{if } A \to w \in P_1, \\ \alpha_2(A \to w), & \text{if } A \to w \in P_2 \text{ or } A \to w \in P_1 \cap P_2 \end{cases}$$

for every rule $A \to w \in P_3$.

The following section proposes the structure of the grammars which will be used to support the definition of the domains. This meta-grammar is therefore the template which will be applied during the creation of every grammar fragment required to capture the meaning of mathematical concepts.

6.8 Meta-Structure

In this section attributed CFGs are used as an aid to specifying the semantics capturing of mathematical concepts. Synthesized attributes [6] are attached to production rules which are either primitive or basic. These attributes supply additional semantics of the concepts involved that have been omitted due to limitations of the CFG part of the structure. The attributes are represented as grammar variables regular_expr and cardinality as defined in Table 6.18. The cardinality variable holds the position of the arguments that are associated with the terminal symbol of the rule. Nonterminal cardinality is ε for operatorless and primitive productions because they both have no arguments. For basic productions cardinality is defined in terms of the args_position nonterminal. In this case the position of the argument is identified by a positive integer. The nonterminal regular_expr is used to represent regular expressions that describe the symbol arrangement applied to the composition of terminals. Regular expressions used by this grammar follow both the syntax and the semantics defined by lex [66].

meta           -> cfg attributes
cfg            -> NONTERMINAL : items
items          -> items item
items          -> item
item           -> TERMINAL | NONTERMINAL
attributes     -> # regular_expr # cardinality | ε
cardinality    -> ( args_position ) | ε
args_position  -> args_position , position | position
position       -> INTEGER

Table 6.18: Production rules for the meta-grammar.

The meta-grammar part of PNS is defined by the set of production rules shown in Table 6.18. The proposed grammar defines nonterminal meta as the start symbol, which structures the problem according to the two nonterminals on the right side of the rule. The part starting with the nonterminal cfg defines the structure of the rules in the CFG part of the structure. Nonterminal attributes proposes the structure for the synthesized attributes.

Table 6.19 illustrates the organization proposed by the meta-structure. This example shows the ENF version of the grammar displayed in Table 5.16 with the corresponding set of attributes attached to each production rule. Productions 2, 3 and 8 are operatorless productions; therefore they have no attributes. Productions 9 to 12 are primitive productions. For this reason they have only the regular expression attributes.

1   identity_expr : sample_expr EQ sample_expr          # "=" # (1,3)
2   sample_expr   : expr
3   sample_expr   : sum
4   sum           : SUM left_del sum_elmts right_del    # "Sum" # (3)
5   sum_elmts     : range_list SUMDEL sample_expr       # ";" # (1,3)
6   range_list    : start LISTDEL end                   # "," # (1,3)
7   start         : identifier ITERATION expr           # [illegible] # (1,3)
8   end           : expr
9   identifier    : IDENTIFIER                          # [a-z]+ #
10  leftDel       : LEFTDEL                             # "{" #
11  rightDel      : RIGHTDEL                            # "}" #
12  expr          : INTEGER                             # [1-9][0-9]* #

Table 6.19: Attributed grammar to support the capturing of simple summations.
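
The layout prescribed by the meta-grammar is simple enough to be read mechanically. The following Python sketch is offered as an illustration, not as the processor used in this work. It assumes that each production is written on one line as "lhs : items # regular_expr # cardinality", that the regular expression itself contains no # character, and that upper-case items are terminals, as in the tables of this chapter. The classification into operatorless, primitive and basic productions is inferred from the way these terms are used in the text. The names parse_rule and classify are hypothetical.

```python
def parse_rule(line):
    """Split one attributed production into (lhs, items, regular_expr, cardinality)."""
    # ASSUMPTION: '#' never occurs inside the regular expression attribute.
    head, *attrs = [part.strip() for part in line.split("#")]
    lhs, colon, rhs = head.partition(":")
    if not colon:
        raise ValueError("missing ':' between the nonterminal and its items")
    items = rhs.split()
    regular_expr, cardinality = None, ()
    if attrs:
        # attributes -> # regular_expr # cardinality
        if len(attrs) != 2:
            raise ValueError("expected '# regular_expr # cardinality'")
        regular_expr = attrs[0]
        if attrs[1]:
            positions = attrs[1].strip("()").split(",")
            cardinality = tuple(int(p) for p in positions)
    return lhs.strip(), items, regular_expr, cardinality


def classify(items):
    """ASSUMPTION: upper-case items are terminals, all other items nonterminals."""
    terminals = [item for item in items if item.isupper()]
    if len(items) == 1 and not terminals:
        return "operatorless"
    if len(items) == 1 and len(terminals) == 1:
        return "primitive"
    if len(terminals) == 1:
        return "basic"
    raise ValueError("production is not in the assumed normal form")


rules = [
    'sum_elmts : range_list SUMDEL sample_expr # ";" # (1,3)',
    "end : expr",
    "expr : INTEGER # [1-9][0-9]* #",
]
for rule in rules:
    lhs, items, regular_expr, cardinality = parse_rule(rule)
    print(lhs, classify(items), regular_expr, cardinality)
```

Applied to these three sample rules, the sketch reports a basic production carrying both attributes, an operatorless production without attributes, and a primitive production carrying only the regular expression, in line with the description given for Table 6.19.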

6.9 Conclusion

In this chapter I have presented a grammar-based document organization to capture the meaning of mathematical concepts. The approach models the dynamics of authoring mathematics and supports the introduction of user-defined syntax to represent mathematical concepts. This means the semantics of mathematical concepts included in the document can be bound to syntax proposed during authoring. These ideas are expressed in terms of the Document Description Model described as follows.

A Document Description Model (DDM) is a structure composed of

1. a document dictionary $H_j$ such that all grammars in this set are in ENF, and

2. the following operations:

(a) Grammar Creation: introduced in Section 6.4 by the composition operator $\circ$.

(b) Grammar Update: introduced in Section 6.4 by the extension operator %. Grammars resulting from this operation as well as from the grammar creation operation are elements of set $C_i^j$ for some version of the document structure $j$ and document directory $i$.

3. Grammar Identity: provided by the union operations used for the creation of the domain directory $G_i^j$.

4. Closure: all grammars introduced by the creation and the update operations are in $H_j$.



Chapter 7

Examples

Among the various forms of representation available, the conventional notation is the one which has been used by the majority of the activities which involve the communication of mathematics. A major limitation on rendering mathematical concepts according to this notation is the syntactic overloading of the symbols used for the encoding of the operators. This problem has been discussed in Section 5.2, and Figure 5.1 displays three common meanings that are usually attached to symbol v.

It is assumed, in this thesis, that people, most of the time, get exposed to mathematics by means of the encoding provided by the conventional notation. For this reason this notation has been used in this work as the basic source of information for the semantics capturing process. Although the encoding provided by the conventional notation is sometimes not ideal, it is important to maintain the syntactical arrangement this encoding provides. This decision is fundamental to the capturing strategy because the choice of a notation which is widely used should free the author from the requirement of learning an alternative syntax supported by the capturing system.

In this thesis a document structure composed of attributed grammar fragments is proposed to capture the meaning of the mathematical concepts. Context-dependent representations are supported by a directory change mechanism where a set of grammars is replaced by another to allow other interpretations to be associated with the symbols considered. The following sections illustrate the structure proposed by describing the process involved during the authoring of simple documents which only contain mathematical concepts represented by strings of characters.



7.1 Example 1: Overloading the + and * symbols

$g_1$    ee : ee EQ te          # "=" # (1,3)
$g_2$    ee : te
$g_3$    it : INTEGER           # [1-9][0-9]* #
$g_4$    ep : ep PLUS tp        # "+" # (1,3)
$g_5$    ep : tp
$g_6$    st : STRING            # [0-9]+ #
$g_7$    ec : ec CAT tc         # "+" # (1,3)
$g_8$    ec : tc
$l_0$    new_expr : ee

Table 7.1: Default grammars.

$l_1$    te : ep
$l_2$    tp : it
$l_3$    te : ec
$l_4$    tc : st

Table 7.2: Grammar fragments created by editing.

Consider the need to overload the + symbol in order to represent both the addition of integers and the concatenation of strings of characters. The following document illustrates this by means of two identity expressions which use the same syntax for the left side of the equality. This document is called Prototype because it is the author's first attempt towards its creation.

Prototype

< d1 >
    1 + 1 + 0 = 2
    < d2 >
        1 + 1 + 0 = 110
    < / >
< / >



The above prototype version of the document is composed of two domains represented by $d_1$ and $d_2$. As illustrated by the syntax of the document, domain $d_1$ should contain the necessary rules to recognize the left side of the first equality as the addition of three numbers. In a similar way, domain $d_2$ should contain rules to recognize the operations on the left side of the second equality as the concatenation of characters.
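The role of the two domains can be sketched in a few lines of Python. This is only an illustration of the intended meanings, not the grammar-based mechanism of the thesis: the function name `evaluate_left_side` and the string labels `"d1"` and `"d2"` are assumptions introduced here.

```python
def evaluate_left_side(expression: str, domain: str) -> str:
    """Interpret `a + b + ...` under the named domain (hypothetical helper)."""
    operands = [token.strip() for token in expression.split("+")]
    if domain == "d1":
        # d1: the + symbol denotes the addition of integers.
        return str(sum(int(operand) for operand in operands))
    if domain == "d2":
        # d2: the + symbol denotes the concatenation of strings of characters.
        return "".join(operands)
    raise ValueError(f"unknown domain: {domain}")


# The two identities of the Prototype document, one per domain.
prototype = [("d1", "1 + 1 + 0", "2"), ("d2", "1 + 1 + 0", "110")]
for domain, left, right in prototype:
    assert evaluate_left_side(left, domain) == right
```

The same left side yields a different right side depending only on the active domain, which is the behavior the two sets of rules are meant to capture.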

In order to provide the complete structure to support this document, assume the author initially has access only to the set of default grammars as defined in Table 7.1. Grammar fragments $g_1$, $g_4$ and $g_7$ support expressions involving equality, addition and concatenation operations respectively. Grammars $g_3$ and $g_6$ define the domains over which the specification of addition and concatenation operations can respectively apply. Grammar fragments $g_2$, $g_5$ and $g_8$ support the definitions of the equality, the addition and the concatenation operations respectively. Grammar $l_0$ links grammar $g_1$ to the control mechanism. Grammar fragments $l_1$, $l_2$, $l_3$ and $l_4$ have been created by editing in order to provide the necessary links with the other fragments. The result of the author's first attempt to produce a structure to capture the meaning of the two mentioned mathematical concepts is provided by Ex1-Version 1. This code is presented as follows and it illustrates the two domains as well as the expressions involving the overloaded operator.

Ex1-Version 1

< {l0 ∘ {g1 ∘ g2}t2}t1 ;
  {t1 ∘ l1 ∘ {g4 ∘ g5}t3}t0 ;
  {t0 ∘ l2 ∘ g3}d1 >
    1 + 1 + 0 = 2
    < {{t1 ∘ l3 ∘ {g7 ∘ g8}t4}t5 ∘ l4 ∘ g6}d2 >
        1 + 1 + 0 = 110
    < / >
< / >

As stated before, the main objective of this initial version of the document is to represent both the addition and concatenation operations by the + symbol. For this purpose the author organizes the information to be presented into two separate domains as a way of resolving the semantic nondeterminism generated by overloading the + symbol. The grammar fragments used for the definition of domain $d_1$ have been obtained from domain directory $G^0_1$ and the fragments used for the definition of domain $d_2$ were taken from domain directory $G^0_2$. The complete definitions to support this version of the document are described next according to the document structure proposed in Chapter 6.

{g1 ∘ g2}t2
    ee         ee EQ te     # "=" # (1,3)
    ee         te

{l0 ∘ t2}t1
    new_expr   ee
    ee         ee EQ te     # "=" # (1,3)
    ee         te

{g4 ∘ g5}t3
    ep         ep PLUS tp   # "+" # (1,3)
    ep         tp

{t1 ∘ l1 ∘ t3}t0
    new_expr   ee
    ee         ee EQ te     # "=" # (1,3)
    ee         te
    te         ep
    ep         ep PLUS tp   # "+" # (1,3)
    ep         tp

{t0 ∘ l2 ∘ g3}d1
    new_expr   ee
    ee         ee EQ te     # "=" # (1,3)
    ee         te
    te         ep
    ep         ep PLUS tp   # "+" # (1,3)
    ep         tp
    tp         it
    it         INTEGER      # [1-9][0-9]* #

Table 7.3: Grammars in domain directory $G^0_1$ that have been created by grammar operations.
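The way the composed grammars of Table 7.3 grow from their operands can be sketched as follows. The sketch assumes that composition behaves like an ordered union of production rules, which is what the table contents suggest; the formal definition is the one given in Chapter 6, and the `compose` helper and the tuple encoding of rules are illustrative assumptions.

```python
from typing import List, Tuple

Rule = Tuple[str, str]   # (left-hand side, right-hand side with its attributes)
Grammar = List[Rule]


def compose(*fragments: Grammar) -> Grammar:
    """Assumed reading of the composition operator: keep every rule once, in order."""
    result: Grammar = []
    for fragment in fragments:
        for rule in fragment:
            if rule not in result:
                result.append(rule)
    return result


# Fragments transcribed from Tables 7.1 and 7.2.
g1 = [("ee", 'ee EQ te # "=" # (1,3)')]
g2 = [("ee", "te")]
g3 = [("it", "INTEGER # [1-9][0-9]* #")]
g4 = [("ep", 'ep PLUS tp # "+" # (1,3)')]
g5 = [("ep", "tp")]
l0 = [("new_expr", "ee")]
l1 = [("te", "ep")]
l2 = [("tp", "it")]

t2 = compose(g1, g2)
t1 = compose(l0, t2)
t3 = compose(g4, g5)
t0 = compose(t1, l1, t3)
d1 = compose(t0, l2, g3)

for left, right in d1:
    print(f"{left:<9} {right}")
```

Under this reading `d1` ends up with eight rules, from `new_expr ee` down to the `INTEGER` rule, in the order shown in the last block of the table.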

The current version of the document is supported by the document instance structure $S_0 = (D_0, c)$. The organization of the semantic structure $D_0$ is defined in terms of its two domain directories $G^0_1$ and $G^0_2$ for this initial version of the document as follows:

\[ D_0 = (G^0_1, G^0_2) \tag{7.1} \]

where $G^0_1$ is defined as

\[ G^0_1 = N^0_1 \cup F^0_1 \cup C^0_1 = \{g_1, g_2, g_3, g_4, g_5, l_0, l_1, l_2, t_0, t_1, t_2, t_3, d_1\} \tag{7.2} \]



{g7 ∘ g8}t4
    ec         ec CAT tc    # "+" # (1,3)
    ec         tc

{t1 ∘ l3 ∘ t4}t5
    new_expr   ee
    ee         ee EQ te     # "=" # (1,3)
    ee         te
    te         ec
    ec         ec CAT tc    # "+" # (1,3)
    ec         tc

{t5 ∘ l4 ∘ g6}d2
    new_expr   ee
    ee         ee EQ te     # "=" # (1,3)
    ee         te
    te         ec
    ec         ec CAT tc    # "+" # (1,3)
    ec         tc
    tc         st
    st         STRING       # [0-9]+ #

Table 7.4: Grammars in domain directory $G^0_2$ that have been created by grammar operations.

with

\[ N^0_1 = \{l_1, l_2\}; \quad F^0_1 = \{g_1, g_2, g_3, g_4, g_5, l_0\}; \quad C^0_1 = \{t_0, t_1, t_2, t_3, d_1\} \tag{7.3} \]

and $G^0_2$ as

\[ G^0_2 = N^0_2 \cup F^0_2 \cup C^0_2 = \{g_6, g_7, g_8, l_3, l_4, t_1, t_4, t_5, d_2\} \tag{7.4} \]

with

\[ N^0_2 = \{g_6, g_7, g_8, l_3, l_4\}; \quad F^0_2 = \{t_1\}; \quad C^0_2 = \{t_4, t_5, d_2\}. \tag{7.5} \]

The total set of grammars manipulated by this initial prototype is given by

\[ H_0 = \bigcup_{i=1}^{2} G^0_i = \{g_1, g_2, g_3, g_4, g_5, g_6, g_7, g_8, l_0, l_1, l_2, l_3, l_4, t_0, t_1, t_2, t_3, t_4, t_5, d_1, d_2\}. \tag{7.6} \]
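The bookkeeping behind equations 7.2 to 7.6 is plain set union, and can be checked mechanically. The sketch below takes the component sets as read from equations 7.3 and 7.5 (an interpretation of the printed sets, not an independent source) and uses Python variable names that are assumptions of this example.

```python
# Component sets of the two directories, as read from equations 7.3 and 7.5.
N1, F1, C1 = {"l1", "l2"}, {"g1", "g2", "g3", "g4", "g5", "l0"}, {"t0", "t1", "t2", "t3", "d1"}
N2, F2, C2 = {"g6", "g7", "g8", "l3", "l4"}, {"t1"}, {"t4", "t5", "d2"}

# Each directory is the union of its three component sets.
G1 = N1 | F1 | C1
G2 = N2 | F2 | C2

# The total set of grammars is the union over both directories.
H0 = G1 | G2

print(len(G1), len(G2), len(H0))   # sizes of the two directories and of their union
print(sorted(G1 & G2))             # grammars shared by both directories
```

With these sets the only grammar shared by the two directories is `t1`, which is why the total contains one element fewer than the two directories counted separately.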

Now consider the need to modify the current version of the document in order to include two other concepts: multiplication of integers and consecutive concatenation of strings of characters, which is here called the power of strings. The power operation is a binary infix operator which concatenates its left operand the number of times stated by its right integer operand.
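As a concrete reading of this definition, the power of strings can be written as a short Python function. The name `power` and the error handling for negative counts are assumptions made for this sketch; the thesis itself defines the operation only through its grammar.

```python
def power(left: str, times: int) -> str:
    """Concatenate `left` with itself `times` times (the power of strings)."""
    if times < 0:
        raise ValueError("the right operand must be a non-negative integer")
    return left * times


# With + read as concatenation, a * 3 + b denotes the string aaab.
assert power("a", 3) + "b" == "aaab"
```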

Syntactically, both operations are represented by the * symbol. This characteristic indicates that two distinct directories will need to be provided in order to capture the meanings assigned to the * symbol. The code under the label Ex1-Version 2 presents expressions involving these operations as well as both the addition and the concatenation as introduced by Ex1-Version 1.

Ex1-Version 2

< d2 >
    1 + 1 + 0 = 110
    < {d2 % {l5 ∘ {g9 ∘ g10}t6 ∘ l6}}d3 >
        1 + 1 * 0 = 1
        < {t5 ∘ l7 ∘ l8 ∘ g11 ∘ l9 ∘ g3 ∘ g12}d4 >
            a * 3 + b = aaab ;
            1 * 3 = 1 + 1 + 1 = 3:
        < / >
        1 * 3 = 1 + 1 + 1 = 3
    < / >
< / >

The code presented by Ex1-Version 2 above makes use of three distinct domains $d_2$, $d_3$ and $d_4$. Although domain $d_2$ has been reused from the previous version of the document, domains $d_3$ and $d_4$ needed to be created. The complete definition of the structure which supports this is provided by the document instance structure $S_1 = (D_1, c)$. The semantic structure $D_1$ is defined in terms of its three domain directories $G^1_1$, $G^1_2$ and $G^1_3$ for this version of the document as follows:

\[ D_1 = (G^1_1, G^1_2, G^1_3, G^1_2) \tag{7.7} \]

where $G^1_1$ is defined as

\[ G^1_1 = N^1_1 \cup F^1_1 \cup C^1_1 = \{d_2\} \tag{7.8} \]

with

\[ N^1_1 = \{\}; \quad F^1_1 = \{d_2\}; \quad C^1_1 = \{\}, \tag{7.9} \]

$G^1_2$ as

\[ G^1_2 = N^1_2 \cup F^1_2 \cup C^1_2 = \{g_9, g_{10}, l_5, l_6, t_6, d_2, d_3\} \tag{7.10} \]

with

\[ N^1_2 = \{g_9, g_{10}, l_5, l_6\}; \quad F^1_2 = \{d_2\}; \quad C^1_2 = \{t_6, d_3\}, \tag{7.11} \]

and $G^1_3$ as

\[ G^1_3 = N^1_3 \cup F^1_3 \cup C^1_3 = \{g_3, g_{11}, g_{12}, l_7, l_8, l_9, t_5, d_4\} \tag{7.12} \]

with

\[ N^1_3 = \{g_{11}, g_{12}, l_7, l_8, l_9\}; \quad F^1_3 = \{g_3, t_5\}; \quad C^1_3 = \{d_4\}. \tag{7.13} \]

The grammars involved in this new version of the document structure are given by

\[ H_1 = \bigcup_{i=1}^{3} G^1_i = \{g_3, g_9, g_{10}, g_{11}, g_{12}, l_5, l_6, l_7, l_8, l_9, t_5, t_6, d_2, d_3, d_4\}. \tag{7.14} \]

g9    tm   tm MULT fm   # "*" # (1,3)
g10   tm   fm
l5    tp   tm
l6    fm   it

Table 7.5: Grammars in domain directory $G^1_2$ created by editing.

Tables 7.5 and 7.6 provide grammars which belong to domain directory $G^1_2$. These grammars have been introduced by editing and by grammar operations respectively. Table 7.7 shows the grammars which belong to $G^1_3$ and were introduced by editing. Table 7.8 shows the grammars in $G^1_3$ which were created by grammar operations.

7.2 Example 2: Symbols as operators and operands

This example proposes a semantic structure to support the syntactical overloading of the symbols + and *. Two different meanings are attached to each symbol and



{g9 ∘ g10}t6
    tm         tm MULT fm   # "*" # (1,3)
    tm         fm

{d2 % {l5 ∘ t6 ∘ l6}}d3
    new_expr   ee
    ee         ee EQ te     # "=" # (1,3)
    ee         te
    te         ep
    ep         ep PLUS tp   # "+" # (1,3)
    ep         tp
    tp         it
    tp         tm
    tm         tm MULT fm   # "*" # (1,3)
    tm         fm
    fm         it
    it         INTEGER      # [1-9][0-9]* #

Table 7.6: Grammars in domain directory $G^1_2$ that have been created by grammar operations.

g11   tp   st POWER fp   # "*" # (1,3)
g12   st   ALPHANUM      # [0-9a-z]+ #
l7    tc   tp
l8    tc   st
l9    fp   it

Table 7.7: Grammars in domain directory $G^1_3$ created by editing.

each meaning requires a customized domain where grammar fragments are needed to support the semantic capturing process.

Although the semantics usually attached to these symbols characterizes them as binary operators, as provided by domain $d_3$, many other meanings may also be associated with them. One possibility, for example, is to have them as the elements of a set. For this scenario, the two symbols will be the operands of the comma "," binary operator which is used to organize the elements of a set in a list format. This characteristic is illustrated by the single statement defined within the scope of domain $d_5$ in the semantic structure that follows:



{t5 ∘ l7 ∘ l8 ∘ g11 ∘ l9 ∘ g3 ∘ g12}d4
    new_expr   ee
    ee         ee EQ te      # "=" # (1,3)
    ee         te
    te         ec
    ec         ec CAT tc     # "+" # (1,3)
    ec         tc
    tc         tp
    tc         st
    tp         st POWER fp   # "*" # (1,3)
    fp         it
    it         INTEGER       # [1-9][0-9]* #
    st         ALPHANUM      # [0-9a-z]+ #

Table 7.8: Grammars in domain directory $G^1_3$ created by grammar operations.

Ex2:

< d3 >
    0 + 1 * 1 = 1
    < d5 >
        R = S = {+ ,*}
    < / >
< / >

l10   te       id
l11   te       bs
g13   id       IDENTIFIER       # [A-Z] #
g14   bs       SET el endset    # "{" # (2,3)
g15   endset   ENDSET           # "}" #
g16   el       el LISTDEL tl    # "," # (1,3)
g17   el       tl
g18   tl       BINARYOP         # [+*] #

Table 7.9: Grammars in domain directory $G^0_2$ created by editing.

Tables 7.9 and 7.10 illustrate all grammars required for this example. Since grammar $t_1$ has already been defined in Section 7.1 it has not been included in these tables.



{g14 ∘ g15}t6
    bs         SET el endset    # "{" # (2)
    endset     ENDSET           # "}" #

{g16 ∘ g17}t7
    el         el LISTDEL tl    # "," # (1,3)
    el         tl

{t1 ∘ l10 ∘ l11 ∘ g13 ∘ t6 ∘ t7 ∘ g18}d5
    new_expr   ee
    ee         ee EQ te         # "=" # (1,3)
    ee         te
    te         id
    te         bs
    id         IDENTIFIER       # [A-Z] #
    bs         SET el endset    # "{" # (2)
    endset     ENDSET           # "}" #
    el         el LISTDEL tl    # "," # (1,3)
    el         tl
    tl         BINARYOP         # [+*] #

Table 7.10: Grammars in domain directory $G^0_2$ created by grammar operations.

According to the meta-grammar defined in Table 6.18, the args-position nonterminal included there has the purpose of identifying the position of the arguments of the mathematical concept represented by the associated rule. This nonterminal is, in the meta-grammar, expanded as a list of integers.

Grammar fragment $g_{14}$ has integers 2 and 3 as its attributes. According to the meta-grammar, these two attributes determine the position of the nonterminals which are relevant to the definition of the concept presented by the only production rule that grammar $g_{14}$ has. Argument 2 relates to the list which is defined by grammar $g_{16}$. Argument 3 indicates where the delimiter for the end of a set representation is placed. The notion of splitting pairs of symbols which together are part of the syntax of a concept is used here as a way to ensure that production rules involving these concepts are in the ENF. For this reason the symbol { from the pair {} has been used for the definition of a set in grammar $g_{14}$.
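The effect of the integer attributes can be illustrated with a small Python helper. It assumes that the positions are 1-based indices into the right-hand side of the rule, which matches the way arguments 2 and 3 are described above; the function name `relevant_symbols` and the list encoding of a rule are assumptions of this sketch.

```python
from typing import List, Sequence


def relevant_symbols(right_side: Sequence[str], positions: Sequence[int]) -> List[str]:
    """Select the symbols an attribute list points at (positions are 1-based)."""
    return [right_side[position - 1] for position in positions]


# Equality rule: ee EQ te with attributes (1,3) selects the two operands.
print(relevant_symbols(["ee", "EQ", "te"], (1, 3)))

# Set rule: SET el endset with attributes (2,3) selects the element list
# and the nonterminal that places the closing delimiter.
print(relevant_symbols(["SET", "el", "endset"], (2, 3)))
```

In both cases the symbol that is skipped is the terminal carrying the operator or opening delimiter, so the attributes name only the parts of the rule that act as arguments of the concept.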

The rest of this section describes the semantic structure $D_0$, for this example, according to the model proposed in Chapter 6. The two domains $d_3$ and $d_5$ are defined as elements of the domain directories $G^0_1$ and $G^0_2$ respectively as follows:

\[ D_0 = (G^0_1, G^0_2, G^0_1) \tag{7.15} \]



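Domain directory $G^0_1$ is defined as

\[ G^0_1 = N^0_1 \cup F^0_1 \cup C^0_1 = \{d_3\} \tag{7.16} \]

with

\[ N^0_1 = \{\}; \quad F^0_1 = \{d_3\}; \quad C^0_1 = \emptyset \tag{7.17} \]

and $G^0_2$ as

\[ G^0_2 = N^0_2 \cup F^0_2 \cup C^0_2 = \{l_{10}, l_{11}, g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}, t_1, t_6, t_7, d_5\} \tag{7.18} \]

with

\[ N^0_2 = \{l_{10}, l_{11}, g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}\}; \quad F^0_2 = \{t_1\}; \quad C^0_2 = \{t_6, t_7, d_5\}. \tag{7.19} \]

The grammars required for this example are provided by

\[ H_0 = \bigcup_{j=1}^{2} G^0_j = \{g_{13}, g_{14}, g_{15}, g_{16}, g_{17}, g_{18}, l_{10}, l_{11}, t_1, t_6, t_7, d_3, d_5\}. \tag{7.20} \]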

7.3 Example 3: More meanings for the + symbol

The document structures introduced by the previous examples illustrated a scenario where the overloading of symbols took place in distinct expressions. This means a given symbol appeared in more than a single expression with different meanings associated with it. This problem was approached by a context switch where the current domain was replaced by an adequate one that provided the necessary grammar support for the capturing of the meaning of the concepts involved.

Symbol overloading may also take place within the expression itself. For this scenario the context switch would introduce as many distinct domains as the number of different meanings which are associated with any given symbol included in the expression. This section presents a document structure to support expressions which require more than a single domain to capture the meaning of the concepts they represent. To illustrate this problem consider the following expression which attaches two different meanings to the symbol +.

\[ |A + B| + 1 = a \tag{7.21} \]

The semantics of expression 7.21 determines that $a$ is equal to the addition of integer 1 and the determinant of the result of the addition of matrix $A$ with matrix $B$. As can be seen, the symbol + is overloaded since it is used to represent both the addition of matrices and the addition of integers. The rest of this section provides a structure to capture the semantics of this expression.
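A small numeric instance makes the two meanings of + in expression 7.21 explicit. The 2x2 matrices below are arbitrary values chosen for illustration, and the helper names `matrix_add` and `det2` are assumptions of this sketch rather than part of the thesis.

```python
from typing import List

Matrix = List[List[int]]


def matrix_add(a: Matrix, b: Matrix) -> Matrix:
    """First meaning of +: entry-wise addition of two matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def det2(m: Matrix) -> int:
    """Determinant of a 2x2 matrix, written |M| in the expression."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


A = [[1, 2], [3, 4]]   # illustrative values only
B = [[0, 1], [1, 0]]

# Second meaning of +: ordinary integer addition applied to the determinant.
a = det2(matrix_add(A, B)) + 1
print(a)
```

Here the inner + produces the matrix [[1, 3], [4, 4]], whose determinant is -8, and the outer + adds the integer 1 to it.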

Ex3:

< {g19 ∘ {g20 ∘ g2}t8}t9 ;
  {t9 ∘ g21 ∘ g22}d6 >
    |A + B|
    < {g23 ∘ t2 ∘ g24}d7 >
        + 1 = a
    < / >
< / >

g19   new_expr   DET ee endet domain_scope   # "|" # (2,3,4)
g20   ee         ee MATRIX_ADD et            # "+" # (1,3)
g21   et         MATRIX_ID                   # [A-Z] #
g22   endet      ENDET                       # "|" #

Table 7.11: Grammars in domain directory $G^0_1$ created by editing.

g23   new_expr   PLUS ee                     # "+" # (2)
g24   te         CONSTANT                    # (0|[1-9][0-9]*)|[a-z] #

Table 7.12: Grammars in domain directory $G^0_2$ created by editing.

The semantic structure for this document is defined as follows:

\[ D_0 = (G^0_1, G^0_2) \tag{7.22} \]



{g20 ∘ g2}t8
    ee         ee MATRIX_ADD et            # "+" # (1,3)
    et         MATRIX_ID                   # [A-Z] #

{g19 ∘ t8}t9
    new_expr   DET ee endet domain_scope   # "|" # (2,3,4)
    ee         ee MATRIX_ADD et            # "+" # (1,3)
    et         MATRIX_ID                   # [A-Z] #

{t9 ∘ g21 ∘ g22}d6
    new_expr   DET ee endet domain_scope   # "|" # (2,3,4)
    ee         ee MATRIX_ADD et            # "+" # (1,3)
    et         MATRIX_ID                   # [A-Z] #
    endet      ENDET                       # "|" #

Table 7.13: Grammars in domain directory $G^0_1$ created by grammar operations.

{g23 ∘ t2 ∘ g24}d7
    new_expr   PLUS ee                     # "+" # (2)
    ee         ee EQ te                    # "=" # (1,3)
    ee         te
    te         CONSTANT                    # (0|[1-9][0-9]*)|[a-z] #

Table 7.14: Grammars in domain directory $G^0_2$ created by grammar operations.

where $G^0_1$ is defined as

\[ G^0_1 = N^0_1 \cup F^0_1 \cup C^0_1 = \{g_2, g_{19}, g_{20}, g_{21}, g_{22}, t_8, t_9, d_6\} \tag{7.23} \]

with

\[ N^0_1 = \{g_{19}, g_{20}, g_{21}, g_{22}\}; \quad F^0_1 = \{g_2\}; \quad C^0_1 = \{t_8, t_9, d_6\} \tag{7.24} \]

and $G^0_2$ as

\[ G^0_2 = N^0_2 \cup F^0_2 \cup C^0_2 = \{g_{23}, g_{24}, t_2, d_7\} \tag{7.25} \]

with

\[ N^0_2 = \{g_{23}, g_{24}\}; \quad F^0_2 = \{t_2\}; \quad C^0_2 = \{d_7\}. \tag{7.26} \]

The total set of grammars manipulated by this initial prototype is given by

\[ H_0 = \bigcup_{i=1}^{2} G^0_i = \{g_2, g_{19}, g_{20}, g_{21}, g_{22}, g_{23}, g_{24}, t_2, t_8, t_9, d_6, d_7\}. \tag{7.27} \]



Tables 7.11 and 7.13 are both associated with domain directory $G^0_1$. Table 7.11 shows all grammars in this directory that were created by editing and Table 7.13 the grammars that were generated as the result of composition operations. In a similar way the grammars in Tables 7.12 and 7.14 are associated with the domain directory $G^0_2$. The grammars in Table 7.12 are the result of editing and the grammars in Table 7.14 were created by composition.

As discussed in Section 7.2, the integer attributes which are introduced as part of the rules of some grammars have the purpose of determining the position of the relevant nonterminals of a rule. In Table 7.11 grammar fragment $g_{19}$ uses attributes 2, 3 and 4 to refer to the three nonterminals that are necessary in order to support the correct expansion of this rule. Nonterminals ee and endet are associated with attributes 2 and 3 respectively. Although both nonterminals ee and endet belong to the same domain directory, nonterminal domain_scope, which is associated with attribute 4, does not. As part of the dynamic control grammar, this nonterminal is associated with the context switch which is needed to provide the adequate grammar for the mathematical concepts being processed.



Chapter 8

The Processing Structure

In Chapter 6 the dynamics of authoring mathematics was modeled by means of a document organization that uses CFGs as its fundamental formalism. This chapter presents a processing structure for the proposed organization.

8.1 Dynamic Authoring and Language Fragments

Throughout the previous chapters I have investigated problems related to modeling the authoring behavior of mathematics. One of my major concerns when designing a solution to this problem was that, at any instant during the authoring activity, the mathematical concepts included in the document should have their semantics captured regardless of the syntax used by the author for this purpose. This approach faces the challenge of processing user-defined syntax^1. This means a language processor to verify the syntactical validity of such a document must be provided with the necessary tools to support the processing of unpredicted language statements.

^1 A similar problem has been approached by [60] where a meta-language addition to the PASCAL programming language was proposed. The mechanism allowed the programmer to introduce his/her own syntax to the language.

Recognizing a given syntax, such as a string of symbols $w$, requires a CFG $G$ such that $w \in L(G)$. As already emphasized, allowing user-defined syntax for expressing the semantics capture of mathematical concepts introduces the possibility of symbol overloading. In order to ensure that grammar $G$ is not used to recognize syntax definitions which contain overloaded symbols, the semantic characteristics of concepts need to be considered. The expression 1 + 0 = 1, for instance, could be used to illustrate both the boolean OR operation and the integer addition depending on the context determined by the author. Consequently no single CFG should be provided to capture both meanings.

One fundamental idea I have applied to support the use of CFGs to approach the semantics capturing problem is the fact that authoring mathematics is an incremental activity. Under this assumption the final document may be viewed as the result of a set of document modifications performed by the author or, for short, a set of authoring increments. Another way to express this is that the dynamics of authoring mathematics can be modeled as a set of states where each state is uniquely characterized by a CFG or scope. In other words, it can be modeled as a finite automaton whose states are CFGs and whose transitions are supplied by the author. One problem with this association is to determine the boundaries of an authoring increment, that is, when one increment ends and the next is to be considered.

To get around this nondeterminism I have used the state change concept as a mechanism to resolve ambiguities. Of course a state change, in this context, must also be triggered whenever the syntax used for a given concept cannot be recognized by the grammars defined for that state. Authoring therefore requires no scope change as long as no syntactical ambiguities are introduced and all the syntax proposed consists of valid statements for the current scope. The syntax attached to a concept will only be valid within a given scope and will be recognized as long as the scope it belongs to is active.

According to this strategy the document at the end of the authoring activity will be organized as a sequence of sets of grammars. Since the document has been created by an incremental approach it is intuitive to structure its processing by means of a mechanism that supports this characteristic. In essence new language processors will need to be provided as new scopes are introduced. This means the dynamic authoring characteristic determines incremental changes to be made in the notation/language used. Therefore incremental changes also need to be provided to the grammars used for the definition of the notation/language. This process may be viewed as a language prototyping activity where language fragments are included as a way to support new features.



8.2 Processing Grammar Fragments

According to [104] a programming language processor is an application which manipulates programs expressed in a given language. In this thesis, language processors or processors will also be used to refer to these programs. Some well known programming language processors are compilers and interpreters [104].

According to [6] the design of a compiler can be logically structured as the front end and the back end. The parts associated with the source language are the lexical and syntactic analysis, the symbol table creation, the semantic analysis, the generation of intermediate code and code optimization. The front end is the collection of all these parts. The back end portion is related to tasks that are associated with the target language. Therefore target code generation and target code optimization are back end tasks. The symbol table management and error handling are tasks which are not restricted to a single phase. These tasks may belong to both the front and back end phases.

As described above, the phase oriented decomposition approach views a programming language as a single indivisible object. An alternative way would be to describe a language as a collection of fragments such that their combination would provide the same processing power as the indivisible definition. The important characteristic of this approach is the fact that language fragments can be defined to represent not only syntax but also the semantic structure of language constructs.

The following section presents the organization this thesis proposes for the construction of document processors to support the dynamics of authoring mathematics. The solution combines both notions of phase oriented processing and fragmented language definitions.

8.3 Dynamic Authoring and Document Processors

In Section 6.2.1 I have introduced a document structure to model the dynamics of authoring mathematics. The model described there organizes authoring as a sequence of sets of grammars. In this organization each set captures the syntax and portions of the semantics of some mathematical concepts that have been included in the document. A complete sequence, in this case, characterizes one stage during the authoring activity. In other words it corresponds to a version of the document.



In order to process a given version of a document, say, for instance, version $v$, the document processor must step through the complete sequence of sets of grammars which is associated with $v$, starting from the sequence's first element. As a result a context switch or scope change will take place whenever a set of grammars is replaced by another. This procedure is the approach this thesis proposes to capture semantics that is associated with the field of knowledge that mathematical concepts belong to. It is through this mechanism that the meaning of concepts which are represented by overloaded syntactical constructs is captured.

As proposed in Section 6.2.1, expression $S_j = (D_j, c)$ with $j \geq 0$ describes the state of the document at a given instant during authoring. In this case $D_j$ represents the sets of grammars needed to support the creation of version $j$ of the document. Support for the sequencing behavior is provided by the binding control grammar $c$. In this section the following definition refers to the organizations defined by both $D_j$ and $c$ to present a possible arrangement of language processors to handle the dynamics of authoring mathematics.

Definition 18 Assume $M$ is the binding control grammar expressed in ENF. Consider a given version of a document structure, say, for instance, version $j$. Let $D_j = (G^j_1, G^j_2, \ldots, G^j_{n_j})$ be the semantic structure associated with version $j$ and $P_{M\%G^j_i}$ be the language processor for directory $i$ such that

\[ P_{M\%G^j_i} : \text{object's syntax} \rightarrow \text{hierarchical representation} \]

The language processor for document structure $S_j$ is defined by the deterministic finite automaton

\[ P_{D_j} = (Q_j, \Sigma_j, \delta_j, s_j, F_j) \]

where

• $Q_j$ is the set whose elements are all processors associated with the directories that compose the semantic structure $D_j$,

• $s_j = P_{M\%G^j_1}$,

• $F_j = \{P_{M\%G^j_{n_j}}\}$,

• $\Sigma_j$ is the set containing elements which are the syntax of mathematical objects associated with version $j$ of the document, and

• For all $w \in \Sigma_j$,

\[ \delta_j(P_{M\%G^j_i}, w) \begin{cases} = P_{M\%G^j_i} & \text{if } w \in L(G^j_i), \\ \in Q_j - \{P_{M\%G^j_i}\} & \text{otherwise.} \end{cases} \]
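The behavior described by Definition 18 can be approximated with a short Python sketch. Several simplifications are assumptions of this example and not part of the thesis: regular expressions stand in for the CFG-based processors, the state names `P[G1]` and `P[G2]` are hypothetical, and when the current processor rejects a statement the sketch simply moves to the first other processor that accepts it.

```python
import re
from typing import Callable, List, Tuple

# (state name, recognizer standing in for membership in L(G_i))
Processor = Tuple[str, Callable[[str], bool]]


def make_recognizer(pattern: str) -> Callable[[str], bool]:
    """Build a recognizer from a regular expression (stand-in for a CFG parser)."""
    compiled = re.compile(pattern)
    return lambda w: compiled.fullmatch(w) is not None


def run(processors: List[Processor], statements: List[str]) -> List[Tuple[str, str]]:
    """Stay in the current state while it recognizes w; otherwise switch state."""
    current = 0  # start state: the processor of the first directory
    trace = []
    for w in statements:
        if not processors[current][1](w):
            others = [i for i, (_, accepts) in enumerate(processors)
                      if i != current and accepts(w)]
            if not others:
                raise ValueError(f"no processor recognizes {w!r}")
            current = others[0]  # context switch to another directory
        trace.append((processors[current][0], w))
    return trace


processors = [
    ("P[G1]", make_recognizer(r"\d+( [+*] \d+)* = \d+")),                # integer expressions
    ("P[G2]", make_recognizer(r"[A-Z]( = [A-Z])* = \{[+*](,[+*])*\}")),  # set statements
]
trace = run(processors, ["0 + 1 * 1 = 1", "R = S = {+,*}"])
for state, statement in trace:
    print(state, "recognizes", statement)
```

The first statement is handled in the start state, and the second forces a scope change because the integer recognizer rejects it. A real implementation would replace each regular expression with a parser generated from the grammars of the corresponding directory.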

8.3.1 Example

Chapter 7 provides a set of examples to illustrate the organization this thesis proposes to support the dynamics of authoring mathematics. A scenario where two versions of a simple document containing mathematical expressions that overload the + symbol is provided in Section 7.1. Two document instance structures $S_0 = (D_0, c)$ and $S_1 = (D_1, c)$ have been created to support the mathematical objects introduced during authoring.

The language processors associated with each version of this document are therefore $P_{D_0}$, for the first version, and $P_{D_1}$ for the second. The semantic structure for the second version is

\[ D_1 = (G^1_1, G^1_2, G^1_3, G^1_2) \]

and the set of states for its language processor is

\[ Q_1 = \{P_{M\%G^1_1}, P_{M\%G^1_2}, P_{M\%G^1_3}, P_{M\%G^1_2}\} \]



Chapter 9

Concluding Remarks

This work introduced a user-oriented organization to support the creation of multipurpose mathematical documents. To approach this characteristic a mechanism to capture the semantics of the mathematical concepts was proposed. This mechanism models the dynamics of authoring and allows meaning-to-syntax bindings to take place during the authoring activity. It also provides the author with the power to select the syntax he/she believes is the most appropriate to express the ideas to be communicated. A processing structure to support the proposed organization was also presented.

9.1 Discussion

The organization introduced by the authoring model proposed in this work determines that the semantics of a mathematical concept is captured by the set of grammars that compose the directory which is associated with the concept. Grammars in this set are structured according to the following characteristics: They either

1. have been created and are already available,

2. are the result of grammar operations, or

3. have been created by editing.

It is expected that the majority of the mathematical concepts included in the document are supported by grammars which are already available. This means they are part of a library and are ready to be used. In the event that new concepts need to be introduced or their meaning-to-syntax mappings need to be modified, the model determines that the needed grammars are to be created by editing, by the application of operations on the existing grammars, or by a combination of these two approaches. Editing could be required only when few grammars are available or whenever the concepts to be expressed require syntax that may not be supported by operations on grammars that are already available.

Cognitive load is the degree to which cognitive resources are required for activities that facilitate learning [99, 26]. According to [82] cognitive load increases with the amount of information to process. In [94], Salomon defines mental effort as the number of non-automatic elaborations necessary to solve a problem. As noted by Clark in [31], mental effort increases linearly and positively as the cognitive load increases. But how can computer-based systems be designed to reduce the cognitive load? As emphasized by [82], information overload can be reduced by modeling the user:

    A user model can be described as a system knowledge source containing assumptions on aspects of the user that guide the behavior of the system. The goal of building a user model is to reduce the user's information load. This can be accomplished by adapting either the representation of the task or the task itself.

In the context of this work the task is authoring mathematics and the representation of the task is the approach taken for authoring. It is by reading and handwriting that humans, most of the time, become exposed to mathematics. Consequently the mental model developed during this activity is the result of associations involving a pen/paper-based form of representing the abstract mathematical concepts. In other words, the semantics of concepts is bound to syntax.

This thesis proposes a document organization which:

1. models the dynamics of authoring mathematics and

2. allows the author the possibility of expressing mathematical concepts by means of syntax he/she feels comfortable with.

The cognitive load associated with authoring mathematics by means of user-defined syntax should therefore be reduced. By providing his/her own meaning-to-syntax bindings the author is free from details of notations which introduce other bindings he/she is not comfortable with. Empirical study results collected by [7] determined that the number of errors produced by users when entering mathematics on computers increases when longer equations are considered. Although not reported in their experiment, it can be hypothesized that the number of errors produced by users while entering expressions also increases with the complexity of the notation used, due to the cognitive load increase. Consider, for instance, the representation of the summation in the OpenMath system found in Subsection 1.2.5. The syntax used for this example is complex and therefore not appropriate for speech input. Furthermore, due to its length, according to [7], this form of representation is prone to input errors. The representation of this type of summation is simplified when captured by means of the approach proposed in this thesis. A similar example may be found in Section 5.8, which requires only a single line of text to capture the summation.

According to [7] the multimodal handwriting-plus-speech form of entering expressions was faster and better liked than the keyboard-and-mouse method. In this case allowing the author the possibility of multimodal input should be beneficial if the author has the freedom to propose the meaning-to-syntax binding. As noted earlier in this thesis, approaches such as MathML and OpenMath have not been designed to support multimodal forms of input. This limitation and the other two limitations in Subsection 1.2.7, counter-intuitive entry order and complex syntax format, are overcome by the approach proposed in this thesis.

9.2 Authoring with Grammar Fragments

In this dissertation I have described the goal of capturing the meaning of mathematical concepts by means of a document structure which

1. allows the semantics of mathematical concepts to be encoded by user-defined syntax, provided the notation is context-free, and

2. supports both extensibility and ambiguity characteristics of the conventional mathematical notation.

In Chapter 1 I have made three claims concerning my approach to authoring documents containing mathematics. These claims are repeated here followed by comments about the approach I took to accomplish each one of them.



1. Both the meaning and syntax of mathematical concepts can be captured by attributed context-free grammars. The solution I have proposed to capture the semantics of mathematical concepts is based on an organization that considers the author's needs as a fundamental requirement. To support this characteristic I have modeled the dynamics of authoring by means of a grammar-based document structure, the DDM. Attributed context-free grammars are used in this structure. The attributes determine the following:

• the position of the operator's operand, and

• the necessary structure to identify the symbol arrangement to represent a given mathematical concept.

2. Extensibility can be supported by operations on the attributed grammars. Three concepts related to extensibility were introduced in Chapter 6: the extension normal form, operations on grammars, and fundamental grammars.

• The extension normal form was proposed in order to determine the grammar format to be used. The format limits the number of terminal symbols in each of the grammar's rules. It also determines the possible terminal/nonterminal symbol arrangements each production rule must follow.

• Both the composition and the extension binary operations are defined for grammars in extension normal form, and both return grammars that are also in extension normal form. They allow the creation of grammars by combining previously defined grammars. This approach introduced the possibility of grammar reuse and incremental grammar definition.

• The notion of fundamental grammars established the basic building blocks to be used for the capturing activity. The three types of grammars defined for this purpose provide the necessary means to support the creation of any possible grammar. This statement is supported by the fact that each one of these three grammars has only a single production rule, which is of one of the types proposed by the extension normal form.

Since the composition and extension operators are defined for grammars in the extension normal form, the application of these operations on fundamental grammars will produce grammars which are also in the extension normal form. This means grammars can be created during the authoring activity. A programmable form of extensibility is therefore possible, allowing user-defined syntax to be introduced during the authoring of the document. This mechanism assumes a set of default grammars is available. This was described in Section 6.1, where a logical diagram introducing the activities involved during the authoring of mathematical concepts was presented. A detailed consideration of this characteristic was provided in Section 6.2.

3. Ambiguities generated by symbol overloading can be resolved by a scope mechanism. As defined in Chapter 6, a document instance structure S is the tuple (D, c), where D is the semantic structure and c the binding control. The semantic structure D is a finite sequence of finite sets of grammars. These sets are represented by a domain directory $G_i^k$, where i determines the position the set holds in the sequence and k refers to the version of the document considered. The binding control c is a CFG which defines the scope in which the rules provided by each domain directory in the semantic structure are valid. This means terminals defined in any given domain directory are local to the scope determined by this structure.
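The three claims above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the DDM implementation described in this thesis. The names `Rule`, `Grammar`, `combine` and `resolve` are hypothetical. The function `combine` is a plain union of rule sets standing in for the composition and extension operations of Chapter 6, whose exact definitions and normal-form checks are not reproduced here. The binding control c, which is a CFG in the thesis, is reduced to a list of the directory positions that are in scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One attributed production rule (hypothetical, simplified model)."""
    head: str        # nonterminal being defined
    body: tuple      # arrangement of terminal and nonterminal symbols
    concept: str     # attribute: the mathematical concept captured
    operands: tuple  # attribute: positions in `body` that hold operands


@dataclass(frozen=True)
class Grammar:
    name: str
    rules: frozenset


def combine(name, first, second):
    """Stand-in for grammar combination: the union of two rule sets.

    Assumption: the thesis operations also preserve the extension normal
    form; that check is not modelled here.
    """
    return Grammar(name, first.rules | second.rules)


def resolve(symbol, directories, scope):
    """Return the concept bound to `symbol` in the innermost directory in scope.

    `directories` plays the role of the semantic structure D (a sequence of
    sets of grammars); `scope` is a simplified binding control: positions
    into D, outermost first.
    """
    for position in reversed(scope):
        for grammar in directories[position]:
            for rule in grammar.rules:
                if symbol in rule.body:
                    return rule.concept
    return None


# Three single-rule grammars; two of them overload the terminal "x".
times = Grammar("times", frozenset({
    Rule("Expr", ("Expr", "x", "Expr"), "multiplication", (0, 2))}))
cross = Grammar("cross", frozenset({
    Rule("Vec", ("Vec", "x", "Vec"), "cross product", (0, 2))}))
plus = Grammar("plus", frozenset({
    Rule("Expr", ("Expr", "+", "Expr"), "addition", (0, 2))}))

arithmetic = combine("arithmetic", times, plus)    # incremental definition
D = [frozenset({arithmetic}), frozenset({cross})]  # semantic structure

print(resolve("x", D, scope=[0]))     # only the arithmetic directory is in scope
print(resolve("x", D, scope=[0, 1]))  # the vector directory shadows it
```

In this sketch the same terminal is bound to different concepts depending on which domain directories are in scope, which is the effect the scope mechanism is meant to achieve. A symbol defined only in an outer directory, such as "+", remains visible from the inner scope.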

9.3 Future Work

As stated in Subsection 1.2.8, compositionality of meaning is a design decision for systems characterized by a static syntax. Allowing user-defined syntax to be provided at run-time introduces additional complexity concerning the application of the compositionality principle, because grammar rules will also need to be supplied at run-time. For this scenario compositionality is a system property. Therefore any grammar that implements the system must include support for compositionality.

The notion of fundamental grammar introduced in Chapter 6 may be applied to support the application of compositionality. Since these grammars are the basic building components, the representation of any concept will be subjected to the restrictions they introduce. Therefore compound concepts must be decomposed. The questions to be asked at this stage are:

1. Is the resulting decomposition compositional?

2. Is there a way one can ensure compositionality of meaning for such systems?

I leave these questions open. A detailed investigation of the application of compositionality is therefore a future goal. Apart from the compositionality problem, the complete implementation of the organization proposed in this dissertation is an immediate priority.
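To make the notion concrete, compositionality of meaning can be sketched as follows: the meaning of a compound expression is computed only from the meanings of its parts and the rule that combines them. The sketch is illustrative and rests on assumptions not made in this thesis. The table `COMBINE`, the function `meaning` and the tuple encoding of expressions are hypothetical, and numeric evaluation stands in for meaning. It also shows a rule being supplied at run-time, the situation in which compositionality has to hold as a system property.

```python
# Hypothetical rule table: each rule name maps to the function that
# combines the meanings of the rule's operands.
COMBINE = {
    "sum": lambda a, b: a + b,
    "product": lambda a, b: a * b,
}


def meaning(expr):
    """Compute the meaning of `expr` from the meanings of its parts only."""
    if isinstance(expr, (int, float)):
        return expr              # atomic concept: it is its own meaning
    rule, *parts = expr          # compound concept: (rule, part, ...)
    return COMBINE[rule](*(meaning(part) for part in parts))


compound = ("sum", 1, ("product", 2, 3))
print(meaning(compound))

# A rule supplied at run-time, as user-defined syntax would require.
COMBINE["power"] = lambda base, exponent: base ** exponent
print(meaning(("power", 2, ("sum", 1, 2))))
```

The sketch does not answer the two questions above. It only shows the property they ask about: `meaning` never inspects anything but the rule name and the meanings of the immediate parts.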



References

[1] J. Abbott: OpenMath Design Committee Report. Technical report, OpenMath Consortium, 1996. Available from http://www.openmath.org/.

[2] J. Abbott, A. Diaz, R. S. Sutor: A Report on OpenMath, A Protocol for the Exchange of Mathematical Information. SIGSAM Bulletin 30(1) (March 1996), 21-24.

[3] J. Abbott, A. van Leeuwen, A. Strotmann: Objectives of OpenMath. Technical report, OpenMath Consortium, 1996. Available from http://www.openmath.org/.

[4] G. D. Abowd: Formal Aspects of Human-Computer Interaction. PhD thesis, Oxford University, Oxford, England, 1991.

[5] S. R. Adams: Modular Grammars for Programming Language Prototyping. PhD thesis, University of Southampton, Southampton, England, 1991.

[6] A. V. Aho, R. Sethi, J. D. Ullman: Compilers: Principles, Techniques and Tools. Addison-Wesley, 1986.

[7] L. Anthony, J. Yang, K. R. Koedinger: Evaluation of Multimodal Input for Entering Mathematical Equations on the Computer. In CHI '05: CHI '05 Extended Abstracts on Human Factors in Computing Systems. 1184-1187. ACM Press, 2005.

[8] D. S. Arnon, S. A. Mamrak: On the Logical Structure of Mathematical Notation. TUGboat 12(4) (1991), 479-484.

[9] R. Arrabito: Using to Produce Braille Mathematical Notation. 1987. University of Western Ontario, Undergraduate Thesis.



[10] R. G. Arrabito: Computerized Braille Typesetting: Some Recommendations on Mark-Up and Braille Standards. Master's thesis, The University of Western Ontario, London, Canada, 1990.

[11] R. G. Arrabito, H. Jürgensen: Computerized Braille Typesetting: another view of Mark-Up standards. Electronic Publishing 1(2) (September 1988), 117-131.

[12] A. Asperti, G. Bancerek, A. Trybulec (editors): Third International Conference, MKM 2004. Lecture Notes in Computer Science 3119, Berlin, 2004. Springer-Verlag.

[13] A. Asperti, B. Buchberger, J. H. Davenport (editors): Second International Conference, MKM 2003. Lecture Notes in Computer Science 2594, Berlin, 2003. Springer-Verlag.

[14] R. Ausbrooks, S. Buswell, D. Carlisle, S. Dalmas, S. Devitt, A. Diaz, M. Froumentin, R. Hunter, P. Ion, M. Kohlhase, R. Miner, N. Poppelier, B. Smith, N. Soiffer, R. Sutor, S. Watt: Mathematical Markup Language (MathML) version 2.0 (Second Edition). Technical report, W3C, 2003. Available from http://www.w3.org/TR/2003/REC-MathML2-20031021/

[15] Y. Bellik, D. Burger: Multimodal interfaces: new solutions to the problem of computer accessibility for the blind. In Conference companion on Human factors in computing systems. 267-268. ACM Press, 1994.

[16] C. Bigelow, D. Day: Digital Typography. Scientific American 249(2) (1983), 106-119.

[17] P. V. Biron, A. Malhotra: XML Schema Part 2: Datatypes. Technical report, OASIS, 2001. Available from http://www.w3.org/xmlschema-2/

[18] Instruction Manual for Braille Transcribing. American Printing House for the Blind, Louisville, Kentucky, 3rd ed., 1984.

[19] The Nemeth Braille Code for Mathematics and Science Notation, 1972 Revision. American Printing House for the Blind, Louisville, Kentucky, 1985.

[20] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, J. Cowan: Extensible Markup Language (XML) 1.1. Technical report, W3C, 2004. Available from http://www.w3.org/TR/2004/REC-xml11-20040204/



[21] M. Bryan: A TeX User's Guide to ISO's Document Style Semantics and Specification Language (DSSSL). TUGboat 14 (1993), 223-226.

[22] H. Bunt: Issues in Multimodal Human-Computer Communication. In H. Bunt, R.-J. Beun, T. Borghuis (editors): Multimodal Human-Computer Communication: Systems, Techniques, and Experiments, 1374. Lecture Notes in Computer Science, 1-12, Springer-Verlag, Berlin, January 1998.

[23] S. Buswell, O. Caprotti, D. P. Carlisle, M. C. Dewar, M. Gaetano, M. Kohlhase: The OpenMath Standard (version 2.0). Technical report, The OpenMath Society, 2004. Available from http://www.openmath.org/cocoon/openmath/standard/om20/index.html

[24] S. Buswell, S. Devitt, A. Diaz, P. Ion, R. Miner, N. Poppelier, B. Smith, N. Soiffer, R. Sutor, S. Watt: Mathematical Markup Language w3c, Proposed Recommendation. Technical report, W3C HTML, 1998. Available from http://www.w3.org/TR/1998/REC-MathML-19980407/.

[25] O. Caprotti, D. P. Carlisle, A. M. Cohen: The OpenMath Standard. Technical report, The OpenMath Esprit Consortium, 2000. Available from http://www.nag.co.uk/projects/OpenMath/omstd

[26] P. Chandler, J. Sweller: Cognitive load theory and the format of instruction. Cognition and Instruction 8(4) (1991), 293-332.

[27] J. Clark: The design of RELAX NG. Technical report, OASIS, 2001. Available from http://www.thaiopensource.com/relaxng/design.html

[28] J. Clark, M. Makoto: RELAX NG Specification. Technical report, OASIS, 2001. Available from http://www.oasis-open.org/committees/relax-ng/spec.html

[29] J. Clark, M. Makoto: RELAX NG Tutorial. Technical report, OASIS, 2001. Available from http://www.oasis-open.org/committees/relax-ng/tutorial.html

[30] R. E. Clark (editor): Learning From Media: Arguments, Analysis and Evidence. Perspectives in Instructional Technology and Distance Learning. Information Age Publishing, 2001.



[31] R. E. Clark: New Directions: Cognitive and Motivational Research Issues. ch. 15. In Perspectives in Instructional Technology and Distance Learning [30], 2001.

[32] E. F. Codd: A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6) (June 1970), 377-387.

[33] P. R. Cohen, D. R. McGee: Tangible Multimodal Interfaces for Safety-Critical Applications. Communications of the ACM 47(1) (January 2004), 41-46.

[34] J. H. Coombs, A. H. Renear, S. J. DeRose: Markup Systems and the Future of Scholarly Text Processing. Communications of the ACM 30(11) (1987), 933-947.

[35] J. Coutaz, L. Nigay, D. Salber: Multimodality from the User and System Perspectives. In Proc. ERCIM (European Research Consortium for Informatics and Mathematics), workshop on User Interface For All, Heraklion. 1995. Available from citeseer.ist.psu.edu/coutaz95multimodality.html

[36] J. de Carvalho, H. Jürgensen: Dynamic Multi-Purpose Mathematics Notation. Technical Report 521, The University of Western Ontario, 1998.

[37] M. Dewar: OpenMath: An Overview. SIGSAM Bulletin 34(2) (June 2000), 2-5.

[38] C. Dirckx: A Mathematical Text to Braille Translator. 1992. Project Dissertation, Churchill College, University of Bradford.

[39] A. Dix, J. Finlay, G. Abowd, R. Beale: Human-Computer Interaction. Prentice-Hall, 1998.

[40] M. B. Dorf, E. R. Scharry: Instruction Manual for Braille Transcribing. Division for the Blind and Physically Handicapped, Library of Congress, Washington, D.C., 1979.

[41] S. Dunne, H. Jürgensen: Formatting Specialized Notations. In Proceedings of WOODMAN '89: Workshop on Object-Oriented Document Manipulation. Rennes, France, 1989.



[42] A. D. Edwards, R. D. Stevens: Une Interface Multimodale pour l'Access aux Formules Mathematiques par des Eleves ou Etudiants Aveugles. In Comme les Autres: Interfaces Multimodales pour Handicapes Visuels, Special number 1. 97-104. INSERM, 1995.

[43] R. Elmasri, S. B. Navathe: Fundamentals of Database Systems. The Benjamin/Cummings Publishing Company, Inc, Redwood City, California, second ed., 1994.

[44] M. G. Eramian: Displaying DVI Files in Braille: A Viewer for the Visually Impaired. Technical Report 500, The University of Western Ontario, 1997.

[45] W. M. Farmer: MKM: A New Interdisciplinary Field of Research. SIGSAM Bull. 38(2) (2004), 47-52.

[46] R. Furuta, V. Quint, J. Andre: Interactively Editing Structured Documents. Electronic Publishing 1(1) (1988), 19-44.

[47] R. Furuta, J. Scofield, A. Shaw: Document Formatting Systems: Survey, Concepts and Issues. ACM Computing Surveys 14(3) (1982), 417-472.

[48] C. Ghezzi, M. Jazayeri, D. Mandrioli: Fundamentals of Software Engineering. Prentice-Hall, 1991.

[49] C. F. Goldfarb: A Generalized Approach to Document Markup. SIGPLAN Notices 16(6) (1981), 68-73.

[50] M. Goossens, J. Saarela: A Practical Introduction to SGML. TUGboat 16(2) (1995), 103-145.

[51] M. Goossens, J. Saarela: From LaTeX to HTML and back. TUGboat 16(2) (1995), 174-214.

[52] D. Harel: Statecharts: A Visual Formalism for Complex Systems. Science of Computer Programming 8(3) (1987), 231-274.

[53] D. Harel, A. Naamad: The STATEMATE semantics of statecharts. ACM Transactions of Software Engineering and Methodology 5(4) (1996), 293-333.

[54] F. C. Heeman: Granularity in Structured Documents. Electronic Publishing 5(3) (1992), 143-155.



[55] J. E. Hopcroft, J. D. Ullman: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, first ed., 1979.

[56] E. L. Hutchins, J. D. Hollan, D. A. Norman: Direct Manipulation Interfaces. 87-124. In Norman and Draper [77], 1986.

[57] Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML). International Organization for Standardization, International Standard 8879, 1986.

[58] T. M. V. Janssen: Compositionality. In J. van Benthem, A. ter Meulen (editors): Handbook of Logic and Language. Elsevier Science Publishers, 1997.

[59] H. Jürgensen: Tactile Computer Graphics. 1997. Manuscript.

[60] H. Jürgensen, H. Waldschmidt: Do Portability, Verifiability, and Simplicity of Programming have to be Conflicting Goals? Technical Report 123, The University of Western Ontario, 1984.

[61] B. W. Kernighan, D. M. Ritchie: The C Programming Language. Prentice-Hall, Englewood Cliffs, New Jersey, 1978.

[62] P. Kilpelainen: SGML & XML content models. Technical Report C-1998-12, University of Helsinki, 1998.

[63] D. E. Knuth: The Genesis of Attribute Grammars. In Proceedings of the International Conference on Attributed Grammars and their Applications. 1-12. Springer-Verlag New York, Inc, 1990.

[64] D. E. Knuth: The TeXbook. Addison-Wesley, Reading, Massachusetts, 1993.

[65] L. Lamport: LaTeX, a Document Preparation System. Addison-Wesley, Reading, Massachusetts, 1986.

[66] J. R. Levine, T. Mason, D. Brown: lex & yacc. O'Reilly & Associates, Inc, Sebastopol, California, second ed., 1995.

[67] D. M. Levy: Fixed or Fluid? Document Stability and New Media. ECHT 1994 Proceedings (September 1994), 24-31.



[68] X. Li: XML and the Communication of Mathematical Objects. Master's thesis, The University of Western Ontario, London, Canada, 1999.

[69] J. C. Martin: Introduction to Languages and The Theory of Computation. McGraw-Hill, first ed., 1991.

[70] MathType, Mathematical Equation Editor, User Manual. Design Science, Inc., Long Beach, California, May 1997.

[71] H. A. Maurer, A. Salomaa, D. Wood: A supernormal-form theorem for context-free grammars. JACM 30(1) (January 1983), 95-102.

[72] B. Meyer: Object Oriented Software Construction. Addison-Wesley, 1997.

[73] E. D. Mynatt, G. Weber: Nonvisual Presentation of Graphical User Interfaces: Contrasting Two Approaches. In CHI 1994 Conference Proceedings. 166-172, April 1994.

[74] W. M. Newman, M. G. Lamming: Interactive System Design. Addison-Wesley, 1995.

[75] L. Nigay, F. Jambon, J. Coutaz: Formal Specification of Multimodality. In CHI'95 Workshop on Formal Specifications of User Interfaces. Denver, USA, 1995. Available from citeseer.ist.psu.edu/nigay95formal.html

[76] D. A. Norman: Cognitive Engineering, 31-61. In Norman and Draper [77], 1986.

[77] D. A. Norman, S. W. Draper (editors): User Centered System Design. Lawrence Erlbaum Associates, Publishers, 1986.

[78] J. Paakki: Attribute Grammar Paradigms - A High-Level Methodology in Language Implementation. ACM Computing Surveys 27(2) (1995), 196-255.

[79] L. Padovani: On the Roles of LaTeX and MathML in Encoding and Processing Mathematical Expressions. In Asperti et al. [13], 66-79.

[80] H. Petrie, W. Fisher, G. Weber, I. Langer, K. G. and Cathy Rundle, L. Pyfers: Universal Interfaces to Multimedia. In 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002). IEEE Computer Society, October 2002.



[81] N. A. F. M. Poppelier, E. van Herwijnen, C. A. Rowley: Standard DTD's and Scientific Publishing. EPSIG News 5 (September 1992), 10-19.

[82] L. M. Quiroga, M. E. Crosby, M. K. Iding: Reducing Cognitive Load. In HICSS '04: Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS'04) - Track 5. 50131.1. IEEE Computer Society, 2004.

[83] T. V. Raman: TeX talk. TUGboat 12 (1991), 178.

[84] T. V. Raman: An Audio View of LaTeX Documents. TUGboat 13 (1992), 372-379.

[85] T. V. Raman: Documents Are not just for Printing. In Proc. Principles of Document Processing. 1992.

[86] T. V. Raman: Audio System for Technical Readings. PhD thesis, Cornell University, New York, USA, 1994.

[87] T. V. Raman: An Audio View of LaTeX Documents - Part II. TUGboat 16 (1995), 311-314.

[88] T. V. Raman: Emacspeak: A Speech-Enabling Interface. Dr. Dobb's Journal (September 1997).

[89] D. R. Raymond, F. W. Tompa, D. Wood: Markup Reconsidered. In First International Workshop on Principles of Document Processing. Washington, D.C., October 21-23 1992.

[90] D. R. Raymond, F. W. Tompa, D. Wood: From Data Representation to Data Model: Meta-Semantic Issues in the Evolution of SGML. Computer Standards and Interfaces (1996).

[91] L. M. Reeves, J.-C. Martin, J. Lai, M. McTear, T. Raman, K. M. Stanney, H. Su, Q. Y. Wang, J. A. Larson, S. Oviatt, T. Balaji, S. Buisine, P. Collings, P. Cohen, B. Kraal: Guidelines for Multimodal User Interface Design. Communications of the ACM 47(1) (January 2004), 57-59.

[92] C. Roisin, I. Vatton: Merging Logical and Physical Structures. Electronic Publishing 6(4) (1993), 327-337.



[93] W. Rudin: Real and Complex Analysis. McGraw-Hill, New York, New York, third ed., 1987.

[94] G. Salomon: Television is "easy" and print is "tough": The differential investment of mental effort in learning as a function of perceptions and attributions. Journal of Educational Psychology 76(4) (1984), 233-243.

[95] R. Sethi: Programming Languages Concepts and Constructs. Addison-Wesley, 1990.

[96] G. G. Smith, D. Ferguson: Diagrams and Math Notation in e-Learning: Growing Pains of a New Generation. International Journal of Mathematical Education in Science and Technology 35(5) (2004), 681-695.

[97] C. M. Sperberg-McQueen: Specifying Document Structure: Differences in LaTeX and TEI Markup. TUGboat 12(3) (1991), 415-421.

[98] A. Strotmann: Content Markup Languages Design Principles. PhD thesis, The Florida State University, Florida, USA, 2003.

[99] J. Sweller, P. Chandler: Why some material is difficult to learn. Cognition and Instruction 12(3) (1994), 185-233.

[100] J.-P. Tremblay, P. G. Sorenson: The Theory and Practice of Compiler Writing. McGraw-Hill, 1989.

[101] J. van Benthem, A. ter Meulen: Handbook of Logic and Language. Elsevier Science Publishers, 1997.

[102] S. Vorkoetter: Proposed OpenMath Specification. Technical report, Waterloo Maple Software, 1995. Available from http://www.openmath.org/.

[103] J. N. Wallace, T. A. B. Wesley: The Access to Scientific and Mathematical Information for Blind People. [1991]. Manuscript, Department of Computing, University of Bradford.

[104] D. A. Watt, D. F. Brown: Programming Language Processors in Java. Prentice-Hall, Harlow, Essex, first ed., 2000.

[105] G. Weber: A Multimedia Editor for Mathematical Documents. Available from http://www.multireader.org/multimedia%20editor.html




[106] D. Wood: Grammar and L forms: an introduction. Lecture Notes in Computer Science 91. Springer-Verlag, 1980.

[107] D. Wood: Theory of Computation. John Wiley & Sons, first ed., 1987.

[108] F. J. Wright: Interactive Mathematics via the Web using MathML. SIGSAM Bulletin 34(2) (June 2000), 49-57.



VITA

Name: Jackson W. Marques de Carvalho

Place of birth: Brazil

Education:

The University of Western Ontario, London, Ontario, Canada, 1995-2005, Ph.D.

University of Maine at Orono, Orono, Maine, USA, 1983-1985, Master of Electrical Eng.

Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil, 1972-1978, B.Sc.

Awards:

Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq), 1995-1998

Organization of American States, 1983-1985

Related Work Experience:

Lecturer, Department of Computer Science, University of Pittsburgh, Pittsburgh, PA, USA, 2002-present

Lecturer, School of Computer Science, University of Windsor, Windsor, Ontario, Canada, 1999-2002

Graduate Research Assistant/Lecturer, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, 1999

Teaching Assistant, Faculty of Information and Media Studies, The University of Western Ontario, London, Ontario, Canada, 1998

Lecturer, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, 1997

Teaching Assistant, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, 1996-1998

Coordinator of the Scientific Computing Center (NCC), Department of Computer Science, Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil, 1991-1995

Lecturer, Department of Computer Science, Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil, 1989-1995

Graduate Assistant, Department of Electrical Engineering, University of Maine at Orono, Orono, Maine, USA, 1985

Electrical Engineer, Technological Nucleus at Center of Technology, Universidade Federal do Rio Grande do Norte, Natal, RN, Brazil, 1986-1989



Presentations:

1987 MTNS, Phoenix, AZ, USA: Straight Line Motion, Inverse Kinematic Velocities and Inverse Trajectory Planning

1987 MTNS, Phoenix, AZ, USA: MASK Layout Language and Layout Checking Plots

Technical Reports:

Dynamic Multi-Purpose Mathematics Notation, Technical Report Number 521, 1998. In conjunction with Dr. Helmut Jürgensen, Department of Computer Science, The University of Western Ontario, London, Ontario, Canada