maintenance of object-oriented systems during structural evolution

28
Maintenance of Object-Oriented Systems During Structural Evolution Paul L. Bergstein Department of Computer Science and Engineering, Wright State University, Dayton, OH 45435, USA E-mail: [email protected] We have previously developed a mathematical treat- ment of a calculus for class transformations that preserve or extend a set of objects. Methods for au- tomating the maintenance of structural and behavioral consistency in systems based on evolving class struc- tures have been provided for the object-preserving and object-extending transformations. This work extends the calculus of class transformations to include certain transformations that reflect not only the extension and reclassification of existing objects, but also structural changes (other than addition of attributes) in the origi- nal objects. Language-preserving transformations are a special case of transformations that change the structure of ex- isting objects. If an object schema is decorated with concrete syntax, it defines not only a class struc- ture, but also a language for describing the objects. When two schemas define the same language but dif- ferent classes, the language may be used to guide the transportation of functionality between domains. The language-preserving transformations defined here form the basis of a complete transformation system for a subset of class graphs powerful enough to express the regular languages. c 1997 John Wiley & Sons, Inc. 1. Introduction Class organizations (schemas) evolve over the life cycle of object-oriented systems for a variety of rea- sons. This issue has recently been a subject of increas- ing attention in the literature of both object-oriented lan- guages and especially object-oriented database systems: [22, 19, 29, 8, 5, 10, 11, 12, 4, 21, 1, 3, 30, 34]. One of the most common forms of evolution involves the extension of an existing schema by addition of new classes of objects or the addition of attributes to the orig- inal objects. Sometimes class structures are reorganized even when the set of objects is unchanged. In this case the reorganization might represent an optimization of the system, or just a change in the users’ perspective. At the Received September 1995; revised October 1996; accepted May 1997 Recommending editor: Roberto Zicari c 1997 John Wiley & Sons, Inc. other extreme, a class reorganization might reflect not only the extension and reclassification of existing ob- jects, but also structural changes (other than addition of attributes) in the original objects. While extension and reclassification of objects are mainly concerned with the organization of objects into classes, object restructuring is concerned primarily with the organization of attributes, or “parts,” into objects. Here, a major concern is how to modify the code of an object-oriented program if the class definitions are changed so that the same data is organized into a dif- ferent object structure. If the new objects hold the same data as the original objects, the class structures can be considered in some way analogous. The problem is to find a mapping of the code (methods) from the old class structure to the new one. Some of the issues related to maintaining behavioral consistency during object restructuring have been previ- ously addressed by Johnson and Opdyke [19] and others. In this paper, we consider a novel framework for deal- ing with object restructuring with a theoretical basis in formal languages. We view the original and restructured class hierarchies as structurally different grammars that define the same language. The programming language model used throughout the remainder of this paper is formally defined in Sec- tion 4. In the model, a program comprises a set of class definitions in the form of a class graph plus a set of method definitions. The data model, formally defined in Appendix A, is an extension of the Demeter Ker- nel Model [25] which allows decoration of class graphs (schemas) with concrete syntax. The extended data model uses a graphical notation to define both a set of objects and a language for represent- ing the objects textually. In a class graph squares rep- resent concrete classes and hexagons represent abstract classes. Inheritance (subclass) relationships are indicated by wide arrows. Thin arrows are used to specify the at- tributes and concrete syntax, collectively referred to as “parts,” associated with a class. The parts of a class (in- cluding inherited parts) are totally ordered. The textual THEORY AND PRACTICE OF OBJECT SYSTEMS, Vol. 3(3), 185–212 1997 CCC 1074–3227/97/030185–28

Upload: paul-l

Post on 06-Jun-2016

214 views

Category:

Documents


2 download

TRANSCRIPT

Maintenance of Object-Oriented SystemsDuring Structural Evolution

Paul L. BergsteinDepartment of Computer Science and Engineering, Wright State University, Dayton, OH 45435, USAE-mail: [email protected]

We have previously developed a mathematical treat-ment of a calculus for class transformations thatpreserve or extend a set of objects. Methods for au-tomating the maintenance of structural and behavioralconsistency in systems based on evolving class struc-tures have been provided for the object-preserving andobject-extending transformations. This work extendsthe calculus of class transformations to include certaintransformations that reflect not only the extension andreclassification of existing objects, but also structuralchanges (other than addition of attributes) in the origi-nal objects.

Language-preserving transformations are a specialcase of transformations that change the structure of ex-isting objects. If an object schema is decorated withconcrete syntax, it defines not only a class struc-ture, but also a language for describing the objects.When two schemas define the same language but dif-ferent classes, the language may be used to guide thetransportation of functionality between domains. Thelanguage-preserving transformations defined here formthe basis of a complete transformation system for asubset of class graphs powerful enough to express theregular languages. c© 1997 John Wiley & Sons, Inc.

1. Introduction

Class organizations (schemas) evolve over the lifecycle of object-oriented systems for a variety of rea-sons. This issue has recently been a subject of increas-ing attention in the literature of both object-oriented lan-guages and especially object-oriented database systems:[22, 19, 29, 8, 5, 10, 11, 12, 4, 21, 1, 3, 30, 34].

One of the most common forms of evolution involvesthe extension of an existing schema by addition of newclasses of objects or the addition of attributes to the orig-inal objects. Sometimes class structures are reorganizedeven when the set of objects is unchanged. In this casethe reorganization might represent an optimization of thesystem, or just a change in the users’ perspective. At the

Received September 1995; revised October 1996; accepted May 1997Recommending editor: Roberto Zicari

c© 1997 John Wiley & Sons, Inc.

other extreme, a class reorganization might reflect notonly the extension and reclassification of existing ob-jects, but also structural changes (other than addition ofattributes) in the original objects.

While extension and reclassification of objects aremainly concerned with the organization of objects intoclasses, object restructuring is concerned primarily withthe organization of attributes, or “parts,” into objects.Here, a major concern is how to modify the code ofan object-oriented program if the class definitions arechanged so that the same data is organized into a dif-ferent object structure. If the new objects hold the samedata as the original objects, the class structures can beconsidered in some way analogous. The problem is tofind a mapping of the code (methods) from the old classstructure to the new one.

Some of the issues related to maintaining behavioralconsistency during object restructuring have been previ-ously addressed by Johnson and Opdyke [19] and others.In this paper, we consider a novel framework for deal-ing with object restructuring with a theoretical basis informal languages. We view the original and restructuredclass hierarchies as structurally different grammars thatdefine the same language.

The programming language model used throughoutthe remainder of this paper is formally defined in Sec-tion 4. In the model, a program comprises a set of classdefinitions in the form of a class graph plus a set ofmethod definitions. The data model, formally definedin Appendix A, is an extension of the Demeter Ker-nel Model [25] which allows decoration of class graphs(schemas) with concrete syntax.

The extended data model uses a graphical notation todefine both a set of objects and a language for represent-ing the objects textually. In a class graph squares rep-resent concrete classes and hexagons represent abstractclasses. Inheritance (subclass) relationships are indicatedby wide arrows. Thin arrows are used to specify the at-tributes and concrete syntax, collectively referred to as“parts,” associated with a class. The parts of a class (in-cluding inherited parts) are totally ordered. The textual

THEORY AND PRACTICE OF OBJECT SYSTEMS, Vol. 3(3), 185–212 1997 CCC 1074–3227/97/030185–28

representation of an object is obtained by concatenationof the concrete syntax as it is encountered during a depthfirst traversal of the object’s parts.

An interesting class of transformations investigated inSection 5 are those that change the structure of the ob-jects, but preserve the language defined by the schema.Since a class graph defines both a class structure and agrammar, any change in the class structure is reflectedin a corresponding change in the grammar. There is aninteresting class of transformations that result in a newclass structure, a potentially new set of objects, and anew grammar, but which leave the language defined bythe grammar unchanged. I call such a transformationlanguage-preserving and say that the old and new classgraphs are language-equivalent.

Example 1. The class graphs in Figure 1 are language-equivalent even though they define different sets of ob-jects. Grammars corresponding to the class graphs φ1

and φ2, respectively, are given below in EBNF form:

<Prefix> ::= <Number> | <Compound>

<Number> ::= Digit {Digit}

<Compound> ::= <AddExp> | <MulExp>

<AddExp> ::= ’(’ ’+’ <Prefix> <Prefix> ’)’

<MulExp> ::= ’(’ ’*’ <Prefix> <Prefix> ’)’

<Prefix> ::= <Number> | <Compound>

<Number> ::= Digit {Digit}

<Compound> ::= ’(’ <Op> <Prefix> <Prefix> ’)’

<Op> ::= <AddOp> | <MulOp>

<AddOp> ::= ’+’

<MulOp> ::= ’*’

If the concrete syntax specified in a schema is mean-ingful and a transformation preserves the defined lan-guage, it is reasonable to hypothesize that the objectsare intended to represent the same data in the trans-formed schema as in the original. If the inputs and out-puts of a program are objects, then it is reasonable toexpect that the code could be automatically updated af-ter a language-preserving transformation so that any in-put in the language will produce output identical to theoutput from the original program1. In other words, wecan expect to find a mapping of the methods from theold class structure to the new one which will preservethe behavior of the system.

These techniques may be useful in contexts other thanevolution, e.g., during implementation when it is desir-able to reuse some of the functionality of an old appli-cation in a new environment.

2. Motivating Example

Consider the first class graph, φ1, in Figure 1. Sincethe Prefix class is abstract, every Prefix object must bean instance of a concrete subclass: Number, AddExp,or MulExp. Number objects are represented textually

by strings of digits. AddExp and MulExp objects arerepresented textually by strings comprising an openingparenthesis, a “+” or “*”, the textual representations oftheir two subexpression arguments, and a closing paren-thesis. They may be nested to any arbitrary depth.

In the second class graph, φ2, the Compound classhas been made concrete. We no longer have subclassesof Compound to distinguish between multiplication andaddition. Instead, we use an attribute, op, which takesas its value an instance of an AddOp or MulOp. Notice,however, that the textual representations of Compound

objects has not changed.Suppose we start with φ1 and implement a program

to evaluate prefix expressions in an object-oriented lan-guage such as C++. The class definitions can be auto-matically generated from the class graph. More interest-ingly, code to parse the program’s input and build thecorresponding Prefix object can also be automaticallygenerated. The application is completed by adding a fewsimple methods:

int Number::eval()

{ return value; }

int AddExp::eval()

{ return (arg1->eval() + arg2->eval()); }

int MulExp::eval()

{ return (arg1->eval() * arg2->eval()); }

Now, suppose we wish to change the class structure tothat of the second class graph, φ2. We start by rerunningthe code generator to get a new set of class definitionsand new code to parse and build Prefix objects.

When parsing a Compound in the original program,an instance of either AddExp or MulExp is created, de-pending on whether a “+” or “*” is found in the inputstream. When a Compound is parsed in the new pro-gram, an instance of the concrete Compound class iscreated. This involves parsing the op part of the Com-

pound to produce either an AddOp or MulOp depend-ing on whether a “+” or “*” is found in the input stream.Wherever an object had an AddExp or MulExp asa part in the original program, the corresponding ob-ject will have a Compound with either an AddOp orMulOp, respectively, in the transformed program.

In order to maintain functional equivalence, it is nec-essary for a Compound object in the transformed pro-gram to pass along any message it receives to its op partwith the extra argument this. Methods written for theoriginal program are mapped from AddExp → AddOp

and from MulExp → MulOp. The only modification tothe methods is that they must access any parts definedfor Compound objects indirectly2 through the extra ar-gument:

186 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

int Number::eval()

{ return value; }

int Compound::eval()

{ return op->eval(this); }

int AddOp::eval(Compound* exp)

{ return (exp->get_arg1()->eval() +

exp->get_arg2()->eval()); }

int MulOp::eval(Compound* exp)

{ return (exp->get_arg1()->eval() *

exp->get_arg2()->eval()); }

The subclass to attribute transformation and its in-verse are just two of many common class transforma-tions that change the structure, but not the textual repre-sentation or information content, of objects. The remain-der of this paper is dedicated to developing methods forautomatically restoring the behavior of a system aftersuch class transformations.

3. Research Approach

The approach taken is to break the problem downinto three manageable sub-problems:

1. Defining a set of primitive transformations. A smallset of primitive transformations is defined whichcan be composed sequentially to form many usefullanguage-preserving transformations. By useful, wemean those transformations that would make sense

from a software design point of view. The primitivesdefined in Section 5 allow transformations including:

• Any transformation that preserves the originalobjects

• Renaming of classes and attributes• Addition and deletion of useless symbols• Distribution of parts up or down the part-of

hierarchy• Replacing subclasses with attributes or at-

tributes with subclasses

2. Providing algorithms for incrementally updatingthe code. For each primitive transformation an al-gorithm must be found for updating the code. Then,given any sequence of primitive transformations, thecode can be updated incrementally by performing up-dates for each of the primitives in sequence.

3. Reducing an arbitrary language-preserving trans-formation to a sequence of primitives. An algorithmto search for a sequence of primitive transforma-tions between two arbitrary language-equivalent classgraphs must be found. The search may be effectivelyguided by the concrete syntax of the language. Whenthe search is successful, we may regard the resultingsequence of primitives as the definition of an analogybetween the class graphs.

4. Language Model

This section describes a simple object-oriented pro-gramming language based on class graphs. The CG lan-guage will be used to illustrate the method transforma-

φ1 φ2

FIG. 1. Language-equivalent class graphs.

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 187

Main : main ()

{ exp <- eval() <- print() }

Number : eval ()

{ self }

AddExp : eval ()

{

value <- assign(arg1 <- eval());

value <- add(arg2 <- eval())

}

MulExp : eval ()

{

value <- assign(arg1 <- eval())

<- mul(arg2 <- eval())

}

FIG. 2. Program A.

tions that are required to restore behavioral consistencyafter a language-preserving class graph transformation.Although the CG language is very simple, the same prin-ciples can be used to modify programs written in “real”languages.

4.1. Overview

A CG (class graph) program consists of a set ofclass definitions in the form of a class graph plus a setof method definitions. There is one built-in class calledNumber for which the language provides the built-inmethods add, sub, mul, div, assign, and print.The user may define still more methods for the Number

class.The class definitions must include a concrete class

called Main, and the method definitions must providea main method for the Main class. Program execution

begins by parsing the input by recursive descent to con-struct an instance of the Main class (the main object),and invoking its main method. Note that programs writ-ten in the CG language are self documenting as to theirlegal inputs since they define their own input language.

In the CG language, there is no way to create or de-stroy objects once the initial parsing operation is com-plete. Thus, the set of objects is fixed during programexecution and consists of a tree rooted at the main ob-ject.

Each object except the main object has an implied“container” attribute. An object’s container is its parentin the object tree. In other words, the attribute links be-tween objects are defined to be bidirectional. This featureof the CG language is included to simplify some of thecode transformations discussed below. In a “real” lan-guage, the container attributes3 would be implementedonly when required by a code transformation.

In the CG language, as in languages such as Smalltalk,each object has direct access only to its own attributes.However, in CG these attributes include the containerattribute. In other words, each object has access to itsown parts and to the object of which it is a part.

4.2. Methods, Messages, and Expressions

Each method may take any number of objects as ar-guments and every method returns an object. Both thearguments and the return value are passed by reference.This must be the case, since passing by value would in-volve the construction of new objects during programexecution.

method ::= class : name ’(’ formals ’)’ ’{’ explist ’}’formals ::= name { , name }This construct is a method definition. When the methodis invoked by sending the name message to an objectof class class, the expression explist is evaluated afterargument values are substituted for the formals and itsvalue is returned.

explist ::= exp { ; exp }An expression list is evaluated by evaluating each ex-pression in the list from left to right. The value of thelist is the value of the last expression.

exp ::= nameHere, name may be either the name of a part of the objectfor which the method being evaluated was invoked or thename of a formal parameter of the method. The value ofthe expression is the object instantiating the part or theobject passed as the actual argument, respectively.

exp ::= selfThe value of the expression self is the object for whichthe method being evaluated was invoked.

exp ::= containerThe value of the expression container is the object which

188 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

FIG. 3. Nesting of parts.

contains the object for which the method being evaluatedwas invoked. The expression container must not appearin any method attached to the Main class.

exp ::= exp ← name ’(’ [actuals] ’)’actuals ::= exp { , exp }This construct denotes the sending of a message. Whenan object is sent a message, the object’s method calledname is invoked. If the object has no method with theproper name, a run-time exception occurs and programexecution is terminated. The evaluation order is: The expon the left-hand side of the message send operator, ←,is evaluated; each of the actual argument expressions isevaluated and their values are substituted for the corre-sponding formals in the method body; the method bodyis evaluated and the result is returned.

Note that the CG language supports delayed bindingbut not inheritance of methods. Inheritance is not an im-portant issue in the study of code transformations sinceit can easily be eliminated from an object-oriented pro-gram just by copying methods from superclasses to theclasses where they are inherited. The inheritance mech-anism is merely a convenience for the programmer sothat each method only needs to be written in one place.

4.3. Built-Ins

All of the built-in methods for the Number class ex-cept print take a single argument which must be an-other Number. All of the methods return self. A sideeffect of the methods add, sub, mul, div, and assignis that the “value” of the Number object receiving themessage is changed. That is, the state of the object ischanged in such a way that subsequent messages to theobject may have different results. The value is modifiedin the obvious way depending on whether the messageis add, sub, mul, div, or assign. When a Numberis created during the initial parsing, its value is initial-ized depending on the value denoted by the token parsed.When a Number receives the print message, a tokendenoting its value is output.

Example 2. A complete CG program to evaluate arith-metic prefix expressions is shown in Figure 2.

5. Language-Preserving Class Transformations

5.1. Primitive Language-Preserving Transformations

The following primitives comprise the language-

FIG. 4. Addition of lambda alternative.

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 189

preserving class graph transformations, and are de-scribed informally below. Formal definitions can befound in Appendix B.

1. The object-preserving transformations2. Renaming of vertices and edges3. Nesting of parts4. Unnesting of parts5. Addition of lambda parts6. Deletion of lambda part7. Addition of lambda alternative8. Deletion of lambda alternative9. Insertion of singleton construction

10. Deletion of singleton construction11. Attribute to subclass12. Subclass to attribute

5.1.1. Object-preserving transformations. The ob-ject-preserving transformations for class graphs are al-most the same as the object-preserving transformationsdefined in [5] for graphs lacking concrete syntax andordering of parts. There is an additional primitive thatallows edges to be renumbered as long as the orderingis unchanged. Abstraction of common parts and distri-bution of common parts are extended to apply to syn-tax edges as well as attribute edges. The only additionalcomplexity is that abstraction of common parts to a su-perclass is restricted so that the ordering of parts at eachimmediate subclass cannot be changed. If there is a setof classes that have more than one part in common, butthe common parts are ordered differently in the individ-ual classes, then it is not possible to abstract all of thecommon parts.

The object-preserving transformations for classgraphs are as follows:

• Renumbering of parts. Any set of attribute and syn-tax edges in a class graph may be renumbered as longas the ordering of parts (including inherited parts) re-mains unchanged for each class.

• Abstraction of common parts. If all of the immediatesubclasses of class C have the same part, that part canbe moved up the inheritance hierarchy so that each ofthe subclasses will inherit the part from C, rather thanduplicating the part in each subclass.

• Distribution of common parts. This is the inverse ofabstraction of common parts.

• Deletion of “useless” alternation. A vertex represent-ing an abstract class (formally, an alternation vertex)is “useless” if it has no incoming edges and no parts.Intuitively, an abstract class is useless if it is not apart of any concrete class, and it has no parts for anyconcrete class to inherit. If an abstract class is uselessit may be deleted along with its outgoing inheritanceedges.

• Addition of “useless” alternation. An abstract classcan be added along with outgoing inheritance edgesto any set of classes already in the class graph. Thisis the inverse of deletion of useless alternation.

• Part replacement. If two classes, C1 and C2, havethe same set of instantiable (concrete) subclasses thenany attribute edge incoming at C1 may be reroutedto C2, since the set of objects which may instantiatethe attribute will not change. Note that the inverseof part replacement is just another instance of thetransformation.

5.1.2. Renaming of vertices and edges. Classes andattributes may be freely renamed.

5.1.3. Nesting of parts.

• If every class which has a class w as a part, has u asa part immediately after w, then we may remove theu part from all of those classes and instead make uthe last part of class w. See, for example, Figure 3.

• If every class which has w as a part has u as a partimmediately before w, then we may remove the u partfrom all of those classes and instead make u the firstpart of class w.

5.1.4. Unnesting of parts. This is the inverse of nest-ing of parts.

5.1.5. Addition of lambda parts. A part, p, can beadded to any class if p is a class with no parts and nosubclasses, or if p is the “empty string.”

5.1.6. Deletion of lambda parts. This is the inverse ofaddition of lambda parts.

5.1.7. Addition of lambda alternative. If an abstractclass, v, has as its only immediate (not inherited) part animmediate superclass of v, v′, then a concrete class, w,with no parts may be added to the subclasses of v. See,for example, Figure 4.

5.1.8. Deletion of lambda alternative. This is the in-verse of addition of lambda alternative.

5.1.9. Insertion of singleton construction. A newconcrete class, v, with a class, v′, as its only part may beadded to a class graph, and v may replace v′ as a partin any other class. If v′ is a subclass, inheritance edgesmay be rerouted from v′ to v if the change does not re-sult in the inheritance of any additional parts at v. See,for example, Figure 5.

5.1.10. Deletion of singleton construction. This is theinverse of insertion of singleton construction.

190 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

FIG. 5. Insertion of singleton construction.

5.1.11. Attribute to subclass. If a class graph containsa concrete class, v, with an abstract class, w, as a part,then we may delete the part, w, from v and for eachimmediate subclass, w′, of w we create a new concreteclass, v′ with w′ as a part, and make v′ a subclass of v.Class v becomes abstract. See, for example, Figure 6.

5.1.12. Subclass to attribute. This is the inverse ofattribute to subclass.

5.2. Justification for the Primitive Transformations

One justification for the selection of the chosen prim-itives is that they make it possible to express commonlyoccurring language-preserving transformations as a se-quence of primitives. Examination of the literature andpersonal experience with the evolution of the Demetersystem indicate that the primitive transformations de-fined in this section are powerful enough to express most,if not all, of the transformations that could be consideredlanguage-preserving. Rather than argue the subjective“practical usefulness” of the transformations, however,

we demonstrate that the primitive language-preservingtransformations defined here form the basis of a com-plete transformation system for a subset of class graphspowerful enough to express the regular languages.

5.2.1. Regular class graphs. The regular class graphsare defined so that there is a bijection between them andthe regular expressions. The alphabet of a regular classgraph consists of the terminal classes (those concreteclasses with a single outgoing syntax edge as their onlypart). Vertices representing non-terminal concrete classesare concatenation ( · ) operators and vertices represent-ing abstract classes are union (+) operators. The closure(∗) operator can be expressed using a cycle combiningvertices representing abstract and concrete classes. Forconvenience, we introduce a symbol (see Figure 7) forthe ∗ operator and the symbol, λ, for the terminal classwith the “empty string” as the target of its syntax edge.

Definition. Let VS be the set of all syntax vertices. Thenthe regular class graphs and their bijection with the reg-ular expressions are defined recursively as follows:

FIG. 6. Attribute to subclass.

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 191

denotes

FIG. 7. Closure operator.

1. for all A ∈ VS

2. for regular subgraphs A and B

3. for regular subgraphs A and B

4. for regular subgraph A

Since there is a bijection between the regular expres-sions and the regular class graphs, we sometimes denotea regular class graph by its corresponding regular ex-pression in the following discussion.

5.2.2. Axiom systems for regular expressions. Itis well known that using substitution as the onlyrule of inference, there is no finite set of equa-tions over the regular expressions that allows the deriva-tion of a regular expression, α, from a regular expressionα′ if and only if α and α′ define the same language. Thatis, there is no finite complete set of axioms in the algebraof regular expressions [32]. However, the following sys-tem,F, due to Salomaa [32] is complete if an additionalrule of inference is allowed:

(α+ (β+ γ)) = ((α+ β) + γ)

(α · (β · γ)) = ((α · β) · γ)

(α+ β) = (β+ α)

(α · (β+ γ)) = ((α · β) + (α · γ))

((α+ β) · γ) = ((α · γ) + (β · γ))

(α+ α) = α

(α · λ) = α

(α · φ) = φ

(α+ φ) = α

(α∗) = (λ + (α · (α∗)))

(α∗) = ((λ + α)∗)

The additional rule of inference is solution of equations:If β does not possess the “empty word property” thenthe equation α = (γ · (β∗)) may be inferred from theequation α = ((α · β) + γ).

Definition. A regular expression, α, (or its correspond-ing regular class graph) has the empty word property(e.w.p.) if and only if one of the following holds:

1. α = λ2. α = (β∗) for any β3. α = (β+ γ) where β or γ has the e.w.p.4. α = (β · γ) where β and γ have the e.w.p.

5.2.3. Completeness proof. We show that for eachequation over the regular expressions which is a sub-stitution instance of an axiom in the complete systemF, there is a sequence of primitive transformations totransform the regular class graph corresponding to theleft-hand side of the equation to the class graph cor-responding to the right-hand side. Since every primitivetransformation has an inverse, it is also possible to trans-form the right-hand side to the left-hand side. In otherwords, if it is possible to substitute a regular expression,

192 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

FIG. 8. Solution of equations.

α, for a regular expression α′ in F, then it is possi-ble to transform the regular class graph, α′ to α withthe primitive transformations. Any regular expression,α, can be derived from any language-equivalent regularexpression, α′, in F, so it must be possible to trans-form any regular class graph to any language-equivalentregular class graph by a sequence of primitive transfor-mations if the following meta-transformation (Figure 8),corresponding to the extra rule of inference, solution ofequations, is allowed:

• If a regular class graph, φ, contains a subgraph, ((α ·β)+γ), which was obtained by a sequence of primitivetransformations from α, and β does not possess theempty word property, then ((α·β)+γ) may be replacedwith the subgraph (γ · (β∗)).

The meta-transformation may prove difficult to apply inpractice and is not considered one of the primitives. Itis included here only to demonstrate that the primitivetransformations themselves form a system as completeas any known system for regular expressions.

Appendix C gives sequences of primitive transforma-tions that correspond to each of the equations in the ax-iom system, F, except for the equations (α ·φ) = φ and(α + φ) = α which are not applicable since the regularclass graphs as defined here do not contain the emptylanguage, φ.

6. CG Program Transformations

In this section, code update rules for programs writ-ten in the CG language are given for each of the primitivelanguage-preserving transformations. The rules are rel-atively straight forward since CG is untyped, and sincethe objects in a CG program always form a tree at runtime. Some of the code transformations require that anobject have access to its “container” (parent) object andmay involve adding code to the container class. For ex-ample, when a part is unnested (moved up the part-ofhierarchy) instances of the class which originally had thepart must access it indirectly through their containers. InCG there is a built-in method called container thatprovides the required access. In “real” languages, an in-stance variable may be added where necessary to providea link to an object’s container. Still, there are additionalcomplications when the objects do not form a tree. Whena class graph is used to define the class structure for aprogram written in a typical object-oriented language, aconstruction edge from some vertex, A, to another ver-tex, B, implies that every A object has a B object as apart, but not the converse; every B object is not neces-sarily a part of an A object. In general, there may be nosuitable container class for a code transformation, and ifthere is a suitable class it cannot in general be identifiedwithout examining the existing code. The code transfor-

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 193

Main : main ()

{ exp <- eval() <- print() }

Number : eval ()

{ self }

Compound : eval ()

{ op <- apply(arg1, arg2, value) }

AddOp : apply (a1, a2, v)

{

v <- assign(a1 <- eval())

<- add(a2 <- eval())

}

MulOp : apply (a1, a2, v)

{

v <- assign(a1 <- eval())

<- mul(a2 <- eval())

}

FIG. 9. Program C.

mations presented below, while correct for CG programs,are intended to be used with human guidance in the gen-eral case. The container classes are specified manually,and code must be transformed manually if a containerclass is required by a code transformation rule and noneis available. For strongly typed languages there are thecomplications discussed in [7].

6.1. Transformation Rules

6.1.1. The object-preserving transformations. For theCG language, there are no code updates required for anyobject-preserving transformation. For other untyped lan-

guages, such as CLOS, the only object-preserving trans-formation that requires a code update is deletion of use-less alternation. Note that the “useless” designation isonly relevant from a data modeling point of view, sincethe class may have important methods attached. If theclass is deleted the functionality of the methods attachedto the class must be preserved. Each method can becopied to each of the immediate subclasses that does notoverride it. Now every object will respond to messagesin the same way after the “useless” class is deleted. InCG, there are never any methods to copy since methodsmay only be attached to concrete classes.

6.1.2. Renaming of vertices and edges. No code up-dates are required when vertices are renamed. In thestandard interpretation, renaming an edge correspondsto changing the identifier for an instance variable. Whenan edge, (v l−→ w), is renamed to (v l′−→ w), the identi-fier l′ is substituted for the identifier, l, in the methodsof class v. Since CG provides strong data encapsulation,there is no need to make substitutions in methods of anyother classes.

6.1.3. Nesting of parts. If a class, A, has outgoing at-tribute edges, (A b−→ B) and (A c−→ C), and the edge(A c−→ C) is replaced by an edge (B c−→ C) (the c part ofA is nested under its b part), then in methods attached toclass A, the c part must be accessed indirectly throughits b part. An accessor method to return the c part isadded to class B, and the identifier, c, is replaced byb <- c() in methods of class A. If methods of classC access A objects through the container operator,the access must now be indirect through the intermedi-ate B object. We add an accessor method to class B forits container, and in methods of class C the expressioncontainer is replaced by the expression container

<- container().

6.1.4. Unnesting of parts. If a class, A, has an out-going attribute edge, (A b−→ B) to a class B which hasan outgoing attribute edge, (B c−→ C), to class C, andthe edge (B c−→ C) is replaced by the edge, (A c−→ C),(the c part is unnested from under the b part of classA) then in methods attached to class B, the c part mustbe accessed indirectly through its parent A object. Simi-larly, in methods of class C that access B objects throughthe container operator, the access must now be indi-rect through its parent A object. Accessor methods areadded to class A to return the b and c parts. In methodsof class B, the expression c is replaced by the expres-sion container <- c() and in methods of class Cthe expression container is replaced by container

<- b().

194 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

Main : main ()

{ bricks <- last() <- weight() <- print() }

Brick : last ()

{ next <- last() }

Bottom : last ()

{ container }

Brick : weight ()

{ container <- weight() <- add(weight) }

Main : weight ()

{ zero }

FIG. 10. Program to calculate weight of brick pile.

6.1.5. Addition of lambda parts. Adding a part doesnot require any change in the code.

6.1.6. Deletion of lambda part. If a class, A, has anattribute edge, (A b−→ B), to a “lambda” class B and theedge is deleted, class A must supply any functionalitythat was formerly delegated to class B. Note that sinceB is a lambda class, none of its methods have accessto any objects other than self and container. Eachmethod of class B is copied unchanged to class A exceptthat the expression container is replaced with self

in the copied methods. In class A’s original methods, theexpression b is replaced with the expression self.

6.1.7. Addition of lambda alternative. When an in-heritance edge, (A =⇒B), is added from a class, A, toa lambda class, B, by addition of lambda alternative, Bobjects may be interspersed in a list of A objects. Anymessage received by such a B object should be passedto the A object that has been displaced. For each methodattached to any subclass of A a corresponding method isgenerated for class B which simply delegates to the nextobject in the list. We also add to each of the original Aclasses a method called this which returns self. To

class B we add a this method that returns container<- this(). In each of the original A methods the ex-pression container is replaced by container <-this() so that these expressions will have the sameobects as their values as if the B objects were not presentin the list.

6.1.8. Deletion of lambda alternative. Deletion of alambda alternative does not require any change in thecode.

6.1.9. Insertion of singleton construction. When anew concrete class, A, with an attribute edge to a class, B,is added by insertion of singleton construction, a methodwhich simply delegates to its B part is added to class Afor each method in class B. An accessor is added to Afor its container, and in the methods of class B the ex-pression container is replaced by container <-container().

6.1.10. Deletion of singleton construction. When aconcrete class, A, with an attribute edge, (A b−→ B), toclass B is deleted by deletion of singleton construction,each method in class A is copied to class B with sub-stitution of the expression self for b. In the originalmethods of class B the expression container is re-placed by self.

6.1.11. Attribute to subclass. When a concrete classis transformed into an abstract class by attribute to sub-class, its methods are simply copied to each of its newsubclasses.

6.1.12. Subclass to attribute. When an abstract class,A, gains an attribute, l by subclass to attribute, each ofits subclasses, B, must be a singleton construction whoseonly outgoing edge, (B l−→ C), has label l. The elim-ination of the subclasses is handled as for deletion ofsingleton construction (above), except that an additionalargument is added to each of the methods copied fromclass B to C, and the copied methods are modified to ac-cess any parts inherited in B (from A) indirectly throughthe extra argument. Accessor methods are added to classA for each of its parts, and for each method copied fromB to C, a corresponding method is added to class A whichsimply delegates to its l part, passing along whatever ar-guments it received plus self for the actual value ofthe extra argument.

6.2. Examples of CG Program Transformations

Example 3. Figures 9–13 show how yet another ver-sion of the prefix expression evaluator (Figure 9) is trans-formed when its class structure evolves first by transform-ing the op attribute of class Compound to subclasses

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 195

Main : main ()

{ bricks <- last() <- weight() <- print() }

Main : weight ()

{ zero }

Brick : last ()

{ next <- last() }

Brick : weight ()

{

container <- this()

<- weight() <- add(weight)

}

Brick : this ()

{ self }

Bottom : last ()

{ container <- this() }

Balloon : this ()

{ container <- this() }

Balloon : last ()

{ next <- last() }

Balloon : weight ()

{ next <- weight() }

FIG. 11. After adding balloons to the pile.

followed by deletion of the useless alternation vertex Op(Figure 12) and then by deleting the singleton construc-tion vertices MulExp and AddExp followed by renam-ing of the classes MulOp and AddOp to MulExp andAddExp, respectively (Figure 13).

Example 4. Figure 10 shows a CG program to calcu-late the total weight of all the bricks in a pile. If theclass graph evolves by addition of lambda alternative toallow balloons to be interspersed with the bricks the code

Main : main ()

{ exp <- eval() <- print() }

Number : eval ()

{ self }

MulExp : eval ()

{ op <- apply(arg1, arg2, value) }

AddExp : eval ()

{ op <- apply(arg1, arg2, value) }

AddOp : apply (a1, a2, v)

{

v <- assign(a1 <- eval())

<- add(a2 <- eval())

}

MulOp : apply (a1, a2, v)

{

v <- assign(a1 <- eval())

<- mul(a2 <- eval())

}

FIG. 12. Program D.

to calculate the total weight of the bricks is updated asshown in Figure 11.

7. Practicality of the Approach

7.1. Limitations of the Data Model

The data model used here is a formal mathematicalmodel (see Appendix A), that was chosen to provide pro-gramming language independence and a sound theoreti-cal basis for the methodology. It is a “low-level” modelthat starts with only those relationships that are directlysupported by most most object-oriented programming

196 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

Main : main ()

{ exp <- eval() <- print() }

Number : eval ()

{ self }

MulExp : eval ()

{ self <- apply(arg1, arg2, value) }

AddExp : eval ()

{ self <- apply(arg1, arg2, value) }

AddExp : apply (a1, a2, v)

{

v <- assign(a1 <- eval())

<- add(a2 <- eval())

}

MulExp : apply (a1, a2, v)

{

v <- assign(a1 <- eval())

<- mul(a2 <- eval())

}

FIG. 13. Program E.

languages: part-of (data members) and kind-of (subclass-ing). Concrete syntax and ordering of parts are added torelate the semantics of different class structures.

It is expected that the approach will often be applied inthe same way it was developed. That is, a class structurewithout concrete syntax or part ordering will be embed-ded in a grammar for the purpose of maintenance. Thegrammatical requirements of the model should not re-strict its usefulness. In many cases, it may even be pos-sible to recognize class transformations as “language-preserving” outside the context of a grammar. Still, theconcrete syntax and ordering of parts is a valuable meansof documenting the nature of class transformations.

The main constraints in our model are the lack of in-heritance from concrete classes and the inability to over-ride (or shadow) data members in subclasses. We alsodisallow name clashes due to multiple inheritance. Noneof these constraints pose a major obstacle to the use ofthe approach.

A class organization with inheritance from concreteclasses may be easily (and automatically) restructured toeliminate this kind of inheritance. The concrete class isreplaced by an abstract class in the inheritance hierarchy,the concrete class is made a subclass of the new abstractclass, and all methods of the concrete class are moved tothe new abstract class. There are no code transformationsrequired.

In the author’s opinion, shadowing data members insubclasses is a dangerous and generally undesirable tech-nique. In any case, it can be simulated by implementingthe data member as a method, and overriding in sub-classes.

Any language, or model, that allows multiple inheri-tance must somehow cope with potential name clashes.For example, in Eiffel, there is a mechanism to renameinherited data members. In C++, names must be quali-fied by using the class scope resolution operator. Thesedesign decisions have only a minor effect on the mechan-ics of code transformations. By requiring unique names,we have, in essence, adopted the C++ approach. If weconsider instance variables to have the class name wherethey are defined as an implicit prefix, we get uniquenames.

Our low-level model is probably unsuitable for high-level analysis and design. However, models in commonuse, including the new Unified Modeling Language, maybe mapped to the CG model in the same way that theyare mapped to programming languages. A CASE toolsupporting such models could incorporate support formaintenance of user written code based on the mappingto low-level constructs.

7.2. Limitations of the Language Model

The limitations of the CG language model include:

• No conditional expressions• No looping constructs• No inheritance of methods• No programmer control over encapsulation• No true dynamic object construction• Object structure at run time must be a tree• The language is untyped• No non-object primitives (e.g., integer, character, etc.)

Most of these language features were left out of themodel merely to simplify the discussion, and to allow usto focus on those features at the core of object-orientedprogramming: data encapsulation, message passing, latebinding, and polymorphism. We are currently imple-menting a system that will support all of the features

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 197

FIG. 14. Reorganization of the Demeter system class structure.

mentioned above except static typing. Our implementa-tion will support C++, but circumvents the type systemby supplying an Object class from which every otherclass inherits. We define a default, message not under-stood, method for class Object corresponding to eachmethod in every other class. All member functions andparameters are declared to have type Object. We pro-vide “wrapper” classes for the non-object types. In thefuture, we will eliminate the Object class, and mod-ify our transformations to satisfy the C++ type sys-tem. Many of the issues relevant to type system supportin program transformations have already been investi-gated [7].

The addition of conditionals and loops has had no im-pact on the transformation rules. Inheritance of methodshas very little effect. In those cases where it is important,a method attached to an abstract class can be copied to

each of the subclasses that does not override it, and theneliminated. If a method in a subclass calls this methodusing the class scope resolution operator, the code canbe inlined. Later, identical methods may be abstracted tocommon superclasses in the same way that data membersare abstracted.

The program transformations assume that all meth-ods are public, and all data members are protected. Ifstronger encapsulation is present, and a program trans-formation requires access which is not allowed, we sim-ply weaken the encapsulation (with appropriate warningsand user interaction). In C++ the only encapsulationweaker than what we have assumed, is the use of publicdata members. We consider this very poor programmingpractice and our system will not support it.

The problem with object structures that are not trees,in general, and of dynamic object creation, in particular,

TABLE 1. Standard interpretation of class graphs.

198 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

FIG. 15. Forbidden subgraphs.

is that objects may not have a suitable container objectfor transformations that require it. As noted in Section 6,user interaction is necessary in such cases.

7.3. Reorganizations That Are Not LanguagePreserving

The most commonly occurring reorganizationsthat are not language preserving are those that addnew classes, or new attributes to existing classes.These object-extending class transformations are mucheasier to manage than the language-preserving trans-formations described here. A complete set of primi-tive object-extending transformations, and correspondingcode update rules is described in [7]. The maintenancetechniques for language-preserving transformations areintended to augment, rather than replace, existing tech-niques to allow for automatic maintenance over a muchlarger set of class reorganizations.

7.4. Practical Experience

The original motivation for this work was a majorrevision of the Demeter system’s class graph, which re-quired the entire system to be manually ported to the newenvironment by rewriting all the code. This was true eventhough the languages defined by the class graphs werenearly identical, and the functionality of the programscomprising the system was unchanged.

The methodology presented in this paper has been ap-plied, by hand, to parts of the system that motivated theresearch. The Demeter System [17, 33] originally useda notation based on grammars and later changed to agraph based notation. When the notation was changedthe class structure was reorganized to properly modelthe new perspective. For example, a small portion of thesystem’s original class structure and the correspondingportion of the new structure are shown in Figure 14.

The strategy suggested in Section 8.2 was used to finda sequence of primitives to accomplish the overall classstructure transformation. This strategy proved effectivefor the test case with approximately 40 classes in each

structure. A sequence of primitives was found in approx-imately two hours.

Next, portions of the CLOS code used to implementthe original system were modified according to the rulesin Section 6 (adding instance variables to link objects totheir “containers” where necessary), and the correctnessof the new code was verified.

The same methodology has been used successfully toguide the evolution of more recent versions of the Deme-ter System which have been implemented in C++. In thiscase, the code transformation rules had to be augmentedsomewhat in order to satisfy the type system. For a morecomplete discussion of type system issues see [7].

8. Search Algorithms

If the primitive language-preserving transformationsare used to restructure the class organization of a CGprogram, the code may be automatically updated follow-ing the rules defined in Section 6. More generally, givenan arbitrary CG program and a new language-equivalentclass graph, we must be able to find a sequence of prim-itives that produces the given transformation in order toapply the code transformation rules.

8.1. Regular Languages

Since the primitive transformations are not completefor regular class graphs without the addition of a meta-transformation, it is not always possible to reduce an ar-bitrary language-preserving transformation over the reg-ular class graphs to a sequence of primitives. However,there is an algorithm to perform the reduction to a se-quence of primitives and meta-transformations. Manualcode updates must then be performed only for the meta-transformations.

The proof that Salomaa’s axiom system for the reg-ular expressions is complete [32] is constructive in thesense that for any valid equation X = Y, over the regularexpressions it gives a method to construct its proof. Toreduce an arbitrary language-preserving transformationover the regular class graphs to a sequence of primitiveswe first construct the proof that the corresponding reg-ular expressions are equivalent. Each substitution in theproof is mapped to a sequence of primitives as definedin Section 5.2.3. Each solution of equations is mappedto the meta-transformation defined in Section 5.2.3.

8.2. Context Free Languages

There can be no algorithm guaranteed to reducean arbitrary language-preserving transformation over theclass graphs to a sequence of primitives. Even if theset of primitive transformations were complete, suchan algorithm would be impossible since equivalenceof context-free languages is undecidable, and the class

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 199

TABLE 2. Standard interpretation for object graphs.

graphs define context-free languages4. Nevertheless, it isreasonable to expect that searches for sequences of prim-itives will often terminate successfully since the primi-tives are designed to represent the kinds of transforma-tions that are likely to arise in practice.

The search problem may be viewed in terms of theclassic state-space search paradigm as defined in the lit-erature of artificial intelligence. Given an initial state, S,a set of operators on states, O, and a set of goal states,G, the state-space is defined as a directed graph whereeach node represents a state and each arc represents anoperation. The problem is to find a path from the initialstate to a goal state. Normally, the graph is not madeexplicit except for the solution path.

In our case, the initial state is a class graph, the oper-ators are the primitive language-preserving transforma-tions, and the only goal state is a language-equivalentclass graph. Alternatively, we may consider the set ofgoal states to be the set of all class graphs which areobject-equivalent to a given language-equivalent classgraph since we already have efficient algorithms forchecking object-equivalence and reducing an object-preserving transformation to a sequence of primitives.

State-space search has been heavily investigated in AI,and sophisticated systems have been developed for eval-uating states and choosing the next operation to applyin various domains. A detailed algorithm of this sort isbeyond the scope of the current work and is left for fu-ture research. However, a simple search strategy mightproceed as follows: The state space is searched in depth-first order (with backtracking) and operators are appliedto vertices in the class graph in breadth-first order start-ing with the Main class. The Main vertex of the initialclass graph is brought into congruence with the Mainvertex of the goal state by applying operators (primitivetransformations) until the vertices have the same typesand numbers of outgoing edges, outgoing constructionedges have the same labels, and outgoing syntax edges

FIG. 16. Legality rule.

have the same targets. Next, the target of each construc-tion and alternation edge is brought into congruence withthe corresponding class in the goal state in the same man-ner, and the process continues until the target is reachedor the depth in the state space exceeds some specifiedvalue.

An important heuristic which can be used to improvethe search performance is to use the names of classesand labels of edges to guide the search. If, for example,a vertex must have an outgoing construction edge withlabel l to a vertex labeled V, we first check to see if thereis already an outgoing edge with label l. If two analogousclasses have parts with the same names, we guess that theparts are also analogous. Otherwise, we check if someother vertex has an outgoing edge with label l and targetV that can be brought into the proper position by nestingand unnesting of parts. If neither condition is met welook for a vertex labeled V and finally for an edge withlabel l.

This strategy is useful if the class graph changes grad-ually during the evolutionary process, since most classesand parts will retain their original names. It is also use-ful if the designers use names consistently when reor-ganizing the class structure. Finally, if a class graph haschanged dramatically it may be easy for a human de-signer familiar with the application domain to supply amapping between classes with analogous roles by manu-ally renaming parts and classes before starting the search.For the human designer, giving a partial analogy by re-naming the parts and classes is the “easy” part — elab-orating the analogy by finding the primitive transforma-tions and then updating all of the code is the “hard” part.For the machine, the reverse is true; thus, the machinecomplements the abilities of the human designer whenthis strategy is employed.

FIG. 17. Class graph, φ, and legal object graph, φ.

200 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

FIG. 18. Fruit class graph.

The concrete syntax may be used as a further guideof the search or to prune nodes in the state space if wenote that it is not possible to find a solution by bringingtwo vertices into congruence that have different sets ofreachable syntax vertices.

9. Related Work

9.1. Software Refactoring

In their software refactory project, Opdyke and John-son [28, 27, 29, 19] have worked on building a toolto support various aspects of object-oriented programtransformation, including the maintenance of behavioralconsistency during schema evolution. Many of the codetransformation issues they address are similar to the is-sues addressed here, and their solutions are also similarin some cases. In particular, they consider reorganizationof “aggregate/component” hierarchies, and conversionbetween aggregation and inheritance. The problems andsolutions they present are quite similar to those presentedhere for nesting/unnesting of parts and subclass-to-attribute/attribute-to-subclass conversions, respectively.

FIG. 19. Fruit object graphs.

The work of Opdyke and Johnson differs in severalways from the work presented in this paper. Perhapsthe most important is the theoretical basis of the cur-rent work in formal languages. In our case, the programtransformation space is clearly and concisely defined.Furthermore, language-equivalent class graphs guaran-tee that programs written in the CG language will acceptthe same inputs, and that their run-time objects have thesame textual descriptions. This is the answer to the im-portant question: “Why is it reasonable to expect the ex-istence of code that will make a system of transformedobjects behave in a manner functionally equivalent to theoriginal system?”

The alternative answer is that the transformation wasaccomplished via a sequence of primitives for which cor-rect code transformations are known. Opdyke and John-son rely solely on this second justification, and requirethat users of their system directly apply primitives (orcompositions of primitives already known to the system).In the system we envision, users need not even be awareof the existence of the primitive transformations. In thecontext of reuse (as opposed to evolution), the users ofOpdyke and Johnson’s system would have to perform asearch for a sequence of primitives to transform the classstructure of the existing code to the class structure whereit is to be reused. In the system we envision, there is asearch engine to perform this task automatically.

9.2. Structure Mapping Theory

In structure mapping theory [14, 15] knowledge isrepresented as propositional networks comprising objectnodes and predicates (attributes and relations). An anal-ogy maps object nodes from the base domain to object

FIG. 20. (a+ (b+ c)) = ((a+ b) + c)

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 201

FIG. 21. (a · (b · c)) = ((a · b) · c)

nodes in the target domain. Generally, there is a 1-1 map-ping between nodes in the base and target domains. Eachpair of corresponding object nodes in the mapping is partof the analogy: “the target is like the base.” The analogyis applied by using mapping rules, based on the principleof systematicity, to determine which predicates should bebrought from the base domain to the target domain. Theselected predicates are carried over using the node sub-stitutions indicated in the object mapping.

A sequence of primitive transformations where eachprimitive only renames a vertex in a class graph wouldbe equivalent to an analogy as defined by structuremapping theory. However, the primitive transformationscan be more expressive since they may include changesin certain structural relations (e.g., part-of, kind-of) aspart of the analogy. For example, in the base domain aclass Human might have an attribute (part) called Gen-der, with possible values (kinds) Male or Female. Inthe target domain an analogous structure might have aclass Person with subclasses (kinds) Man and Woman.The simple mapping (Human → Person,Male →Man,Female → Woman) does not properly express theanalogy. Instead, the relationship between Person andHuman must be qualified, as in: “a Person is like aHuman where the attribute Gender is expressed bysubclassing.” Gentner’s structure mapping theory is notpowerful enough to express such a qualified structuralanalogy, but this can be expressed by a primitive trans-formation, say “attribute-to-subclass.”

In our work, application of the analogy involves bring-ing relations, in the form of program code, from the basedomain over to the target domain. As in structure map-ping theory, the rules depend only on syntactic proper-ties and not on an understanding of the contents of the

domains. Therefore, the code is brought over with littlemodification.

Structure mapping theory says nothing about how ananalogy, “the target is like the base,” is broken down intoa mapping of nodes in the base to nodes in the target.In our case, a search is performed to find a sequenceof primitive transformations that would convert the basestructure to the target structure.

9.3. Analogical Program Synthesis Guidedby Correctness Proofs

Ulrich and Moll [38] have used correctness proofs toguide the formation of analogies and the construction ofanalogous programs. Each line in the proof of a programwritten for the base domain is mapped into a statementin the target domain. Terms and relationships in the tar-get domain are substituted for terms and relationshipsin the original proof. As the process is carried out, theoriginal program is modified by the same substitutions.This process produces a new program and its correctnessproof at the same time.

Dershowitz and Manna [13] used a similar approachto automatically modify programs. They formulate ananalogy as a set of substitutions that yield a specificationof the desired program when applied to the specificationof an analogous program. The specifications, includinginput specifications, are given for both programs in ahigh-level assertion language. In our case, the known CGprogram contains its own input specification in the formof a class graph.

An important aspect of a program specification is theinclusion of invariant assertions which are utilized incorrectness proofs. Transformations are applied to all as-sertions as well as to program code. The transformed as-

FIG. 22. (a+ b) = (b+ a)

202 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

FIG. 23. (a+ a) = a

sertions can be used to obtain verification conditions forthe new program. In our case, it is the language definedby the class graphs that remains invariant. Correctness isguaranteed by the correctness of the primitive transfor-mations.

10. Conclusions

By extending a typical graph based data model toinclude concrete syntax, we have produced a new modelthat can be used to simultaneously define both a classstructure and a language for describing instances of theclasses textually. When the extended data model is incor-porated into a programming environment, we get pro-grams that define a language for describing their run-

time objects and are self documenting as to their legalinputs. The result is a novel framework for dealing withobject restructuring with a theoretical basis in formallanguages.

The methods described for maintaining behavioralconsistency of evolving systems have been successfullyapplied by hand to the development of the Demeter Sys-tem. An automated prototype is currently in the planningstage.

Acknowledgments

The author would like to thank the anonymous ref-erees of TOPLAS and TAPOS for their many helpfulcomments on earlier versions of this article.

This research was partially supported by Ohio Boardof Regents Grant No. 663019.

Appendix A

Data Model

A.1. Class Graphs

The class graphs described in this section use three

FIG. 24. (a · (b+ c)) = ((a · b) + (a · c))

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 203

FIG. 25. (a · λ) = a

kinds of vertices to model abstract classes, instantiableclasses, and concrete syntax. We also use three kinds ofedges to model knows-of, kind-of, and has-syntax rela-tionships. The knows-of relationship is a generalizationof the aggregation relation which only describes physicalcontainment.

The knows-of and has-syntax relations comprise whatwe call the “part-of” relation. For lack of a better term,we call the syntax and known collaborators of an objectits “parts,” although the parts need not be physical parts.For example, in our terminology a car is part of a wheelif the wheel knows about the car.

Definition. A class graph5, φ, is a directed graph, φ =(V,VS,Λ; EC,EA,ES,Ord), with finitely many vertices V.VS is a set of strings called the syntax vertices. Λ isa finite set of labels. There are four defining relations:EC,EA,ES,Ord. EC is a ternary relation on V× V× Λ,

called the (labeled) construction edges: (v l−→ w) ∈ ECif and only if there is a construction edge with label lfrom v to w. EA is a binary relation on V×V, called thealternation edges: (v =⇒ w) ∈ EA if and only if there is analternation edge from v to w. ES is a binary relation onV×VS called the syntax edges: (v → w) ∈ ES if and onlyif there is a syntax edge from v to w. Ord : (EC∪ES)→N is a function that maps each construction and syntaxedge to a natural number.

Next the set of vertices is partitioned into two subclasses,called the construction and alternation vertices.

Definition.

• The construction vertices are defined by:VC = {v | v ∈ V,∀w ∈ V : (v =⇒ w) /∈ EA}.In other words, the construction vertices have no out-going alternation edges.

FIG. 26. ((a+ b) · c) = ((a · c) + (b · c))

204 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

FIG. 27. (a∗) = (λ + (a · (a∗)))

• The alternation vertices are defined by:VA = {v | v ∈ V,∃w ∈ V : (v =⇒ w) ∈ EA}.In other words, the alternation vertices have at leastone outgoing alternation edge.

Sometimes, when we want to talk about the construc-tion and alternation vertices of a class graph, it is moreconvenient to describe a class graph as a tuple whichcontains explicit references to VC and VA:

φ = (VC,VA,VS,Λ; EC,EA,ES,Ord).In standard object-oriented terminology we describe

here the accepted programming rule: “Inherit only fromabstract classes” [18]. This rule can be exploited to derivean analogy between class graphs and grammars.

We use the following graphical notation, based on[36], for drawing class graphs: squares for construction

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 205

vertices, hexagons for alternation vertices, quoted stringsfor syntax vertices, thin arrows for construction and syn-tax edges, and wide arrows for alternation edges.

Example 1. For further illustration we give the compo-nents of the formal definition, for the class graph, φ1, ofFigure 1:

VC = {Number, AddExp,MulExp}VA = {Prefix, Compound}VS = {“)”, “(”, “+”, “∗”, [0− 9]+}Λ = {num, arg1, arg2}

EC = {(Compound arg1−→ Prefix),

(Compoundarg2−→ Prefix)}

EA = {(Prefix =⇒ Number),

(Prefix =⇒ Compound),

(Compound =⇒ AddExp),

(Compound =⇒MulExp)}ES = {(Compound→ “(”), (Compound→ “)”),

(Number → [0− 9]+), (AddExp → “+”),

(MulExp → “∗”)}Ord = {(Compound arg1−→ Prefix, 3),

(Compoundarg2−→ Prefix, 4),

(Compound→“(”, 1),

(Compound→ “)”, 5), (Number → [0− 9]+, 1),

(AddExp → “+”, 2), (MulExp → “∗”, 2)}The definition of VC implies that EA ⊆ VA×V, since

an alternation edge cannot start at a construction vertex.We use Vφ,VCφ,VAφ, etc., to refer to the components ofclass graph φ.

When we draw a class graph, the vertices are labeledso that we can conveniently refer to particular vertices inour discussion. The standard interpretation implies thatthe labels on construction vertices are significant. Con-sider two isomorphic class graphs each with only a sin-gle construction vertex and no edges. If the constructionvertex of one graph is labeled Integer and the vertexof the other graph is labeled String, then the two classgraphs define different sets of objects in the standard in-terpretation. On the other hand, changing the labels ofthe alternation vertices (names of abstract classes in thestandard interpretation) does not effect the defined ob-jects. Therefore, we adopt the following convention forlabeling the vertices of class graphs: Labels of alterna-tion vertices are local to the class graph in which theyoccur; labels of construction vertices are global. That is,if two class graphs have construction vertices with thesame label, it means that the same vertex (same class un-der the standard interpretation) belongs to both graphs.However, we may in general assume that different classgraphs have disjoint sets of alternation vertices regard-less of their labels.

The same semantics apply when we denote the sets ofvertices in a class graph textually. The identifiers we useto denote alternation vertices are of local scope whereasthe identifiers we use to denote construction verticeshave global scope.

Later we give conditions which make a class graphinto a legal class graph. The interpretation in Table 1is only one possible interpretation which we call thestandard interpretation. The motivation behind the ab-stract alternation/construction terminology is that thereare several useful interpretations of class graphs. In oneof those interpretations, a construction vertex is inter-preted as an operation. We often use the standard inter-pretation to give intuitive explanations of relationshipsand algorithms.

Please note that the syntax for an alternation ver-tex/abstract class, although very natural from a graph-theoretic point of view, appears unnatural from the pointof view of today’s programming languages: In most pro-gramming languages which support the object-orientedparadigm, the inheritance relationships are described inthe opposite way. Each class indicates from where it in-herits. Of course, we can easily generate this informationfrom class graphs, but we feel that the Demeter notationis easier to use for design purposes. One reason is thatthe design notation shows the immediate subclasses of aclass and therefore promotes proper abstraction of com-mon parts. Another reason is that a class does not containinformation about where it inherits from and thereforethe class can be easily reused in other contexts.

Definition. In a class graph, φ = (V,VS,Λ; EC,EA,ES,Ord), a vertex w ∈ V is alternation-reachable from

vertex v ∈ V (we write v?=⇒ w):

• via a path of length 0, if v = w• via a path of length n + 1, if ∃u ∈ V such that

(v =⇒ u) ∈ EA and u?=⇒ w via a path of length n.

In other words, the alternation-reachable relation isthe reflexive, transitive closure of the EA relation.

In the standard interpretation, (v?=⇒ w) means that

either w inherits from v or w = v.Sometimes when we want to discuss the inheritance

hierarchy, it is convenient to refer to the alternation sub-graph of a class graph. The alternation subgraph containsall of the alternation vertices and alternation edges plusthe construction vertices that have incoming alternationedges.

Definition. The alternation subgraph of a class graph,φ = (VC,VA,Λ; EC,EA), is a directed acyclic graph(DAG), G = (V′,EA), where V′ = VA∪{v ∈ VC |∃u :(u =⇒ v) ∈ EA}.

It is often helpful to think of each alternation vertexas representing a set of associated construction vertices.This set, A(v), consists of all the construction verticeswhich are alternation reachable from the vertex, v. If vis an alternation vertex with an incoming construction

206 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

FIG. 28. (a∗) = ((λ + a)∗)

edge, (u l−→ v), the construction vertices in A(v) rep-resent the concrete classes which might be used to in-stantiate the l part of u objects. If v has an outgoingconstruction edge, (v l−→ w), the construction vertices inA(v) represent the concrete classes which inherit the lpart from v.

Definition. The associated classes of a vertex, v, in aclass graph,φ = (VC,VA,VS,Λ; EC,EA,ES,Ord), is the set of all con-

struction vertices which are alternation-reachable fromv:

A(v) = {v′|v ?=⇒ v′ and v′ ∈ VC}

A.1.1. Legality conditions. A legal class graph is astructure which satisfies three independent conditions.

Definition. A class graph φ = (V,VS,Λ; EC,EA,ES,Ord) is legal if it satisfies the following three conditions:

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 207

1. Cycle-free alternation condition:There are no cyclic alternation paths, i.e.,

{(v, w) | v, w ∈ V, v ≠ w, and v?=⇒ w

?=⇒ v} =∅.

2. Unique labels condition:

∀u, v, v′, w, w′ ∈ V, l ∈ Λ such that (v?=⇒ u),

(v′?=⇒ u), and (v, w) ≠ (v′, w′) :

{(v l−→ w), (v′ l−→ w′)} /⊆ EC3. Unique numbering condition:

∀u, v, v′ ∈ V and e, e′ ∈ (EC∪ES) where v?=⇒ u,

v′?=⇒ u, e ≠ e′ : If ∃w,w′, l, l′ such that e = (v

l−→w) or e = (v → w), and e′ = (v′ l′−→ w′) or e′ =(v′ → w′), then Ord(e) ≠ Ord(e′)

When we refer to a class graph in the following wemean a legal class graph, unless we specifically mentionillegality.

The cycle-free alternation condition is natural and hasbeen proposed by other researchers, e.g., [31, page 396],[35, page 109: Class names may not depend on them-selves in a circular fashion involving only (alternation)class productions]. The condition says that a class maynot inherit from itself.

The unique labels condition guarantees that “inher-ited” construction edges are uniquely labeled and ex-cludes class graphs which contain the patterns shownin Figure 15.

Other mechanisms for uniquely naming the construc-tion edges could be used, e.g., the renaming mechanismof Eiffel and the overriding of part classes [26]. The the-ory does not seem to be affected significantly by smallchanges such as this.

The unique numbering condition is similar to theunique labels condition. It guarantees that the construc-tion and syntax edges inherited at any vertex are totallyordered.

A.2. Object Graphs

We have defined the concept of a class graph whichmathematically captures some of the structural knowl-edge which object-oriented programmers use. Next wedefine object graphs and their relation to class graphs.An object graph defines a hierarchical object and is mo-tivated by the interpretation of an object graph, calledthe standard interpretation, given in Table 2.

Definition. An object graph, ψ, is a directed graphψ = (W,Ws, S,Λψ;E,Es, λ,Ord) where:

• W is a finite set of vertices.• Ws is a set of strings called the syntax vertices.• S is an arbitrary finite set.• Λψ is a set of labels.• E is a ternary relation on W × W × Λψ. If (v

l−→w) ∈ E we call l the label of the edge (v

l−→ w). Notwo edges outgoing from the same vertex may havethe same label. That is, ∀v, w, w′ ∈ W, l ∈ Λψ suchthat w ≠ w′ : {(v l−→ w), (v

l−→ w′)} /⊆ E.

• Es is a binary relation on W ×Ws called the syntaxedges.

• λ : W → S is a function that maps each vertex of ψto an element of S.

• Ord : (E∪Es)→N is a function that maps each edgeto a natural number.

Normally, the set S is a subset of the construction ver-tices of some class graph. In the standard interpretation,the function λ maps each object in an object graph to theclass of which it is an instance. We use a graphical no-tation for object graphs similar to that for class graphs.Vertices are represented by circles and edges by labeledarrows. The vertices are labeled with their class names(their mapping under λ). In case we wish to distinguishmore than one instance of a class, the labels may be pre-fixed with an instance name followed by a “:”.

Not every object graph with respect to a class graphis legal; intuitively, the object structure has to be con-sistent with the class definitions. Each object can onlyhave parts as prescribed in the class definition and theparts prescribed in the class definitions must appear inthe objects (see Figure 16).

Definition. Let p1, p2, . . . , pn be the outgoing edges (in-cluding syntax edges) from a vertex, v ∈W, of an objectgraph such that Ord(pi) < Ord(pi+1), 1 ≤ i < n. Thenthe PartOrder(v,pi) = i.

Let q1, q2, . . . , qn be the construction and syntax edgesoutgoing from all vertices, v′, from which a vertex,v ∈ VC of a class graph is alternation reachable,such that Ord(qi) < Ord(qi+1), 1 ≤ i < n. Then thePartOrder(v,qi) = i.

Definition. An object graph, ψ = (W,Ws, S,Λψ;E,Es, λ,Ord), is legal with respect to a class graph,φ = (VC,VA,VS,Λ; EC,EA,ES,Ord), if and only if foreach vertex, v ∈W:

• λ(v) ∈ VCEach vertex in the object graph maps to a constructionvertex in the class graph.

• ∀(rl−→ s) ∈ EC where r

?=⇒ λ(v) : ∃w ∈ W such

that (vl−→ w) ∈ E

Each object has all of the sub-objects prescribed bythe class graph.

• ∀(r → s) ∈ ES where r?=⇒ λ(v) : (v → s) ∈ Es

Each object has all of the concrete syntax prescribedby the class graph.

• ∀w, l where (vl−→ w) ∈ E : ∃(r

l−→ s) ∈ EC such

that r?=⇒ λ(v), s

?=⇒ λ(w), and PartOrder(v, (v

l−→w)) = PartOrder(λ(v), (r

l−→ s))Each object has only the sub-objects prescribed by theclass graph and has them in the proper order.

• ∀s where (v → s) ∈ Es : ∃r such that (r →s) ∈ ES, r

?=⇒ λ(v), and PartOrder(v, (v → w)) =

PartOrder(λ(v), (r → s))Each object has only the concrete syntax prescribedby the class graph and has it in the proper order.

208 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

Example 2. Consider the graphs in Figure 17. The ob-ject graph, ψ, is legal with respect to the class graph,φ. The object graph is given by: W = {i1, i2, i3, i4}, E ={(i1 b−→ i2), (i1 c−→ i3), (i2 bc−→ i4)},Λψ = {b, bc, c}, λ ={i1→ A, i2→ B, i3→ C, i4 → C}.Example 3. Consider object graphs in Figure 19 whichare illegal with respect to the class graph in Figure 18.The first object graph is illegal since apples don’t con-tain stones and the second because Cherry is notalternation-reachable from Number.

The language of a class graph, φ, is formally definedin terms of sentences which are defined, in turn, by theobject graphs which are legal with respect to φ.

Definition. An acyclic object graph, ψ, rooted at aunique vertex, v, has a textual representation, calledsentence(ψ), which is the string of syntax vertices en-countered during a depth first traversal of ψ starting fromv. If an object graph, ψ, is cyclic or unrooted, sentence(ψ)is undefined.

We say that an object graph, ψ = (W,Ws, S,Λψ;E,Es, λ, Ord), is rooted at vertex v if v is the only vertexin W with an in-degree of 0.

Definition. The language defined by a class graph,φ = (V,VS,Λ; EC,EA,ES, Ord), is given by:

L(φ) = {s|∃ψ : s = sentence(ψ),

and ψ is legal with respect to φ}

The input language of a CG program, P, with classgraph, φ, is given by:

L(P) = {s|∃ψ : s = sentence(ψ),

ψ is legal with respect to φ,

and ψ is rooted at Main}Definition. The set of all legal object graphs with re-spect to a class graph, φ, is called Objects(φ).

The definitions above relate a class graph with a setof object graphs. In object-oriented programming lan-guage terminology, a class graph corresponds to a set ofclass definitions and the object graphs correspond to theobjects which can be created calling “constructor” func-tions of the classes. In some languages, e.g., C++, theclass definitions considerably restrict the objects whichcan be created. The definitions above demand even morediscipline than C++.

In the context of evolution, we often wish to discussobject graphs that are not legal with respect to the currentclass graph. We sometimes refer to these object graphs asobject example graphs since our goal is often to modifya class graph so that it will become compatible with anew set of objects based on examples.

A.3. Related Work

The axiomatic model which is used in this paper isnew but similar data models exist in the literature. Inparticular, the notions of “alternation” and “construc-tion” appear as “classification” and “aggregation” in bothHull and Yap’s Format Model [16] and Kuper and Vardi’sLDM [20]. Ait-Kaci’s feature structures [2] are also re-lated to the Demeter kernel model. Our abstraction al-gorithms [25, 6] can be adapted to abstract feature struc-tures from examples.

Other related work in the data base field is describedin: [1, 9, 37].

Appendix B

Formal Definition of PrimitivesB.1. Object-Preserving Transformations

The definitions of the object-preserving transforma-tions for class graphs are as follows:

• Renumbering of parts. Any set of construction andsyntax edges in a class graph, φ, may be renum-bered (by replacing the Ord function) to producea new class graph, φ′, if for all vertices, v ∈ V,and edges, e ∈ (EC∪ES), such that v is alternationreachable from the source of e: PartOrderφ(v, e) =PartOrderφ′ (v, e).

• Abstraction of common parts. If ∃v, w, l, i suchthat ∀v′, where (v =⇒ v′) ∈ EA : (v′ l−→ w) =e ∈ EC andOrd(e) = i, or (v′ → w) = e ∈ES andOrd(e) = i, then all of the edges, e, can bereplaced by a new edge, e′, with v as its source andOrd(e′) = Ord(e).Intuitively, if all of the immediate subclasses of classC have the same part, that part can be moved up theinheritance hierarchy so that each of the subclasseswill inherit the part from C, rather than duplicatingthe part in each subclass.

• Distribution of common parts. An outgoing con-struction edge, e = (v

l−→ w), or syntax edge,e = (v → w), can be deleted from an alternation ver-tex, v, if for each (v =⇒ v′) ∈ EA a new constructionedge e′ = (v′ l−→ w), or syntax edge e′ = (v′ → w),respectively is added with Ord(e′) = Ord(e).This is the inverse of abstraction of common parts.

• Deletion of “useless” alternation. An alternation ver-tex is “useless” if it has no incoming edges and nooutgoing construction edges. If an alternation vertexis useless it may be deleted along with its outgoingalternation edges.Intuitively, an alternation vertex is useless if it is nota part of any construction class, and it has no partsfor any construction class to inherit.

• Addition of “useless” alternation. An alternationvertex, v, can be added along with outgoing alterna-tion edges to any set of vertices already in the classgraph. This is the inverse of deletion of useless alter-nation.

• Part replacement. If the set of construction verticeswhich are alternation-reachable from some vertex,

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 209

v ∈ V, is equal to the set of construction verticesalternation-reachable from another vertex, v′ ∈ V,then any construction edge (w l−→ v) ∈ EC can bedeleted and replaced with a new construction edge,(w l−→ v′).Intuitively, if two class C1 and C2 have the same setof instantiable (construction) subclasses, then the de-fined objects do not change when C1 is replaced byC2 in a part definition. Note that the inverse of partreplacement is just another instance of the transfor-mation.

B.2. Renaming of Vertices and Edges

Any construction edge, (v l−→ w) ∈ EC, may bereplaced by a construction edge with a different label,(v l′−→ w). Also, any construction vertex, v ∈ VC, maybe replaced by a different construction vertex, v′, withthe same incoming and outgoing edges. When viewinga picture of a class graph it appears that the vertex hasbeen “renamed” by changing its label. Since the labelsor identifiers used to denote construction vertices havea global scope, and the same identifiers may be usedto denote vertices in other class graphs, a changed la-bel implies a changed vertex. On the other hand, sincethe identifiers used to denote alternation vertices havea scope local to the class graph, changing the labels ofalternation vertices may be done freely, but does not inany way “transform” the class graph.

B.3. Nesting of Parts

Given a vertex, w ∈ V with no incoming alternationedges and a different vertex, u ∈ (V∪VS), such that for

every construction edge, ev = (v → w′), where w?=⇒ w′,

there is a syntax or construction edge, e′v = (v → u), andw′ has at most one incoming alternation edge, then:

• If for each v, PartOrder(v, e′v) = PartOrder(v, ev)+1, then we may delete each edge, e′v, and add asingle replacement edge, e, from w to u and letOrd(e) > Ord(e′) for all other construction and syntax

edges, e′ outgoing from any w′ where w?=⇒ w′.

Intuitively, if every class which has w as a part has uas a part immediately after w, then we may removethe u part from all of those classes and instead makeu the last part of class w. See, for example, Figure 3.

• If for each v, PartOrder(v, e′v) = PartOrder(v, ev)−1, then we may delete each edge, e′v, and add asingle replacement edge, e, from w to u and letOrd(e) < Ord(e′) for all other construction and syntax

edges, e′ outgoing from any w′ where w?=⇒ w′.

Intuitively, if every class which has w as a part has uas a part immediately before w, then we may removethe u part from all of those classes and instead makeu the first part of class w.

B.4. Unnesting of Parts

Given a vertex, w ∈ V with no incoming alternationedges and an outgoing construction edge or syntax edge,e, with target u:

• If for every construction or syntax edge, e′ ≠ e, with

source w′ such that w?=⇒ w′, w′ has at most one

incoming alternation edge and Ord(e) > Ord(e′) (so eis the last part of every w object), then we may deleteedge e, if for each construction from some vertex, v,to w, ev, we add a replacement edge, e′v, from v to usuch that PartOrder(v, e′v) = PartOrder(v, ev) + 1. Inother words, we remove the last part, p, from everyw object, and insert the part p just after the w part ofevery object that contains a w object.or

• If for every construction or syntax edge, e′ ≠ e, with

source w′ such that w?=⇒ w′, w′ has at most one

incoming alternation edge and Ord(e) < Ord(e′) (so eis the first part of every w object), then we may deleteedge e, if for each construction from some vertex, v,to w, ev, we add a replacement edge, e′v, from v to usuch that PartOrder(v, e′v) = PartOrder(v, ev)−1. Inother words, we remove the first part, p, from everyw object, and insert the part p just before the w partof every object that contains a w object.

This is the inverse of nesting of parts.

B.5. Addition of Lambda Parts

From any vertex, v ∈ V, an outgoing constructionedge, (v l−→ w) to a construction vertex, w ∈ VC, maybe added if w has no outgoing edges. An outgoing syntaxedge, (v → w) to a syntax vertex, w, may be added if wis the “empty string.”

B.6. Deletion of Lambda Parts

A construction edge whose target is a constructionvertex with no outgoing edges, or a syntax edge whosetarget is the “empty string” may be deleted. This is theinverse of addition of lambda parts.

B.7. Addition of Lambda Alternative

An alternation edge, (v =⇒w), may be added from analternation vertex, v ∈ VA to a construction vertex, w ∈VC if w has no outgoing edges, v has only one outgoingconstruction edge, (v l−→ v′), and the target, v′, of thatedge has an outgoing alternation edge, (v′ =⇒ v), back tov. See, for example, Figure 4.

B.8. Deletion of Lambda Alternative

An alternation edge, (v =⇒w), from a vertex, v ∈ VAto a construction vertex, w ∈ VC, may be deleted if w has

210 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997

no outgoing edges, v has only one outgoing constructionedge, (v l−→ v′), and the target, v′, of that edge has anoutgoing alternation edge, (v′ =⇒ v), back to v. This is theinverse of addition of lambda alternative.

B.9. Insertion of Singleton Construction

A new construction vertex, v, with a single outgoingconstruction edge to a vertex, v′ ∈ V, may be added toa class graph, and any incoming construction edges atv′ may be rerouted to v. Incoming alternation edges atv′ may also by rerouted to v if the rerouting does notresult in the inheritance of additional parts (syntax orconstruction edges) at v. See, for example, Figure 5.

B.10. Deletion of Singleton Construction

If a class graph contains a construction vertex, v, withno inherited parts and a single outgoing edge to a vertex,v′ ∈ (V∪VS), then v may be deleted if all incoming edgesat v are rerouted to v′. This is the inverse of insertion ofsingleton construction.

B.11. Attribute to Subclass

If a class graph contains a construction vertex, v ∈VC, with an outgoing construction edge, (v l−→ w), to analternation vertex, w ∈ VA, then we may delete the con-struction edge from v to w and for each vertex, w′, suchthat (w =⇒w′) ∈ EA, we add a new construction vertex,v′, with an incoming alternation edge from v, (v =⇒ v′),and an outgoing construction edge, (v′ l−→ w′) to w′.Each of the new construction edges is mapped to thesame number (under Ord) as was the deleted construc-tion edge. Since v now has outgoing alternation edgesit becomes (by definition) an alternation vertex. See, forexample, Figure 6.

B.12. Subclass to Attribute

If a class graph contains alternation vertices, v, w ∈VA, such that there is a one-to-one correspondence be-tween the vertices, v′, where (v =⇒ v′) ∈ EA and the ver-tices, w′ where (w =⇒w′) ∈ EA, such that for each w′the corresponding v′ is a construction vertex with a sin-gle incoming edge, (v =⇒ v′), and a single outgoing edge,(v′ l−→ w′), then we may delete each such v′ along withits incoming and outgoing edges and add a new construc-tion edge, (v l−→ w), from v to w. Since v no longer hasany outgoing alternation edges it becomes (by definition)a construction vertex. This is the inverse of attribute tosubclass.

Appendix C

Primitive Transformations for Completeness Proof

Figures 20–28 show sequences of primitive transfor-mations that correspond to each of the equations in the

axiom system, F, except for the equations (α · φ) = φand (α+φ) = α which are not applicable since the regu-lar class graphs as defined here do not contain the emptylanguage, φ.

Notes

1. More generally, if two class graphs define a common sub-language(i.e., the intersection of the two languages is not empty) then a pro-gram written for one of the class graphs could be automaticallytransformed into a program for the other in such a way that the be-havior of the system is preserved for any input in the sub-language.

2. The required accessor methods for class Compound are not shown.

3. There may be more than one container for each object.

4. See Table 1.

5. The inclusion of concrete syntax in class graphs was inspired bythe class dictionaries of [23, 24]

References

[1] Abiteboul, S., and Hull, R. (1988). Restructuring hi-erarchical database objects. Theoretical Computer Sci-ence, 62:3–38.

[2] Ait-Kaci, H., and Nasr, R. (1986). Login: A logic pro-gramming language with built-in inheritance. Journalof Logic Programming, 3:185–215.

[3] Banerjee, J., Kim, W., Kim, H.-J., and Korth, H.F.(1987). Semantics and implementation of schemaevolution in object-oriented databases. In Proc. ofACM/SIGMOD Annual Conf. on Management of Data,pp. 311–322. ACM, ACM Press, December. SIGMODRecord, Vol. 16, No. 3.

[4] Barbedette, G. (1991). Schema modifications in thelispo2 persistent object-oriented language. In PierreAmerica, ed., European Conf. on Object-Oriented Pro-gramming, pp. 77–96, Geneva, Switzerland, July 1991.Springer-Verlag, Lecture Notes in Computer Science.

[5] Bergstein, P. L. (1991). Object-preserving class trans-formations. In Object-Oriented Programming Systems,Languages and Applications Conf., in Special Issueof SIGPLAN Notices, pp. 299–313, Phoenix, Arizona.ACM Press.

[6] Bergstein, P. L. (1994). Managing the Evolution ofObject-oriented Systems. Ph.D. Thesis, NortheasternUniversity, Boston, MA, June.

[7] Bergstein, P. L., and Hursch, W. L. (1993). Maintain-ing behavioral consistency during schema evolution. InS. Nishio and A. Yonezawa, eds., Int. Symp. on Ob-ject Technologies for Advanced Software, pp. 176–193,Kanazawa, Japan, November 1993. JSSST, Springer-Verlag, Lecture Notes in Computer Science. Also avail-able as Northeastern University, College of ComputerScience Technical Report Number NU-CCS-93-04.

[8] Bertino, E. (1992). A view mechanism for object-oriented databases. In Int. Conf. on Extending DatabaseTechnology, pp. 136–151, Vienna, Austria.

[9] Borgida, A., Mitchell, T., and Williamson, K. (1986).Learning improved integrity constraints and schemasfrom exceptions in data and knowledge bases. InMichael L. Brodie and John Mylopoulos, eds., On

THEORY AND PRACTICE OF OBJECT SYSTEMS–1997 211

Knowledge Base Management Systems, pp. 259–286.Springer-Verlag.

[10] Casais, E. (1991). Managing evolution in object-oriented environments: an algorithmic approach. Ph.D.Thesis, University of Geneva, Geneva, Switzerland,May. Thesis No. 369.

[11] Coen-Porisini, A., Lavazza, L., and Zicari, R. (1991).Updating the schema of an object-oriented database.Quarterly Bulletin of the IEEE Computer Society Tech-nical Committee on Data Engineering, 14(2):33–37,June. Special Issue on Foundations of Object-OrientedDatabase Systems.

[12] Delcourt, C., and Zicari, R. (1991). The design of anintegrity consistency checker (icc) for an object ori-ented database system. In Pierre America, ed., Euro-pean Conf. on Object-Oriented Programming, pp. 97–117, Geneva, Switzerland, July. Springer-Verlag, Lec-ture Notes in Computer Science.

[13] Dershowitz, N., and Manna, Z. (1977). The evolutionof programs: Automatic program modification. IEEETransactions on Software Engineering, SE-3(6):377–385, November.

[14] Gentner, D. (1983). Structure-mapping: A theoreticalframework for analogy. Cognitive Science, 7:155–170.

[15] Gentner, D., and Toupin, C. (1986). Systematicity andsurface similarity in the development of analogy. Cog-nitive Science, 10:277–300.

[16] Hull, R. B., and Yap, C. K. (1984). The format model: Atheory of data organization. Journal of the Associationfor Computing Machinery, 31(3):518–537, July.

[17] Hursch, W. L., Seiter, L. M., and Xiao, C. (1991). In anyCASE: Demeter. The American Programmer, 4(10):46–56, October.

[18] Johnson, R. E., and Foote, B. (1988). Designingreusable classes. Journal of Object-Oriented Program-ming, 1(2):22–35, June/July.

[19] Johnson, R. E., and Opdyke, W. F. (1993). Refactor-ing and aggregation. In S. Nishio and A. Yonezawa,eds., Int. Symp. on Object Technologies for AdvancedSoftware, pp. 264–278, Kanazawa, Japan, November1993. JSSST, Springer-Verlag, Lecture Notes in Com-puter Science.

[20] Kuper, G. M., and Vardi, M. Y. (1984). The logical datamodel. In Principles of Database Systems, pp. 86–96.ACM.

[21] Lerner, B. S., and Habermann, A. N. (1990). Be-yond schema evolution to database reorganization. InNorman Meyrowitz, ed., Proc. OOPSLA ECOOP ’90,pp. 67–76, Ottawa, Canada, October. ACM, ACMPress. Special Issue of SIGPLAN Notices, Vol. 25,No. 10.

[22] Li, Q., and McLeod, D. (1994). Conceptual databaseevolution through learning in object databases. IEEETransactions on Knowledge and Data Engineering,6(2):205–224.

[23] Lieberherr, K. J. (1988). Object-oriented programmingwith class dictionaries. Journal on Lisp and SymbolicComputation, 1(2):185–212.

[24] Lieberherr, K. J., and Riel, A. J. (1988). Demeter: A

CASE study of software growth through parameter-ized classes. Journal of Object-Oriented Programming,1(3):8–22, August, September. A shorter version ofthis paper was presented at The 10th Int. Conf. on Soft-ware Engineering, Singapore, April 1988, IEEE Press,pp. 254–264.

[25] Lieberherr, K. J., Bergstein, P. L., and Silva-Lepe, I.(1991). From objects to classes: Algorithms for object-oriented design. Journal of Software Engineering,6(4):205–228, July.

[26] Meyer, B. (1988). Object-Oriented Software Construc-tion. Series in Computer Science. Prentice Hall Inter-national.

[27] Opdyke, W. F. (1992). Refactoring: A Program Restruc-turing Aid in Designing Object-Oriented ApplicationFrameworks. Ph.D. Thesis, Computer Science Depart-ment, University of Illinois, May.

[28] Opdyke, W. F., and Johnson, R. E. (1990). Refactor-ing: An aid in designing application frameworks andevolving object-oriented systems. In Proc. of the Symp.on Object-Oriented Programming Emphasizing Practi-cal Applications (SOOPA), pp. 145–160, Poughkeepsie,NY, September. ACM.

[29] Opdyke, W. F., and Johnson, R. E. (1993). Creatingabstract superclasses by refactoring. In Proc. of theCSC ’93: The ACM 1993 Computer Science Confer-ence, February.

[30] Penney, J. D., and Stein, J. (1987). Class modifica-tion in the GemStone object-oriented DBMS. In Nor-man Meyrowitz, ed., Object-Oriented ProgrammingSystems, Languages and Applications Conf., in Spe-cial Issue of SIGPLAN Notices, pp. 111–117, Orlando,Florida, December 1987. ACM, ACM Press. SpecialIssue of SIGPLAN Notices, Vol. 22, No. 12.

[31] Pernici, B., Barbic, F., Fugini, M. G., Maiocchi, R.,Rames, J. R., and Rolland, C. (1989). C-TODOS:An automatic tool for office system conceptual de-sign. ACM Transactions on Office Information Systems,7(4):378–419, October.

[32] Salomaa, A. (1969). Theory of Automata. InternationalSeries of Monographs in Pure and Applied Mathemat-ics, Vol. 100. Pergamon Press.

[33] Silva-Lepe, I., Hursch, W., and Sullivan, G. (1994). AReport on Demeter/C++. C++ Report, pp. 24–30,February.

[34] Skarra, A. H., and Zdonik, S. B. (1986). The manage-ment of changing types in an object-oriented database.In Object-Oriented Programming Systems, Languagesand Applications Conf., in Special Issue of SIGPLANNotices, pp. 483–495. ACM, ACM Press, September.

[35] Snodgrass, R. (1989). The interface description lan-guage. Computer Science Press.

[36] Teorey, T. J., Yang, D., and Fry, J. P. (1986). A logicaldesign methodology for relational data bases. ACMComputing Surveys, 18(2):197–222, June.

[37] Tsichritzis, D., and Lochovsky, F. (1982). Data Models.Software Series. Prentice-Hall.

[38] Ulrich, J. W., and Moll, R. (1977). Program synthesisby analogy. SIGPLAN Notices, 12(8): 22–28, August.

212 THEORY AND PRACTICE OF OBJECT SYSTEMS–1997