
Abstraction Techniques in Modern Programming Languages

Mary Shaw, Carnegie-Mellon University

The major issues of modern software engineering arise from the costs of software development, use, and maintenance (which are too high) and the quality of the systems (which is too low). These problems are particularly severe for today's large complex programs with long useful lifetimes. This article traces important programming language ideas back to their roots in the problems and languages of the 1970's, and it shows how modern programming languages respond to the complexity of contemporary software development. Modern programming's key concept for controlling complexity is abstraction, that is, selective emphasis on detail.

The effects of abstraction techniques and associated specification and verification issues run through the history of attempts to solve the problems of high cost and low quality. The best new developments in programming languages support and exploit abstraction techniques. These techniques emphasize engineering concerns, including design, specification, correctness, and reliability.

We begin by reviewing the ideas about program development and analysis that have heavily influenced the development of current programming language techniques. Many of these ideas are currently interesting as well as historically important. We then survey the ideas from recent research projects that are influencing modern software practice. The changes in program organization that have been stimulated by these ideas are illustrated by developing a small example in three different languages: Fortran, Pascal, and Ada. Finally, we assess the status and the potential of current abstraction techniques.

This article is an update and revision of "The Impact of Abstraction Concerns on Modern Programming Languages," which appeared in Proceedings of the IEEE, Vol. 68, No. 9, Sept. 1980, pp. 1119-1130.

0740-7459/84/0010/0010$01.00 © 1984 IEEE
IEEE SOFTWARE

Conceptual and historical review

Controlling software development and maintenance has always involved managing the intellectual complexity of programs and systems of programs. Not only must the systems be created, they must be tested, maintained, and extended. As a result, many different people must understand and modify the systems at various times during their lifetimes. Abstraction provides a good way to manage complexity and guarantee continuity.

An abstraction is a simplified description, or specification, of a system that emphasizes some of the system's details or properties while suppressing others. A good abstraction is one that emphasizes details that are significant to the reader or user and suppresses details that are, at least for the moment, immaterial or diversionary.

"Abstraction" in programming systems corresponds closely to "analytic modeling" in many other fields. Construction of a model usually starts with observations, followed closely by formation of hypotheses about principles or axioms that explain the observations. These axioms are used to derive or construct a model of the observed system. The parameters or variables of the model may be derived from the axioms or they may be estimated from observation. The model is then used to make new predictions. The final step is to perform experiments in controlled or well-understood environments to determine the accuracy and robustness of the model and of the axioms. This cycle of hypothesizing and validating models is then continued with additional observations.

In software development, the requirements or intended functionality of a system play the role of the observations to be explained. The abstraction process is then very similar to the general modeling paradigm: deciding which characteristics of the system are important, what parameters should be included, which descriptive formalism to use, how the model can be validated, and so on. As in many other fields, we often define hierarchies of models in which lower level models provide more detailed explanations for the phenomena that appear in higher level models. Also, in computer science, as in other fields, the model is sufficiently different from the system it describes to require explicit validation. We refer to the abstract description provided by a model as its specification and to the next lower level model in the hierarchy as its implementation. The process of determining that the specification is consistent with the implementation is called verification. The abstractions we use for software tend to emphasize the functional properties of the software: what is computed rather than how the computation is carried out.
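The distinction between a specification (what is computed) and an implementation (how it is computed) can be illustrated with a small Python sketch of our own; the names `meets_sort_spec` and `insertion_sort` are hypothetical. The specification is written as a predicate relating inputs to outputs, and the implementation is only one of many programs that could satisfy it. Checking a few samples, as done here, is testing; verification in the sense above would establish consistency for all inputs.

```python
from collections import Counter


def meets_sort_spec(inp: list[int], out: list[int]) -> bool:
    """Specification: states *what* a sort must compute, not *how*.

    The output must be in nondecreasing order and must be a
    permutation of the input (same elements, same multiplicities).
    """
    ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    return ordered and Counter(inp) == Counter(out)


def insertion_sort(xs: list[int]) -> list[int]:
    """One possible implementation: states *how* the result is computed."""
    result: list[int] = []
    for x in xs:
        # Find the first position whose element exceeds x, then insert there.
        i = 0
        while i < len(result) and result[i] <= x:
            i += 1
        result.insert(i, x)
    return result


# Check the implementation against the specification on sample inputs only.
for sample in ([], [3, 1, 2], [5, 5, 1, 4]):
    assert meets_sort_spec(sample, insertion_sort(sample))
```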

Abstraction techniques have evolved in step not only with our understanding of programming issues, but also with our ability to use the abstractions as formal specifications of the systems they describe. In the 1960's, for example, the important developments in methodology and languages centered on functions and procedures, which summarized a program segment in terms of a name and a parameter list. At that time, we only knew how to perform syntactic validity checks, and specification techniques reflected this: "specification" meant little more than "procedure header" until late in the decade. By the late 1970's, developments centered on the design of data structures; specification techniques drew on quite sophisticated methods of mathematical logic; and programming language semantics were well enough understood to permit formal verification that a program was consistent with its specification.

Early techniques. Prior to the late 1960's, the dominant programming-language issues were syntax, translation techniques, and solutions to specific implementation problems. Thus, we saw many articles on specific problems, such as parsing, storage allocation, and data representation. Procedures were well understood, and libraries of procedures were set up, but they were only partly successful because often the documentation (informal specification) was inadequate, or because the parameterization of the procedures did not support the cases of interest. Basic data structures such as stacks and linked lists were just beginning to be understood, but they were sufficiently unfamiliar that it was difficult to separate the concepts from the particular implementations. Perhaps it was too early in the history of the field for generalization and synthesis to take place, but, in any event, abstraction played only a minor role.

The earliest application of abstraction to the design of programming languages may have been the symbolic assemblers of the 1950's. Instead of writing programs directly in octal codes, programmers used mnemonic names to stand for numeric operation codes, and the binding of variable names to specific machine locations was delegated to the assemblers.
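A toy assembler, sketched below in Python under invented assumptions, shows the two abstractions just described: mnemonics standing for numeric operation codes, and variable names bound to machine locations by the translator rather than by the programmer. The instruction set, the octal opcodes, and the `assemble` function are hypothetical and do not model any particular 1950's machine.

```python
# Hypothetical instruction set: these mnemonics and octal opcodes are
# invented for illustration and do not correspond to any real machine.
OPCODES = {"LOAD": 0o10, "ADD": 0o20, "STORE": 0o30, "HALT": 0o77}


def assemble(lines: list[str], data_base: int = 0o100) -> list[tuple[int, int]]:
    """Translate 'MNEMONIC [name]' lines into (opcode, address) pairs.

    Each symbolic variable name is bound to a machine location the first
    time it appears, so the programmer never writes numeric addresses.
    """
    symbols: dict[str, int] = {}
    code: list[tuple[int, int]] = []
    for line in lines:
        parts = line.split()
        mnemonic = parts[0]
        address = 0
        if len(parts) > 1:
            name = parts[1]
            if name not in symbols:
                # Allocate the next free data word for this variable.
                symbols[name] = data_base + len(symbols)
            address = symbols[name]
        code.append((OPCODES[mnemonic], address))
    return code


program = ["LOAD x", "ADD y", "STORE total", "HALT"]
for opcode, address in assemble(program):
    print(f"{opcode:02o} {address:03o}")  # octal, as on machines of the era
```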

Beginning in the late 1960's, abstraction was treated consciously as a program organization technique. Earlier languages supported built-in data types, including at least integers, real numbers, and arrays, and sometimes Booleans, high-precision reals, etc. Data structures were first treated systematically in 1968,[1] and the notion that a programmer might define data types tailored to a particular problem first appeared in 1967. The notion that programming is an activity that should be studied and subjected to some sort of discipline dates to the NATO Software Engineering Conferences of 1968[2] and 1969.[3]

Extensible languages. The late 1960's also saw efforts to abstract from the built-in notations of programming languages in such a way that any programmer could add new notation and new data types to a base language. The objectives of this extensible language work included allowing individual programmers to extend the syntax of the programming language, to define new data structures, to add new operators (including infix operators as well as ordinary functions) for both old and new data structures, and to add new control structures to the base language.

This work on extensibility[4] died out, in part because it underestimated the difficulty of defining interesting extensions. It was difficult to keep independent extensions compatible when all of them modified the syntax of the base language, to organize definitions so that related information was grouped in common locations, and to find techniques for describing an extension accurately (other than by exhibiting the code for the extension). However, extensible languages influenced the abstract data types and generic definitions of the 1970's: abstract data types that extended the semantics, rather than the syntax, of a language, and generic definitions that provided some of the operator definition facilities that extensible languages were trying to provide.

October 1984 11

Abstraction in mapmaking

Simplification of reality is essential to good mapmaking. Like the programmer, the mapmaker must abstract available information, selectively emphasizing and suppressing detail. Picking the "right" information (features as well as scale) is a major part of his art. The effects of his intentions become especially apparent in viewing different maps of the same area, in this case Washington, DC.

One of the earliest maps of Washington, produced in the 1790's, is the handkerchief map (p. 10) of Maj. Pierre Charles L'Enfant's plan for the city. The same area, treated in a different manner, can be seen in the four maps on these pages. The two maps on p. 12 show the considerable variation in road/street maps. One emphasizes the highway system; the other, one of a set of visitor maps by the American Institute of Architects, reduces the Capitol Hill area to a schematic block pattern. The aviation chart (p. 13, left) locates radio beacons and airplane routes. The black crosshairs we've added indicate Washington National Airport; the shaded segment to the north is the Capitol Hill area. Topographic maps (right) emphasize surface details, especially elevations. Produced by the US Geological Survey, they meet high standards for physical accuracy.

Structured programming. By the early 1970's, a methodology emerged for constructing programs by progressing from a statement of the objective through successively more precise intermediate stages to final code. Called "stepwise refinement" or "top-down programming," this methodology involved beginning program development with a version that was free to assume the existence of any data structures and operations that could be directly applied to the problem at hand, even if those structures and operations were quite sophisticated and difficult to implement. Thus the initial program was presumably small, clear, directly problem-related, and "obviously" correct. Although the assumed structures and operations might be specified only informally, the programmer's intuitions about them made it possible to concentrate on the overall organization of the program and defer concerns about the implementations of the assumed structures and operators. When each of the latter definitions was addressed, the same technique was applied again, and the implementations of the high-level operations were substituted for the corresponding invocations. The result was a new, more detailed program that was convincingly like the previous one, but that depended on fewer concepts or on simpler definitions. Since the translation to simpler terms eliminated problem-specific knowledge in favor of more universal operators, the new program was more nearly compilable than its predecessor. Successive steps of the program development added details more relevant to the programming language than to the problem domain until the program was completely expressed using the operations and data types of the base language, for which a translator or interpreter was available.

This separation of concerns between the structures that were used to solve a problem and the way those structures were implemented aided in decomposing complex problems into smaller, fairly independent segments. The key to the success of the methodology was the degree of abstraction imposed by selecting high-level data structures and operations. The chief limitation of the methodology, which was not appreciated until it had been in use for some time, was that the final program did not preserve the series of abstractions through which it was created, and so the task of modifying the program after it was completed was not necessarily simpler than it would be for a program developed in any other way. Another limitation of the methodology was that informal descriptions of operations did not convey precise information. Misunderstandings about exactly what an operation was supposed to do could complicate the program development process, and informal descriptions of procedures might not have been adequate to assure true independence of modules. The development of techniques for formal program specification helped alleviate this set of problems.

At about the same time as stepwise refinement was emerging, we also began to be concerned about how people understood programs and how programs could be organized to make them easier to understand, and hence to modify. For programs written in general-purpose programming languages, this understanding was based primarily on the program state, that is, on the current values of all the variables in the program at some instant. It was of primary importance to be able to determine what assumptions about the program state were being made at any point in the program. Further, arbitrary transfers of control, especially those that spanned large amounts of program text, interfered with this goal. The control flow patterns that lent themselves to understandable programs were the ones that had a single entry point (at the beginning of the text) and, at least conceptually, a single exit point (at the end of the text). Examples of statements that satisfied this rule were the if...then...else form of conditional and the for and while loops. The chief violator of the rule was the go to statement.

The first discussion of how to make programs easier to understand appeared in 1968, and we converged on a common set of "ideal" control constructs a few years later. Although we still have not achieved a true consensus on this set of constructs, we no longer regard the question as a major issue.
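Stepwise refinement can be sketched in Python for a small hypothetical problem, reporting the k most frequent words in a text; the problem and all function names are our illustration, not the article's. The first version assumes operations that do not yet exist, and the second step supplies them in terms of simpler built-in operations. Only single-entry, single-exit constructs (for, while, if) are used.

```python
# Step 1: the top-level program, written as if the high-level operations
# already existed. It is short and "obviously" correct.
def top_words(text: str, k: int) -> list[str]:
    words = split_into_words(text)
    counts = count_occurrences(words)
    return most_frequent(counts, k)


# Step 2: refine each assumed operation in terms of simpler ones.
def split_into_words(text: str) -> list[str]:
    # Treat any non-letter character as a separator; ignore case.
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def count_occurrences(words: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts


def most_frequent(counts: dict[str, int], k: int) -> list[str]:
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:k]


print(top_words("the cat and the hat and the bat", 2))  # ['the', 'and']
```

Here each refinement survives as a named function. In the methodology as described, implementations were substituted for invocations, so these intermediate abstractions need not remain visible in the final program.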

Program verification. In parallel with the development of "ideal" control constructs (in fact, as part of the motivation), computer scientists became interested in finding ways to make precise, mathematically manipulable statements about what a program computes. The ability to make such statements was essential to the development of techniques for reasoning about programs, particularly for techniques that relied on abstract specifications of effects. New techniques were required because program text alone failed to provide adequate information for reasoning precisely about programs. Procedure headers, even accompanied by prose commentary, were imprecise. This imprecision led to ambiguities about responsibilities for the computation and to inadequate separation between modules.

The notion that it was possible to make formal statements about the program state, and to reason rigorously about the effect on the program state of executing a statement, first appeared in the late 1960's. The formal statements were expressed as formulas in the predicate calculus, such as

$$ y > x \land (x > 0 \supset z = x^2) $$

A programming language was described by a set of rules that defined the effect each statement had on the logical formula that described the program state. The rules for the language were applied to the program assertions to obtain theorems whose proofs assured that the program matched the specification. By the early 1970's, the basic concepts of verifying assertions about simple programs and describing the language in such a way that this was possible were well understood. Manual application of verification techniques tended to be error-prone, and formal specifications, as much as informal ones, were susceptible to errors of omission. Verification required converting a program annotated with logical assertions into logical theorems with the property that the program is correct if and only if the theorems are true. This conversion process, called verification condition generation, is now well understood, and programs have been developed to perform these steps automatically. Programs are also being developed to prove the resulting theorems, but considerable work remains to be done on this problem.
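As a worked illustration of our own (the statement z := x * x and its annotation are hypothetical, not from the article), take the formula above as the postcondition Q of that assignment and y > x as its precondition. Hoare's assignment axiom says that Q holds after v := E if Q with E substituted for v held before. Substituting and then requiring the precondition to imply the result yields a single verification condition:

```latex
\begin{align*}
&\text{Assignment axiom:}                 && \{\,Q[E/v]\,\}\; v := E \;\{\,Q\,\} \\
&\text{Postcondition:}                    && Q \equiv y > x \land (x > 0 \supset z = x^2) \\
&\text{Substitute } x \cdot x \text{ for } z: && Q[x \cdot x / z] \equiv y > x \land (x > 0 \supset x \cdot x = x^2) \\
&\text{Verification condition:}           && y > x \supset \bigl(y > x \land (x > 0 \supset x \cdot x = x^2)\bigr)
\end{align*}
```

The condition is valid because x · x = x² for every x, so it reduces to y > x ⊃ y > x. Generating the condition is the mechanical step; proving it is the part left to a theorem prover or a careful human.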

When the emphasis in programming methodology shifted to using data structures as a basis for program organization, corresponding specification and verification problems arose. Initially, the question was what information it was useful to specify. Subsequently, attention focused on making those specifications more formal and dealing with the verification problems. From this basis, work on verification for abstract data types proceeded as we describe below.

Abstract data types. In the 1970's we recognized the importance of organizing programs into modules in such a way that knowledge about implementation details was localized as much as possible. This led to language support for data types, for specifications that are organized using the same structure as data, and for generic definitions. The language facilities were based on the class construct of Simula, on ideas about strategies for defining modules, and on concerns over the impact of locality on program organization. The corresponding specification techniques included strong typing and verification of assertions about functional correctness.

Later in the 1970's, most research activity in abstraction techniques focused on the language and specification issues raised by these considerations; much of the work is identified with the concept of abstract data types. Like structured programming, the methodology of abstract data types emphasized locality of related collections of information. In this case, data was emphasized rather than control, and the strategy was to package each data structure and its associated operations in a single module. The resulting module contained the information necessary to treat the data structure and its operations as a type. The objective was to treat such modules in the same way as ordinary types such as integers and reals were treated; this required support for declarations, infix operators, specification of routine parameters, and so on. The result, the abstract data type, effectively extended the set of types available to a program: it explained the properties of a new group of variables by specifying the values one of these variables might have, and it explained the operations that would be permitted on the new variables by giving the effects these operations had on the values of the variables.
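The idea of treating a packaged data structure like a built-in type can be sketched in a modern language with operator overloading. This Python `Rational` class is our illustration, not an example from the article; note that Python protects the representation only by naming convention (the leading underscore), not by compiler enforcement.

```python
from math import gcd


class Rational:
    """A user-defined numeric type packaged with its operations.

    Clients create and combine values only through the constructor and
    the operators below; the (numerator, denominator) pair is an
    internal representation detail.
    """

    def __init__(self, num: int, den: int = 1) -> None:
        if den == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        # Normalize: positive denominator, lowest terms.
        if den < 0:
            num, den = -num, -den
        g = gcd(num, den)
        self._num, self._den = num // g, den // g

    def __add__(self, other: "Rational") -> "Rational":
        return Rational(self._num * other._den + other._num * self._den,
                        self._den * other._den)

    def __mul__(self, other: "Rational") -> "Rational":
        return Rational(self._num * other._num, self._den * other._den)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Rational):
            return NotImplemented
        return (self._num, self._den) == (other._num, other._den)

    def __repr__(self) -> str:
        return f"{self._num}/{self._den}"


half = Rational(1, 2)
third = Rational(1, 3)
print(half + third)                     # infix use, like built-in numbers: 5/6
print(half * Rational(2, 3) == third)   # True
```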


In a data type abstraction, we now recognize the need for separate specification and implementation. First, we specify the functional properties of a data structure and its operations, then we implement them in terms of existing language constructs (and other data types) and show that the specification is accurate. When we subsequently use the abstraction, we deal with the new type solely in terms of its specification. This philosophy was developed in several recent language research and development projects, including Ada, Alphard, CLU, Concurrent Pascal, Euclid, Gypsy, Mesa, and Modula.

The specification techniques we use for abstract data types evolved from the predicates in simple sequential programs. Additional expressive power was incorporated to deal with the way information is packaged into modules and with the problem of abstracting from an implementation to a data type. One class of specification techniques draws on the similarity between a data type and the mathematical structure called an algebra. Another class of techniques explicitly models a newly defined type by defining its properties in terms of the properties of common, well-understood types.
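Both classes of specification technique can be illustrated in one Python sketch, assuming an abstract base class is an acceptable stand-in for a separate specification module. `Stack` states the operations clients may use; `satisfies_axioms` expresses algebraic axioms relating those operations (for example, popping a stack that was just pushed gives back the original); and `abstract_value` models each stack as a tuple, a well-understood type. All names are hypothetical, and checking the axioms on sample values is testing, not proof.

```python
from abc import ABC, abstractmethod


class Stack(ABC):
    """Specification of an immutable integer stack.

    Clients may rely only on these operations. Each stack's abstract
    value is modeled as a tuple (bottom element first).
    """

    @abstractmethod
    def push(self, x: int) -> "Stack": ...

    @abstractmethod
    def pop(self) -> "Stack": ...

    @abstractmethod
    def top(self) -> int: ...

    @abstractmethod
    def is_empty(self) -> bool: ...

    @abstractmethod
    def abstract_value(self) -> tuple[int, ...]: ...


def satisfies_axioms(empty: Stack, s: Stack, x: int) -> bool:
    """Algebraic axioms, checked for one empty stack, one stack s, one x."""
    return (empty.is_empty()
            and not s.push(x).is_empty()
            and s.push(x).top() == x
            and s.push(x).pop().abstract_value() == s.abstract_value())


class LinkedStack(Stack):
    """One implementation: a chain of (value, rest) cells, top first."""

    def __init__(self, cell: "tuple[int, LinkedStack] | None" = None) -> None:
        self._cell = cell

    def push(self, x: int) -> "LinkedStack":
        return LinkedStack((x, self))

    def pop(self) -> "LinkedStack":
        if self._cell is None:
            raise IndexError("pop of empty stack")
        return self._cell[1]

    def top(self) -> int:
        if self._cell is None:
            raise IndexError("top of empty stack")
        return self._cell[0]

    def is_empty(self) -> bool:
        return self._cell is None

    def abstract_value(self) -> tuple[int, ...]:
        # Abstraction function: walk the chain, then list values bottom first.
        values = []
        node = self
        while node._cell is not None:
            values.append(node._cell[0])
            node = node._cell[1]
        return tuple(reversed(values))


empty = LinkedStack()
s = empty.push(1).push(2)
print(s.abstract_value())  # (1, 2)
print(all(satisfies_axioms(empty, t, x) for t in (empty, s) for x in (0, 7)))
```

A client written against `Stack` alone would be unaffected if `LinkedStack` were replaced by another implementation with the same abstract values.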

In conjunction with the work on abstract data types and formal specifications, the generic definitions that originated in extensible languages have been developed to a level of expressiveness and precision far beyond the anticipation of their originators. These definitions, discussed in detail below, are parameterized not only in terms of variables that can be manipulated during program execution, but also in terms of data types. They can now describe restrictions on which types are acceptable parameters in considerable detail.[5]
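Python's `typing` module offers a modest version of a generic definition whose type parameter is restricted. In the sketch below (our illustration; `SortedBag` and `SupportsLessThan` are hypothetical names), the type variable `T` is bounded by a protocol requiring `<`, so only ordered types are acceptable parameters. A static type checker such as mypy can report a violation of the bound; Python itself does not check it at run time.

```python
from typing import Any, Generic, Protocol, TypeVar


class SupportsLessThan(Protocol):
    """Restriction on acceptable type parameters: values must support <."""

    def __lt__(self, other: Any, /) -> bool: ...


T = TypeVar("T", bound=SupportsLessThan)


class SortedBag(Generic[T]):
    """A container parameterized by its element type T."""

    def __init__(self) -> None:
        self._items: list[T] = []

    def add(self, item: T) -> None:
        # Insert before the first element that is larger, keeping order.
        i = 0
        while i < len(self._items) and not (item < self._items[i]):
            i += 1
        self._items.insert(i, item)

    def smallest(self) -> T:
        return self._items[0]

    def as_list(self) -> list[T]:
        return list(self._items)


words: SortedBag[str] = SortedBag()
for w in ("pear", "apple", "fig"):
    words.add(w)
print(words.as_list())  # ['apple', 'fig', 'pear']
```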

Interactions. As this review has shown, programming languages and methodologies evolve in response to the perceived needs of software designers and implementors. However, these needs themselves evolve in response to experience gained with past solutions. The original abstraction techniques of structured programming and stepwise refinement were procedures or macros; these techniques have evolved to abstract types and generic definitions. (Although procedures were originally viewed as devices to save code space, they soon came to be regarded, like macros, as abstraction tools.) Methodologies for program development emerge when we find common useful patterns and try to use them as models. Languages evolve to support these methodologies when the models become so common and stable that they are regarded as standard. As abstraction techniques have become capable of addressing a wider range of program organizations, formal specification techniques have become more precise and have played a more important role in the programming process.

For an abstraction to be used effectively, its specification must express all the information needed by the programmer who uses it. Initial attempts at specification used the notation of the programming language to express things that could be checked by the compiler: the name of a routine and the number and types of its parameters. Other facts, such as what the routine computed and under what conditions it should be used, were expressed informally.[6] We have now progressed to the point that we can write precise descriptions of many important relations among routines, including their assumptions about the values of their inputs and the effects they have on the program state.
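A specification that records assumptions about inputs and effects on results, beyond what a procedure header conveys, might look like the following Python sketch. The `int_sqrt` function and its contract are our illustration. The precondition and postcondition are stated in the docstring and also checked at run time with `assert`, which Python skips under the `-O` flag; such checks catch violations only on the inputs actually used and do not replace verification.

```python
def int_sqrt(n: int) -> int:
    """Integer square root.

    Header-level information (what a compiler can check): int in, int out.
    Precondition:  n >= 0
    Postcondition: r * r <= n < (r + 1) * (r + 1)
    """
    assert n >= 0, "precondition violated: n must be nonnegative"
    # Binary search for the largest r with r * r <= n.
    lo, hi = 0, n + 1  # loop invariant: lo * lo <= n < hi * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    r = lo
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r


print(int_sqrt(10))  # 3
```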

The history of programming languages shows a balance between language ideas and formal techniques; in each methodology, language constructs support the programming techniques we understand well, and the properties we specify are matched to our current ability to validate (verify) the consistency of a specification and its implementation. We can rely on formal specifications only to the extent that we are certain that they match their implementations. Thus, the development of abstraction techniques, specification techniques, and methods of verifying the consistency of a specification and an implementation must surely proceed hand in hand. In the future, we should expect to see more diversity in the programs that are used as a basis for modularization. We should also expect to see specifications that are concerned with aspects of programs other than the purely functional properties we now consider.

Abstraction facilities in modern programming languages

With this historical background, we now turn to the abstraction methodologies and specification techniques that are currently under development in the programming language research community. Some of the ideas are well enough worked out to be ready for transfer to practical languages, but others are still under development.

Although the ideas behind modern abstraction techniques can be explored independently of programming languages, the instantiation of these ideas in actual languages is also important. Programming languages are our primary notational vehicle for expressing a class of very complex ideas. The concepts we must deal with include not only the functional relations of mathematics, but also constructs that deal with relations over time, such as sequentiality and synchronization. Language designs influence the ways we think about algorithms by making some program structures easier to describe than others. In addition, programming languages are used for communication among people as well as for controlling machines. This role is particularly important in long-lived programs, because a program is in many ways the most practical medium for expressing the structure imposed by the designer, and for maintaining the accuracy of this documentation over time. Thus, even though most programming languages technically have the same expressive power, differences among languages can significantly affect their practical utility.

New ideas. Current activity in programming languages is driven by three global concerns: simplicity of design, the potential for applying precise analytic techniques to formal specifications, and the need to control costs over the entire lifetime of a long-lived program.

Simplicity of design. Simplicity has emerged as a major criterion for evaluating programming language designs. We see a certain tension between the need for "just the right construct" for a task and the need for a language small enough to understand thoroughly. This is an example of a trade-off between specialization and generality: if highly specialized constructs are provided, individual programs will be smaller, but at the expense of complexity (and feature-by-feature interactions) in the system as a whole. The current trend is to provide a relatively small base language that provides ways to define special facilities in a regular way.[7]

Software development techniques

The software development methods of the early 1970's concentrated on program organization and on disciplines for programmers. As time has passed, it has become clear that software support for the programming process itself is important.

General issues of software development, including both management and implementation issues, are discussed in Brooks's very readable book.[1] The philosophy of structured programming and the principles of data organization that underlie the representation issues of abstract data types have received careful technical treatment.[2-4] The proceedings of the Conference on Specifications of Reliable Software contain papers on both prose descriptions of requirements and mathematical specification of abstractions.[5]

Stepwise refinement is a method of constructing programs by progressing from a statement of the objective through successively more precise intermediate stages to final code.[3, 6] Structured programming is a discipline for writing programs using control constructs that lead to easily understandable code. The first discussion of this question appeared in 1968.[7] We converged on a common set of "ideal" control constructs a few years later.[3, 8] This set of control constructs has the property that each unit of control has unique entry and exit points; as a result, it's relatively easy to discover what assumptions about the program state are being made at any point. A similar argument about locality of data access[9] helped to focus the role of scope rules in constructing understandable programs.

Software environments for program development are beyond the scope of this article, but the state of the work in early 1984 is captured in the proceedings of the ACM Symposium on Practical Software Environments[10] and a recent IEEE Software article on the Cedar environment.[11]

References

1. F. P. Brooks, Jr., The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, Reading, Massachusetts, 1975.
2. O.-J. Dahl and C. A. R. Hoare, "Hierarchical Program Structures," Structured Programming, Academic Press, 1972, pp. 175-220.
3. E. W. Dijkstra, "Notes on Structured Programming," Structured Programming, Academic Press, 1972, pp. 1-82.
4. C. A. R. Hoare, "Notes on Data Structuring," Structured Programming, Academic Press, 1972, pp. 83-174.
5. IEEE Proc. Conf. Spec. Reliable Software, 1979.
6. N. Wirth, "Program Development by Stepwise Refinement," Comm. ACM, Vol. 14, No. 4, Apr. 1971.
7. E. W. Dijkstra, "Goto Statement Considered Harmful," Comm. ACM, Vol. 11, No. 3, Mar. 1968.
8. C. A. R. Hoare and N. Wirth, "An Axiomatic Definition of the Programming Language Pascal," Acta Informatica, Vol. 2, No. 4, 1973.
9. W. A. Wulf and M. Shaw, "Global Variables Considered Harmful," ACM SIGPLAN Notices, Vol. 8, Feb. 1973.
10. Proc. ACM SIGSOFT/SIGPLAN Software Eng. Symp. Prac. Software Development Environments, 1984.
11. Warren Teitelman, "A Tour Through Cedar," IEEE Software, Vol. 1, No. 2, Apr. 1984, pp. 44-73.

October 1984 15


An emphasis on simplicity underlies a number of design criteria that are now commonly used. When programs are organized to localize information, for example, assumptions shared among program parts and module interfaces can be significantly simplified. The introduction of support for abstract data types allows programmers to design special-purpose structures and deal with them in a simple way; it does so by providing a definition facility that allows the extensions to be made in a regular, predictable fashion. The regularity introduced by using these facilities can substantially reduce maintenance problems. It makes it easier for a programmer who is unfamiliar with the code to understand the assumptions about the program state that are made at a given point in the program. This increases the odds that he or she can avoid introducing new errors with each change.

Formal and quantitative techniques. Our understanding of the principles underlying programming languages has improved to the point that formal and quantitative techniques are both feasible and useful. Later we discuss current methods for specifying properties of abstract data types and for verifying that those specifications are consistent with the implementation. It is perhaps not surprising that there seems to be a strong correlation between two kinds of ease: the ease of writing proof rules for language constructs, and the ease with which programmers can use those constructs correctly and understand programs that use them.
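The sketch below makes the idea of assertions about program state concrete. It is written in Python, a language the article does not discuss, chosen only for illustration. The function `isqrt_floor` is hypothetical. Its precondition, loop invariant, and postcondition are the kinds of assertions that a proof rule for a while loop manipulates. Executing them as `assert` statements checks them only on the inputs actually run, which is testing rather than verification.

```python
def isqrt_floor(n: int) -> int:
    """Return the largest r such that r * r <= n (hypothetical example routine)."""
    # Precondition: the routine is specified only for non-negative n.
    assert n >= 0, "precondition violated: n must be non-negative"
    r = 0
    # Loop invariant: r * r <= n whenever the loop test is evaluated.
    while (r + 1) * (r + 1) <= n:
        assert r * r <= n, "loop invariant violated"
        r += 1
    # Postcondition: r * r <= n < (r + 1) * (r + 1).
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r
```

In a proof, the invariant together with the negated loop test yields the postcondition. Here the final `assert` merely re-checks that conclusion at run time.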

Lifetime costs. In the 1970's we began to appreciate that the cost of software includes the costs over the lifetime of the program, and not just the costs of initial development or of execution. For large, long-lived programs, the costs of enhancement and maintenance usually dominate design, development, and execution costs, often by large factors. These cost considerations raise two issues.8 First, to modify a program successfully, a programmer must be able to determine what other portions of the program depend on the section about to be modified. This is simplified if the information is localized and if the design structure is retained in the structure of the program. Off-line design notes or other documents are not an adequate substitute except in the unlikely case that they are meticulously and correctly updated. Second, large programs rarely exist in only one version. On the contrary, they exist in sequential versions as improvements are added from time to time, and they may also exist in simultaneous versions for different machines or machine configurations. When many versions and many programmers are involved, the major issues are problems of management, not of programming. Nevertheless, software tools can significantly ease the problems, and tools for managing the interactions among many versions of a program are included in modern integrated programming environments.

Language support. As we have discussed, the major thrust of programming language research activity in the 1970's was to explore the issues related to abstract data types. The methodological concerns included the need for information hiding and locality of data access, a systematic view of data structures, a program organization strategy exemplified by the Simula class construct, and the notion of generic definition. The formal roots included a proposal for abstracting properties from an implementation and a debate on the philosophy of types, which finally led to the view that types share the formal characteristics of abstract algebras.

Formal specification

One of the major themes of this article is that formal analytic methods are integral to good methodology. Formal specification techniques for software and their associated verification techniques form an important segment of the new analytic methods.

The basic verification techniques depend on making formal assertions about the computation performed by the program. Under certain circumstances it is possible to verify that these assertions are true. London surveys these ideas.1 Manna2 and Wulf3 also offer introductions to the methods.

By the early 1970's the basic concepts of verifying assertions about simple programs and describing a language in such a way that this is possible were under control.4,5 When manually applied, verification techniques tend to be error-prone, and formal specifications, like informal ones, are susceptible to errors of omission.6 In response to this problem, systems for automatically performing the verification steps have been developed.7

As abstract data types emerged, so did specification and verification issues. The initial efforts addressed the question of what information is useful in a specification.8 Subsequent attention concentrated on making those specifications more formal and dealing with the verification problems.9 A debate on the nature of types led to the view that types share the formal characteristics of abstract algebras.10-12 Another class of techniques explicitly models a newly defined type by defining its properties in terms of the properties of common, well-understood types.13 More recently, a specification method that draws on both viewpoints has emerged.14 Strategies for designing data types15 and for using specifications in the design process16 were also explored.

A certain amount of work on formal specification and verification of properties other than computational functionality has already been done. Most of it is directed at specific properties rather than at techniques that can be applied to a variety of properties; the results are, nevertheless, interesting. The need to address a variety of requirements in practical real-time systems was vividly demonstrated at the Conference on Specifications of Reliable Software,17 most notably by Heninger.18 Other work includes specifications of security properties,19,20 reliability,21 performance,22,23 and communication protocols.24 The problem of showing that a specification matches an informal requirement has also been considered.25

References

1. R. L. London, "A View of Program Verification," Proc. IEEE Int'l Conf. Reliable Software, Apr. 1975, pp. 534-545.

2. Z. Manna, Mathematical Theory of Computation, McGraw-Hill, 1974.

3. W. A. Wulf et al., Fundamental Structures of Computer Science, Addison-Wesley, 1981.

4. C. A. R. Hoare and N. Wirth, "An Axiomatic Definition of the Programming Language Pascal," Acta Informatica, Vol. 2, No. 4, 1973.

5. R. L. London et al., "Proof Rules for the Programming Language Euclid," Acta Informatica, Vol. 10, No. 1, 1978, pp. 1-26.

6. S. L. Gerhart and L. Yelowitz, "Observations of Fallibility in Applications of Modern Programming Methodologies," IEEE Trans. Software Eng., Vol. SE-2, No. 5, Sept. 1976.

7. S. L. Gerhart and D. S. Wile, "Preliminary Report on the Delta Experiment: Specification and Verification of a Multiple-User File Updating Module," Proc. IEEE Conf. Spec. Reliable Software, 1979, pp. 198-211.

8. D. L. Parnas, "A Technique for Software Module Specification with Examples," Comm. ACM, Vol. 15, May 1972.

9. C. A. R. Hoare, "Proof of Correctness of Data Representations," Acta Informatica, Vol. 1, No. 4, 1972.

10. J. V. Guttag, E. Horowitz, and D. R. Musser, "Abstract Data Types and Software Validation," Comm. ACM, Vol. 21, No. 12, Dec. 1978.

11. B. H. Liskov and S. N. Zilles, "Specification Techniques for Data Abstractions," IEEE Trans. Software Eng., Vol. SE-1, Mar. 1975.

12. J. H. Morris, "Types Are Not Sets," Proc. ACM Symp. Prin. Prog. Lang., 1973, pp. 120-124.

13. W. A. Wulf, R. L. London, and M. Shaw, "An Introduction to the Construction and Verification of Alphard Programs," IEEE Trans. Software Eng., Vol. SE-2, No. 4, Dec. 1976.

14. J. V. Guttag and J. J. Horning, "An Introduction to the Larch Shared Language," Proc. IFIP Cong., Paris, 1983.

15. J. V. Guttag, "Notes on Type Abstraction (Version 2)," IEEE Trans. Software Eng., Vol. SE-6, No. 1, Jan. 1980, pp. 13-23.

16. J. V. Guttag and J. J. Horning, "Formal Specification As a Design Tool," Proc. ACM Symp. Prin. Prog. Lang., Jan. 1980, pp. 251-261.

17. Proc. Conf. Spec. Reliable Software, 1979.

18. K. L. Heninger, "Specifying Software Requirements for Complex Systems: New Techniques and Their Applications," Proc. IEEE Conf. Spec. Reliable Software, 1979, pp. 1-14.

19. J. K. Millen, "Security Kernel Validation in Practice," Comm. ACM, Vol. 19, No. 5, May 1976.

20. B. J. Walker, R. A. Kemmerer, and G. J. Popek, "Specification and Verification of the UCLA Security Kernel," Comm. ACM, Vol. 23, No. 2, Feb. 1980.

21. J. H. Wensley et al., "SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control," Proc. IEEE, Vol. 66, No. 10, Oct. 1978, pp. 1240-1255.

22. L. H. Ramshaw, Formalizing the Analysis of Algorithms, PhD dissertation, Stanford University, 1979.

23. M. Shaw, "A Formal System for Specifying and Verifying Program Performance," technical report CMU-CS-79-129, Carnegie-Mellon University, June 1979.

24. D. I. Good, "Constructing Verified and Reliable Communications Processing Systems," ACM Software Eng. Notes, Vol. 2, No. 5, Oct. 1977.

25. A. M. Davis and T. G. Rauscher, "Formal Techniques and Automatic Processing to Ensure Correctness in Requirements Specifications," Proc. IEEE Conf. Spec. Reliable Software, 1979, pp. 15-35.

Structured programming involves progressive development of a program by adding detail to its control structure. Programming with abstract data types, however, involves partitioning the program in advance into modules that correspond to the major data structures of the final system. The two methodologies are complementary, because the techniques of structured programming may be used within type definition modules, and conversely.

In most languages that provide the facility, the definition of an abstract data type consists of a program unit that includes the following information:

* Visible outside the type definition: the name of the type and the names and routine headers of all operations (procedures and functions) that are permitted to use the representation of the type; some languages also include formal specifications of the values that variables of this type may assume, and of the properties of the operations.

* Not visible outside the type definition: the representation of the type in terms of built-in data types or other defined types, the bodies of the visible routines, and hidden routines that may be called only from within the module.
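The two-part division above can be sketched in Python, again used only for illustration. The type `PhoneList`, its operations, and its representation are hypothetical. Python does not enforce the hidden part: the leading underscore is only a naming convention. The sketch therefore shows the organization, not the protection a language such as Ada provides.

```python
from __future__ import annotations

from bisect import bisect_left


class PhoneList:
    """Abstract type: (name, extension) pairs kept in name order.

    Visible part: the type name and the operations add, extension_of, entries.
    Hidden part (by Python naming convention only, not enforced): the
    representation _names/_phones and the helper routine _position.
    """

    def __init__(self) -> None:
        # Hidden representation: two parallel lists kept sorted by name.
        self._names: list[str] = []
        self._phones: list[int] = []

    def _position(self, name: str) -> int:
        # Hidden routine: index where name is, or where it would be inserted.
        return bisect_left(self._names, name)

    def add(self, name: str, phone: int) -> None:
        i = self._position(name)
        self._names.insert(i, name)
        self._phones.insert(i, phone)

    def extension_of(self, name: str) -> int | None:
        i = self._position(name)
        if i < len(self._names) and self._names[i] == name:
            return self._phones[i]
        return None

    def entries(self) -> list[tuple[str, int]]:
        # Return a fresh list so callers cannot modify the representation.
        return list(zip(self._names, self._phones))
```

Clients use only `add`, `extension_of`, and `entries`. The parallel-list representation could therefore be replaced, for instance by a list of records, without changing any client code.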

An example of the externally visible portion of a module that defines an abstract data type appears in Figure 5 on page 23.

The general topic of abstract data types has been addressed in a number of research projects. These include Alphard, CLU, Gypsy, Russell, Concurrent Pascal, and Modula. Although they differ in detail, they share the goal of providing language support adequate to the task of abstracting from data structures to abstract data types and allowing those abstract definitions to hold the same status as built-in data types. Descriptions of the differences among these projects are best obtained by studying them in more detail than is appropriate here. As with many research projects, the impact they have is likely to take the form of influence on other languages rather than complete adoption. Indeed, the influence of several research projects on Ada and Euclid is apparent.

Programming languages for abstract data types

The history of programming is marked by many programming language designs. Some have had major user communities, others have had limited use, and some have not been implemented. In addition, many of the fundamental insights into programming language design have appeared in papers that discussed individual language facilities rather than full programming languages.

The major thrust of programming language development in the 1970's was the abstract data type. This development produced languages to support programming methods based on language support for data types,1 strategies for defining modules,2 and concerns over the impact of locality on program organization.3 The language facilities draw heavily on the class construct of Simula4 and on the control structures of Pascal.5 The programming language designs and implementations that explored abstract data types included Ada,6,7 Alphard,8 CLU,9 Concurrent Pascal,10 Euclid,11 Gypsy,12 Mesa,13 Modula,14 and Russell.15

References

1. C. A. R. Hoare, "Notes on Data Structuring," Structured Programming, Academic Press, 1972, pp. 83-174.

2. D. L. Parnas, "On the Criteria to be Used in Decomposing Systems into Modules," Comm. ACM, Vol. 15, No. 12, Dec. 1972.

3. W. A. Wulf and M. Shaw, "Global Variables Considered Harmful," ACM SIGPLAN Notices, Vol. 8, Feb. 1973.

4. O.-J. Dahl and C. A. R. Hoare, "Hierarchical Program Structures," Structured Programming, Academic Press, 1972, pp. 175-220.

5. H. Ledgard, American PASCAL Standard, Springer-Verlag, New York, 1984.

6. US DoD, Reference Manual for the Ada Programming Language, Nov. 1980.

7. A. N. Habermann and D. E. Perry, Ada for Experienced Programmers, Addison-Wesley, 1983.

8. M. Shaw, ALPHARD: Form and Content, Springer-Verlag, New York, 1981.

9. B. Liskov et al., "Abstraction Mechanisms in CLU," Comm. ACM, Vol. 20, No. 8, Aug. 1977.

10. P. Brinch Hansen, "The Programming Language Concurrent Pascal," IEEE Trans. Software Eng., Vol. SE-1, June 1975.

11. B. W. Lampson et al., "Report on the Programming Language Euclid," ACM SIGPLAN Notices, Vol. 12, No. 2, Feb. 1977.

12. A. L. Ambler et al., "Gypsy: A Language for Specification and Implementation of Verifiable Programs," ACM SIGPLAN Notices, Vol. 12, No. 3, Mar. 1977.

13. C. M. Geschke, J. H. Morris, Jr., and E. H. Satterthwaite, "Early Experience with Mesa," Comm. ACM, Vol. 20, No. 8, Aug. 1977.

14. N. Wirth, Programming in MODULA-2, Springer-Verlag, New York, 1983.

15. A. J. Demers and J. E. Donahue, "Data Types, Parameters and Type Checking," Proc. ACM Symp. Prin. Prog. Lang., Jan. 1980, pp. 12-23.

Programming with abstract data types requires support from the programming language, not simply managerial exhortations about program organization. Suitable language support requires solutions to a number of technical issues involving both design and implementation. These include:

* Naming. Scope rules are needed to ensure the appropriate visibility of names. In addition, protection mechanisms9,10 may be needed to guarantee that hidden information remains private. Further, programmers must be prevented from naming the same data in more than one way ("aliasing") if current verification technology is to be relied upon.

* Type checking. It is necessary to check actual parameters to routines, preferably during compilation, to be sure they will be acceptable to the routines. The problem is more complex than the type checking problem for conventional languages for two reasons: new types may be added during the compilation process, and the parameterization of types requires subtle decisions in the definition of a useful type checking rule.

* Specification notation. The formal specifications of an abstract data type should convey all information needed by the programmer. This is not yet possible, but current progress is described below. As for any specification formalism, it is also necessary to develop a method for verifying that a specification is consistent with its implementation.

* Distributed properties. In addition to providing operations that are called as routines or infix operators, abstract data types must often supply definitions to support type-specific interpretation of various constructs of the programming language. These constructs include storage allocation, loops that operate on the elements of a data structure without knowledge of the representation, and synchronization. Some of these have been explored, but many open questions remain.7

* Separate compilation. Abstract data types introduce two new problems to the process of separate compilation. First, type checking should be done across compilation units as well as within units. Second, generic definitions offer significant potential for optimization (or for inefficient implementation).
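The aliasing concern in the first item can be illustrated with a small Python sketch. The routine `transfer_all` and its informal specification are hypothetical. The specification is written as if its two parameters name different lists. It holds for a normal call but fails when the same list is passed for both parameters, which is exactly the case that reasoning about the parameters separately overlooks.

```python
from __future__ import annotations


def transfer_all(src: list[int], dst: list[int]) -> None:
    """Append the elements of src to dst, then empty src (hypothetical routine).

    Informal specification, written as if src and dst name different lists:
    afterwards dst == old dst + old src, and src == [].
    """
    # Iterate over a snapshot; appending to the list being iterated over
    # would otherwise never terminate when dst is src.
    for x in list(src):
        dst.append(x)
    src.clear()


a, b = [1, 2], [3]
transfer_all(a, b)
print(a, b)         # [] [3, 1, 2]: matches the specification

c = [1, 2]
transfer_all(c, c)  # aliased call: src and dst are one list
print(c)            # []: the specification predicted dst == [1, 2, 1, 2]
```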

Specification techniques for abstract data types are the topic of a number of current research projects. Proposed techniques include informal but precise and stylized English, models that relate the new type to previously defined types, algebraic axioms that specify new types independently of other types, and hybrids of these. Many problems remain. The emphasis to date has been on the specification of properties of the code; the correspondence of these specifications to informally understood requirements is also important. Further, the work to date has concentrated almost exclusively on the functional properties of the program without attending, for example, to its performance or reliability.
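As an illustration of the algebraic style, consider a stack. The text does not discuss stacks; they are used here only as a standard small example. Its axioms mention only the stack operations themselves, with no reference to any other type. The Python sketch below assumes a tuple representation, which is one possible implementation, and tests the axioms on sample values. Passing these checks is evidence of consistency, not a proof of it.

```python
from __future__ import annotations

from typing import Tuple

# Representation chosen for this sketch: a tuple with the top at the end.
Stack = Tuple[int, ...]
EMPTY: Stack = ()


def push(s: Stack, x: int) -> Stack:
    return s + (x,)


def pop(s: Stack) -> Stack:
    if not s:
        raise ValueError("pop of empty stack")
    return s[:-1]


def top(s: Stack) -> int:
    if not s:
        raise ValueError("top of empty stack")
    return s[-1]


def is_empty(s: Stack) -> bool:
    return s == EMPTY


def check_axioms(samples: list[Stack], values: list[int]) -> bool:
    """Test the stack axioms on every (stack, value) pair supplied."""
    assert is_empty(EMPTY)                      # isEmpty(empty) = true
    for s in samples:
        for x in values:
            assert pop(push(s, x)) == s         # pop(push(s, x)) = s
            assert top(push(s, x)) == x         # top(push(s, x)) = x
            assert not is_empty(push(s, x))     # isEmpty(push(s, x)) = false
    return True
```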

Not all the language developments include formal specifications as part of the code. Alphard, for example, includes language constructs that associate a specification with the implementation of a module, whereas Ada and Mesa expect interface definitions that contain at least enough information to support separate compilation. All of this language work, however, is based on the premise that the specification must include all information available to a user of the abstract data type. When it has been verified that the implementation performs in accordance with its public specification, the abstract specification may safely be used as the definitive source of information about how higher level programs may correctly use the module. In one sense we build up "bigger" definitions out of "smaller" ones; but because a specification alone suffices for understanding, the new definition is in another sense no bigger than the preexisting components. It is this regimentation of detail that gives the technique its power.
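Checking that an implementation "performs in accordance with its public specification" requires relating the representation to the abstract values it denotes. The sketch below does this in the spirit of the model-based techniques, which relate a new type to previously defined types. It uses an abstraction function and a representation invariant. Everything in it is hypothetical: the type `IntSet`, its list-without-duplicates representation, the use of Python's `frozenset` as the model, and the helper names `_rep_ok` and `_abstract`. The checker compares the implementation with the model along one sequence of operations; it does not prove correspondence for all sequences.

```python
from __future__ import annotations


class IntSet:
    """Unordered set of integers; representation: a list with no duplicates."""

    def __init__(self) -> None:
        self._items: list[int] = []

    def insert(self, x: int) -> None:
        if x not in self._items:
            self._items.append(x)

    def remove(self, x: int) -> None:
        if x in self._items:
            self._items.remove(x)

    def has(self, x: int) -> bool:
        return x in self._items

    def _rep_ok(self) -> bool:
        # Representation invariant: no element appears twice.
        return len(self._items) == len(set(self._items))

    def _abstract(self) -> frozenset[int]:
        # Abstraction function: the mathematical set this representation denotes.
        return frozenset(self._items)


def check_against_model(ops: list[tuple[str, int]]) -> bool:
    """Run ops on IntSet and on the frozenset model; compare after each step."""
    s, model = IntSet(), frozenset()
    for op, x in ops:
        if op == "insert":
            s.insert(x)
            model = model | {x}
        elif op == "remove":
            s.remove(x)
            model = model - {x}
        assert s._rep_ok()
        assert s._abstract() == model
        assert s.has(x) == (x in model)
    return True
```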

Generic definitions. A particularly rich kind of abstract data type definition allows one abstraction to take another abstraction, for example, a data type, as a parameter. These generic definitions provide a dimension of modeling flexibility that conventionally parameterized definitions lack.

For example, consider the problem of defining data types for an application that uses three kinds of unordered sets: sets of integers, sets of reals, and sets of a user-defined type for points in three-dimensional space. One alternative would be to write a separate definition for each of these three types. However, that would involve a great deal of duplicated text, since both the specifications and the code will be very similar for all the definitions. In fact, the programs would probably differ only where specific references to the types of set elements are made. Likewise, the machine code would probably differ only where operations on set elements are performed, such as the assignment used to store a new value into the data structure. The obvious drawbacks of this situation include duplicated code, redundant programming effort, and complicated maintenance (since bugs must be fixed and improvements must be made in all versions).

Another alternative would be to separate the properties of unordered sets from the properties of their elements. This is possible because the definition of the set types relies on very few specific properties of the elements; it probably assumes only that ordinary assignment and equality operations for the element type are defined. Under that assumption, it is possible to write a single definition, say

type UnOrderedSet(T: type) is ...

that can be used to declare sets with several different types of elements, as in

var Counters: UnOrderedSet(integer);
    Timers: UnOrderedSet(integer);
    Sizes: UnOrderedSet(real);
    Places: UnOrderedSet(PtIn3Space);

using a syntax appropriate to the language that supports the generic definition facility. The definition of UnOrderedSet would provide operations such as Insert, TestMembership, and so on. The declarations of the variables would instantiate versions of these operations for all relevant element types, and the compiler would determine which of the operations to use at any particular time by inspecting the parameters to the routines.
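A rough Python counterpart of the UnOrderedSet definition can be written with `typing.Generic`. The operation names mirror Insert and TestMembership from the text, but everything else here is an assumption of this sketch. Like the definition described above, it relies only on assignment and equality of elements. There is one important difference from the compile-time instantiation the text describes: Python keeps a single class at run time for all element types. Only a separate static checker uses `T`, for example to reject inserting a string into `counters`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar("T")


class UnOrderedSet(Generic[T]):
    """A single definition usable with many element types.

    It relies only on assignment and equality (==) of elements: a list is
    used instead of a hash table, so elements need be neither hashable nor
    ordered.
    """

    def __init__(self) -> None:
        self._elems: List[T] = []

    def insert(self, x: T) -> None:
        if not self.test_membership(x):
            self._elems.append(x)

    def test_membership(self, x: T) -> bool:
        return any(e == x for e in self._elems)

    def size(self) -> int:
        return len(self._elems)


@dataclass
class PtIn3Space:
    # A dataclass with eq=True (the default) and frozen=False is unhashable,
    # so it could not go into a built-in set, but it works here.
    x: float
    y: float
    z: float


counters: UnOrderedSet[int] = UnOrderedSet()
sizes: UnOrderedSet[float] = UnOrderedSet()
places: UnOrderedSet[PtIn3Space] = UnOrderedSet()
```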

The flexibility provided by generic definitions is great enough to support a definition5 that automatically converts any solution of one class of problems to a solution of the corresponding problem in a somewhat larger class. This generic definition is notable for the detail and precision with which the assumptions about the generic parameter can be specified.

Practical realizations

A number of programming languages provide some or all of the facilities required to support abstract data types. In addition to implementations of research projects, several language efforts have been directed primarily at providing practical implementations. These include Ada, Mesa, Pascal, and Simula. Of these, Pascal currently has the largest user community, and the objective of the Ada development has been to make available a language to support most of the modern ideas about programming. Because of the major roles they play in the programming language community, Pascal and Ada will be discussed in some detail.

The evolution of programming languages through the introduction of abstraction techniques will be illustrated with a small program. The program is presented in Fortran IV to illustrate the state of our understanding in the late 1960's. Revised versions of the program in Pascal and Ada show how abstraction techniques have evolved.

Small example program. Our example program produces the data needed to print an internal telephone list for a division of a small company. A database containing information about all employees, including their names, divisions, telephone numbers, and salaries, is assumed to be available. The program must produce a data structure containing a sorted list of the employees in a selected division and their telephone extensions. Suitable declarations of the employee database and the divisional telephone list for the Fortran implementation are given in Figure 1a. Figure 1b shows how the large data structures are thought of as parallel vectors. A program fragment for constructing the telephone list is given in Figure 2.

(a)
c     Vectors that contain Employee information
c     Name is in EmpNam (24 chars), Phone is in EmpFon (integer)
c     Salary is in EmpSal (real), Division is in EmpDiv (4 chars)
      integer EmpFon(1000), EmpDiv(1000)
      real EmpSal(1000)
c     EmpNam ARRAY: 8 CHARS/WORD
      double precision EmpNam(3, 1000)
c
c     Vectors that contain Phone list information
c     Name is in DivNam (24 chars), Phone is in DivFon (integer)
      integer DivFon(1000)
      double precision DivNam(3, 1000)
c
c     declarations of scalars used in program
      integer StafSz, DivSz, i, j
      integer WhichD
      double precision q

(b) [Diagram: parallel vectors EmpNam, EmpFon, EmpSal, and EmpDiv, where one row across them is the information about one employee; and DivNam and DivFon, where one row is the information about one telephone listing.]

Figure 1 (above). Declarations for Fortran version of telephone list program.

c     Get data for division WhichD only
      DivSz = 0
      do 200 i = 1, StafSz
         if (EmpDiv(i) .ne. WhichD) go to 200
         DivSz = DivSz + 1
         DivNam(1, DivSz) = EmpNam(1, i)
         DivNam(2, DivSz) = EmpNam(2, i)
         DivNam(3, DivSz) = EmpNam(3, i)
         DivFon(DivSz) = EmpFon(i)
  200 continue
c
c     Sort telephone list
      if (DivSz .eq. 0) go to 210
      do 220 i = 1, DivSz
         do 230 j = i + 1, DivSz
            if (DivNam(1,i) .gt. DivNam(1,j)) go to 240
            if (DivNam(1,i) .lt. DivNam(1,j)) go to 230
            if (DivNam(2,i) .gt. DivNam(2,j)) go to 240
            if (DivNam(2,i) .lt. DivNam(2,j)) go to 230
            if (DivNam(3,i) .gt. DivNam(3,j)) go to 240
            go to 230
  240       do 250 k = 1, 3
               q = DivNam(k,i)
               DivNam(k,i) = DivNam(k,j)
  250       DivNam(k,j) = q
            k = DivFon(i)
            DivFon(i) = DivFon(j)
            DivFon(j) = k
  230    continue
  220 continue
  210 continue

Figure 2 (right). Code for Fortran version of telephone list program.

The employee database is represented as a set of vectors, one for each unit of information about the employee. The vectors are used "in parallel" as a single data structure; that is, part of the information about the ith employee is stored in the ith element of each vector. Similarly, the telephone list is constructed in two arrays, DivNam for names and DivFon for telephone numbers.

The telephone list is constructed in two stages. First, the database is scanned for employees whose division (EmpDiv(i)) matches the division desired (WhichD). When a match is found, the name and phone number of the employee are added to the telephone list. Second, the telephone list is sorted using an insertion sort. (This selection is not an endorsement of insertion sorting in general. However, most readers will recognize the algorithm, and the topic of this article is the evolution of programming languages, not sorting techniques.)

There are several important things to notice about this program. First, the data about employees is stored in four arrays, and the relation among these arrays is shown only by the similar naming and the comment with their declarations. Second, the character string for each employee's name must be handled in eight-character segments, and there is no clear indication in either the declarations or the code that character strings are involved. (Indeed, the implementations of floating point in some versions of Fortran interfere with this type violation. Character strings are dealt with more appropriately in the Fortran 77 standard.) The six-line test that checks for DivNam(*, i) < DivNam(*, j) could be reduced to three tests if it were changed to a test for less-than-or-equal, but this would make the sort unstable. Third, all the data about employees, including salaries, is easily accessible and modifiable; this is undesirable from an administrative standpoint.
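The parallel-vector organization can be reproduced in Python to show what it costs the programmer. The function `division_phone_list` and its argument names are hypothetical, loosely modeled on Figure 2. Python strings compare as whole values, so the segmented six-line name test becomes a single comparison. What the sketch keeps is the parallel-vector organization itself: only a shared index ties a name to its phone number, and every exchange in the sort must update both lists in step.

```python
from __future__ import annotations


def division_phone_list(
    emp_nam: list[str], emp_fon: list[int], emp_div: list[str], which_d: str
) -> tuple[list[str], list[int]]:
    """Parallel-vector sketch of Figure 2 (not the original code).

    Employee i is described by emp_nam[i], emp_fon[i], emp_div[i].
    """
    div_nam: list[str] = []
    div_fon: list[int] = []
    # Stage 1: select employees whose division matches which_d.
    for i in range(len(emp_div)):
        if emp_div[i] == which_d:
            div_nam.append(emp_nam[i])
            div_fon.append(emp_fon[i])
    # Stage 2: the compare-and-exchange loop of Figure 2; each swap must
    # touch both lists, or names and phone numbers fall out of step.
    n = len(div_nam)
    for i in range(n - 1):
        for j in range(i + 1, n):
            if div_nam[i] > div_nam[j]:
                div_nam[i], div_nam[j] = div_nam[j], div_nam[i]
                div_fon[i], div_fon[j] = div_fon[j], div_fon[i]
    return div_nam, div_fon
```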

Pascal. Pascal is a simple algebraic language that was designed with three primary objectives: support modern programming development methodology; be simple enough to teach to students; and be easy to implement reliably, even on small computers. It has, in general, succeeded in all three respects.

Pascal provides a number of facilities for supporting structured programming. It provides the standard control constructs of structured programming, and a formal definition facilitates verification of Pascal programs. It also supports a set of data organization constructs that are suitable for defining abstractions. These include the ability to define a list of arbitrary constants as an enumerated type, the ability to define heterogeneous records with individually named fields, data types that can be dynamically allocated and referred to by pointers, and the ability to name a data structure as a type (though not to bundle up the data structure with a set of operations).

(a)
type
  String = packed array [1..24] of char;
  ShortString = packed array [1..8] of char;
  EmpRec = record
      Name: String;
      Phone: integer;
      Salary: real;
      Division: ShortString;
    end;
  PhoneRec = record
      Name: String;
      Phone: integer;
    end;
var
  Staff: array [1..1000] of EmpRec;
  Phones: array [1..1000] of PhoneRec;
  StaffSize, DivSize, i, j: integer;
  WhichDiv: ShortString;
  q: PhoneRec;

Figure 3 (left). Declarations for Pascal version of telephone list program.

{Get data for division WhichDiv only}
DivSize := 0;
for i := 1 to StaffSize do
  if Staff[i].Division = WhichDiv then
    begin
      DivSize := DivSize + 1;
      Phones[DivSize].Name := Staff[i].Name;
      Phones[DivSize].Phone := Staff[i].Phone;
    end;

{Sort telephone list}
for i := 1 to DivSize - 1 do
  for j := i + 1 to DivSize do
    if Phones[i].Name > Phones[j].Name then
      begin
        q := Phones[i];
        Phones[i] := Phones[j];
        Phones[j] := q;
      end;

Figure 4 (below). Code for Pascal version of telephone list program.

The language has become quite widely used. In addition to serving as a teaching language for undergraduates, it is used as an implementation language for microcomputers, and it has been extended to deal with parallel programming. An international standard has been established.

Pascal is not without its disadvantages. It provides limited support for large programs, for it lacks separate compilation facilities and block structure other than nested procedures. Type checking does not provide quite as much control over parameter passing as we might wish, and there is no support for the encapsulation of related definitions in such a way that they can be isolated from the remainder of the program. Many of the disadvantages are addressed in extensions, derivative languages, and the standardization effort.

We can illustrate some of Pascal's characteristics by returning to the program for creating telephone lists. Suitable data structures, including both type definitions and data declarations, are shown in Figure 3a. Figure 3b shows the view of the data structures suggested by these declarations, that is, vectors of records rather than independent vectors. A program fragment for constructing the telephone list is given in Figure 4.

The declarations open with definitions of four types that are not predefined in Pascal. Two, String and ShortString, are generally useful, and the other two, EmpRec and PhoneRec, were designed for this particular problem.

The definition of String and ShortString as types permits named variables to be treated as single units; operations are performed on an entire string variable, not on individual groups of characters. This abstraction simplifies the program, but more importantly, it allows the programmer to concentrate on the algorithm that uses the strings as names, rather than on keeping track of the individual name fragments. The difference between the complexity of the code in Figures 2 and 4 may not seem large, but when it is compounded over many individual composite structures with different representations, the difference can be large indeed. If Pascal allowed programmer-defined types to accept parameters, a single definition of strings that took the string length as a parameter could replace String and ShortString.

The type definitions for EmpRec and PhoneRec abstract from specific data items to the notions "record of information about an employee" and "record of information for a telephone list." Both the employee database and the telephone list can thus be represented as vectors whose elements are records of the appropriate types.

The declarations of Staff and Phones have the effect of indicating that all the components are related to the same information structure. In addition, the definition is organized as a collection of records, one for each employee, so the primary organization of the data structure is by employee. On the other hand, the data organization of the Fortran program was dominated by the arrays that corresponded to the fields, and the employees were secondary.

Just as in the Fortran program, the telephone list is constructed in two stages (Figure 4). Note that Pascal's ability to operate on strings and records as single units has substantially simplified the manipulation of names and the interchange step of the sort. Another notable difference between the two programs is in the use of conditional statements. In the Pascal program, the use of if...then statements emphasizes the conditions that will cause the bodies of the if statements to be executed. The Fortran if statements with go to's, however, describe conditions in which code is not to be executed, leaving the reader of the program to compute the conditions that actually correspond to the actions.
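The record organization of Figures 3 and 4 can be sketched the same way, with Python dataclasses standing in for Pascal records. The field and function names are hypothetical adaptations of EmpRec, PhoneRec, and the Figure 4 loop. Because each exchange moves a whole PhoneRec, a name and its extension cannot drift apart, which is the simplification the text attributes to operating on records as single units.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EmpRec:
    name: str
    phone: int
    salary: float
    division: str


@dataclass
class PhoneRec:
    name: str
    phone: int


def phone_list_for(staff: list[EmpRec], which_div: str) -> list[PhoneRec]:
    # Stage 1: copy name and phone of each matching employee into a PhoneRec.
    phones = [PhoneRec(e.name, e.phone) for e in staff if e.division == which_div]
    # Stage 2: the compare-and-exchange loop of Figure 4; each swap moves a
    # whole record, so name and phone stay together.
    for i in range(len(phones) - 1):
        for j in range(i + 1, len(phones)):
            if phones[i].name > phones[j].name:
                phones[i], phones[j] = phones[j], phones[i]
    return phones
```

Compared with the parallel-list version, the selection and the exchange each touch one structure instead of two.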

It is also worth mentioning that the Pascal program will not execute the body of the sort loop at all if no employees work in division WhichDiv (that is, if DivSize is 0). The body of the corresponding Fortran loop would be executed once in that situation, unless the loop had been protected by an explicit test for an empty list. It would do no harm to execute this particular loop once on an empty list. In general, however, Fortran loops must be guarded against the possibility that the upper bound is less than the lower bound.
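Ada for loops behave like Pascal's in this respect. A range whose upper bound is below its lower bound is a null range, and the loop body is skipped. The following minimal sketch assumes a current Ada compiler. The function Outer_Passes is hypothetical: it counts how many times the outer loop of the exchange sort would run for a list of a given size. That loop is written in Ada in Figure 7 as `for i in 1..DivSize-1`.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Empty_Range is

   --  Counts executions of the outer sort loop "for i in 1..DivSize-1"
   --  for a list of Div_Size entries.
   function Outer_Passes (Div_Size : Natural) return Natural is
      Count : Natural := 0;
   begin
      --  Div_Size - 1 is an Integer expression, so Div_Size = 0 yields
      --  the null range 1 .. -1 rather than a Constraint_Error.
      for I in 1 .. Div_Size - 1 loop
         Count := Count + 1;
      end loop;
      return Count;
   end Outer_Passes;

begin
   for N in 0 .. 3 loop
      Put_Line ("DivSize =" & Integer'Image (N) &
                ", outer passes =" & Integer'Image (Outer_Passes (N)));
   end loop;
end Empty_Range;
```

No explicit guard for the empty list is needed. The zero-trip behavior is part of the loop's definition.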

Ada. The Ada language has been developed under the auspices of the Department of Defense in an attempt to reduce the software costs of embedded computer systems. The project includes components for both a language and a programming support environment. The specific objectives of the Ada development include significantly reducing the number of programming languages that must be learned, supported, and maintained within the Department of Defense. The language design emphasizes the goals of high program reliability, low maintenance costs, support for modern programming methodology, and efficiency of compilers and object programs.

The Ada language was developed through competitive designs constrained by a set of requirements. Revisions to the language were completed in the summer of 1980, and the initial language reference manual was published in November 1980. Revisions to the manual were made in 1981 and 1982, and standardization is well under way: Ada has achieved MIL-STD and ANSI standard status (ANSI/MIL-STD-1815A-1983). Development of the programming environment will continue over the next several years [12]. Since only a few validated compilers for the language are now available, it is too soon to evaluate how well the language meets its goals. However, it is possible to describe how various features of the language respond to the abstraction issues raised here.


Although Ada grew out of the Pascal language philosophy, extensive syntactic changes and semantic extensions make it a very different language from Pascal. The major additions include:

- module structures and interface specifications for large-program organization and separate compilation;
- encapsulation facilities and generic definitions to support abstract data types;
- support for parallel processing;
- control over low-level implementation issues related to the architecture of object machines.

There are three major abstraction tools in Ada. The package is used for encapsulating a set of related definitions and isolating them from the rest of the program. The type determines the values a variable (or data structure) may take on and how it can be manipulated. The generic definition allows many similar abstractions to be generated from a single template, as we described earlier.

The incorporation of many of these ideas into Ada can be illustrated through the example we used in discussing Pascal. The data organization of the Pascal program (Figures 3 and 4) could be carried over directly to the Ada program, and the result would use Ada reasonably well. However, Ada provides additional facilities that can be applied to this problem. Recall that neither the Fortran program nor the Pascal program can give a programmer access to names, telephone numbers, and divisions without also giving access to private information, here illustrated by salaries. Ada programs can provide such selected access, and we will now extend the previous example to do so.

We organize the program in three components: a definition of the record for each employee (Figure 5), declarations of the data needed by the program (Figure 6), and code for construction of the phone list (Figure 7).

(a)

    package Employee is
       type PrivStuff is limited private;
       subtype ShortString is String(1..8);
       type EmpRec is
          record
             Name: String(1..24);
             Phone: integer;
             PrivPart: PrivStuff;
          end record;
       procedure SetSalary(Who: in out EmpRec; Sal: float);
       function GetSalary(Who: EmpRec) return float;
       procedure SetDiv(Who: in out EmpRec; Div: ShortString);
       function GetDiv(Who: EmpRec) return ShortString;
    private
       type PrivStuff is
          record
             Salary: float;
             Division: ShortString;
          end record;
    end Employee;

(b) [Diagram: "Information about one employee," shown with the operations Get Salary, Set Salary, Get Division, and Set Division.]

Figure 5. Ada package definition for employee records.

(a)

    declare
       use Employee;
       type PhoneRec is
          record
             Name: String(1..24);
             Phone: integer;
          end record;
       Staff: array (1..1000) of EmpRec;
       Phones: array (1..1000) of PhoneRec;
       StaffSize, DivSize: integer range 0..1000;
       WhichDiv: ShortString;
       q: PhoneRec;

Figure 6. Declarations for Ada version of telephone list program.

    -- Get data for division WhichDiv only
    DivSize := 0;
    for i in 1..StaffSize loop
       if GetDiv(Staff(i)) = WhichDiv then
          DivSize := DivSize + 1;
          Phones(DivSize) := (Staff(i).Name, Staff(i).Phone);
       end if;
    end loop;

    -- Sort telephone list
    for i in 1..DivSize-1 loop
       for j in i+1..DivSize loop
          if Phones(i).Name > Phones(j).Name then
             q := Phones(i);
             Phones(i) := Phones(j);
             Phones(j) := q;
          end if;
       end loop;
    end loop;

Figure 7. Code for Ada version of telephone list program.

The Employee package, whose specification is shown in Figure 5a, illustrates one of Ada's major additions to our tool kit of abstraction facilities. This definition establishes EmpRec as a data type with a small set of privileged operations. Only the specification of the package is presented here. Figure 5b suggests the view the user of this package should have, with some information hidden and only pertinent data and operations visible. Ada does not require the package body to accompany the specification, though the body must be defined before the program can be executed. Moreover, programmers are permitted to rely only on the specification, not on the body of a package. The specification itself is divided into a visible part (everything from package to private) and a private part (from private to end). The private part is intended only to provide information for separate compilation.

Assume the following policy for using EmpRecs:

- the Name and Phone fields are accessible to anyone;
- anyone may read the Division field, but not write it;
- access to the Salary field and modification of the Division field are to be done only by authorized programs.

Two characteristics of Ada make it possible to establish this policy. First, the scope rules prevent any portion of the program outside a package from accessing any names except the ones listed in the visible part of the specification. In the particular case of the Employee package, this means that the Salary and Division fields of an EmpRec cannot be directly read or written outside the package. Therefore the integrity of the data can be controlled by verifying that the routines exported from the package are correct. Presumably the routines SetSalary, GetSalary, SetDiv, and GetDiv perform reads and writes as their names suggest; they might also keep records showing who made changes and when. Second, Ada provides ways to control the visibility of each routine and variable name.
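The article presents only the package specification. A complete, runnable sketch of the telephone-list program follows. Several parts of it are hypothetical:

- the package body;
- the Hire helper;
- the three staff records;
- the array sizes;
- the default values in the private part.

The private part's Salary and Division fields are an assumption consistent with the text. Two Ada 2012 expression functions are used for brevity. The commented-out lines show statements the compiler would reject.

```ada
with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Strings.Fixed;

procedure Phone_List is

   package Employee is
      type PrivStuff is limited private;
      subtype ShortString is String (1 .. 8);
      type EmpRec is record
         Name     : String (1 .. 24);
         Phone    : Integer;
         PrivPart : PrivStuff;
      end record;
      procedure SetSalary (Who : in out EmpRec; Sal : Float);
      function GetSalary (Who : EmpRec) return Float;
      procedure SetDiv (Who : in out EmpRec; Div : ShortString);
      function GetDiv (Who : EmpRec) return ShortString;
   private
      type PrivStuff is record          --  assumed representation
         Salary   : Float       := 0.0;
         Division : ShortString := (others => ' ');
      end record;
   end Employee;

   --  Hypothetical body; a real one might also log who changed what, and when.
   package body Employee is
      procedure SetSalary (Who : in out EmpRec; Sal : Float) is
      begin
         Who.PrivPart.Salary := Sal;
      end SetSalary;
      function GetSalary (Who : EmpRec) return Float is
        (Who.PrivPart.Salary);
      procedure SetDiv (Who : in out EmpRec; Div : ShortString) is
      begin
         Who.PrivPart.Division := Div;
      end SetDiv;
      function GetDiv (Who : EmpRec) return ShortString is
        (Who.PrivPart.Division);
   end Employee;

   use Employee;

   type PhoneRec is record
      Name  : String (1 .. 24);
      Phone : Integer;
   end record;

   Staff    : array (1 .. 3) of EmpRec;
   Phones   : array (1 .. 3) of PhoneRec;
   DivSize  : Integer range 0 .. 3 := 0;
   WhichDiv : constant ShortString := "CSD     ";
   Q        : PhoneRec;

   --  Sets the visible fields directly and the hidden ones via the package.
   procedure Hire (E : in out EmpRec; Name : String; Phone : Integer;
                   Div : ShortString; Sal : Float) is
   begin
      E.Name  := Ada.Strings.Fixed.Head (Name, 24);
      E.Phone := Phone;
      SetDiv (E, Div);
      SetSalary (E, Sal);
   end Hire;

begin
   Hire (Staff (1), "Young, C.", 4403, "CSD     ", 900.0);
   Hire (Staff (2), "Baker, A.", 4417, "EE      ", 950.0);
   Hire (Staff (3), "Adams, B.", 4425, "CSD     ", 990.0);
   --  Illegal outside the package:    Staff (1).PrivPart.Salary := 0.0;
   --  Illegal because EmpRec is limited:  Staff (1) := Staff (2);

   for I in Staff'Range loop              --  select one division
      if GetDiv (Staff (I)) = WhichDiv then
         DivSize := DivSize + 1;
         Phones (DivSize) := (Staff (I).Name, Staff (I).Phone);
      end if;
   end loop;

   for I in 1 .. DivSize - 1 loop         --  exchange sort by name
      for J in I + 1 .. DivSize loop
         if Phones (I).Name > Phones (J).Name then
            Q := Phones (I);
            Phones (I) := Phones (J);
            Phones (J) := Q;
         end if;
      end loop;
   end loop;

   for I in 1 .. DivSize loop
      Put_Line (Phones (I).Name & Integer'Image (Phones (I).Phone));
   end loop;
end Phone_List;
```

Only the package body touches PrivPart's components. The main program reads divisions solely through GetDiv.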

Although the field name PrivPart is exported from the Employee package along with Name and Phone, there is no danger in doing so. An auxiliary type was defined to protect the salary and division information. The declaration

    type PrivStuff is limited private;

indicates two things. First, the content and organization of the data structure are hidden from the user (private). Second, all operations on data of type PrivStuff are forbidden except for calls on the routines exported from the package. For limited private types, even assignment and comparison for equality are forbidden. Naturally, the code inside the body of the Employee package may manipulate these hidden fields; the purpose of the packaging is to guarantee that only the code inside the package body can do so.

The ability to force a data structure to be manipulated only through a known set of routines is central to the support of abstract data types. It is useful not only in examples such as the one given here. It also helps when the representation may change radically from time to time, or when some kind of internal consistency among fields, such as checksums, must be maintained. Support for secure computation is not among Ada's goals. It can be achieved in this case, but only through a combination of an extra level of packaging and some management control performed in the subprograms. Even without guarantees about security, however, the packaging of information about handling employee data provides a useful structure for program development and maintenance.

The declarations of Figure 6a are much like the declarations of the Pascal program. The Employee package is used instead of a simple record, and there are minor syntactic differences between the languages. The clause

    use Employee;

says that all the visible names of the Employee package are available in the current block. Figure 6b shows that the view of the data suggested by Ada is similar to that of Figure 3b; the chief difference here is the lack of internal structure in Employee values.

In the code of the Ada program itself (Figure 7), visibility rules allow the nonprivate field names of EmpRecs and the GetDiv function to be used. Ada provides a way to create a complete record value and assign it with a single statement; thus the assignment

    Phones(DivSize) := (Staff(i).Name, Staff(i).Phone);

sets both fields of the PhoneRec at once. Aside from this and minor syntactic distinctions, this program fragment is very much like the Pascal fragment of Figure 4.

Status and potential

It is clear that methodologies and analytic techniques based on the principle of abstraction have played a major role in the development of software engineering, and that they will continue to do so.

Programming languages and methodologies often develop in response to new ideas about how to cope with complexity in programs and systems of programs. As languages evolve to meet these ideas, we reshape our perceptions of the problems and solutions in response to the new experiences. Our sharpened perceptions in turn generate new ideas that feed the evolutionary cycle. This article explores the routes by which these cyclic advances in methodology and specification have led to current concepts and principles of programming languages.

We can now describe the ways our current programming habits are changing to take advantage of these principles of abstraction. We also note some of the limitations of current techniques and how future work may deal with them. We conclude with some suggestions for further reading on abstraction techniques.

Effect on programming. As techniques such as abstract data types have emerged, they have affected both the overall organization of programs and the style of writing small segments of code.

The new languages have the most sweeping effects on the techniques we use for the high-level organization of program systems, and hence on the management of design and implementation projects. Modularization features that impose controls on the distribution of variable, routine, and type names can profoundly shape the strategies for decomposing a program into modules. Project organization will also be influenced by the growing availability of support tools for managing multiple modules in multiple versions.

The availability of precise (and enforceable) specifications for module interfaces will influence the management of software projects [6]. For example, the requirements document for a large avionics system has already been converted to a precise, if informal, specification. A formal specification is most useful when it can be processed or checked automatically. Automatic verification of logical assertions attached to programs is not imminent. Systems that do partial runtime checking of assertions, however, are already feasible; such assertions might be given, for example, as specially formatted comments.

The organization and style of the code within modules will also be affected. Earlier, in our discussion of practical realizations, we showed how the treatment of both control and data within a module changes. The same problem was solved in languages with increasingly powerful abstraction techniques.

The ideas behind the abstract data type methodology are still not entirely validated. Projects using various portions of the methodology have been successful. Examples include design based on data types but without formal specification, or conversely, specification and verification without modularity. However, no complete demonstration on a large project has yet been carried out. Although complete validation experiments have not been done, some of the initial trials are encouraging:

- a large, interesting program using abstract data types has been written but not verified;
- programs using this organization in a language without encapsulation facilities have been written and largely verified;
- abstract data types specified via algebraic axioms have proved useful as a design tool.
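Two ideas above can be combined: algebraic axioms used as a design aid, and partial runtime checking of assertions. The following is a minimal sketch, not from the article, of a hypothetical integer stack package. Its axioms, such as Top(Push(S, X)) = X and Pop(Push(S, X)) = S, are checked at run time with pragma Assert (standard since Ada 2005). The package, its capacity of 100, and its user-defined "=" are all assumptions. The "=" compares only the live elements, so that the second axiom is about abstract values rather than representations.

```ada
pragma Assertion_Policy (Check);  --  ensure pragma Assert is evaluated
with Ada.Text_IO; use Ada.Text_IO;

procedure Stack_Axioms is

   --  Hypothetical abstract data type: an integer stack with value semantics.
   package Int_Stacks is
      type Stack is private;
      function Push (S : Stack; X : Integer) return Stack;
      function Pop (S : Stack) return Stack;      --  S must not be empty
      function Top (S : Stack) return Integer;    --  S must not be empty
      function Is_Empty (S : Stack) return Boolean;
      function "=" (L, R : Stack) return Boolean; --  equality of abstract values
      Empty : constant Stack;
   private
      Capacity : constant := 100;
      type Item_Array is array (1 .. Capacity) of Integer;
      type Stack is record
         Count : Natural range 0 .. Capacity := 0;
         Items : Item_Array := (others => 0);
      end record;
      Empty : constant Stack := (Count => 0, Items => (others => 0));
   end Int_Stacks;

   package body Int_Stacks is
      function Push (S : Stack; X : Integer) return Stack is
         R : Stack := S;
      begin
         R.Count := R.Count + 1;            --  Constraint_Error when full
         R.Items (R.Count) := X;
         return R;
      end Push;

      function Pop (S : Stack) return Stack is
         R : Stack := S;
      begin
         R.Count := R.Count - 1;            --  Constraint_Error when empty
         return R;
      end Pop;

      function Top (S : Stack) return Integer is
      begin
         return S.Items (S.Count);          --  Constraint_Error when empty
      end Top;

      function Is_Empty (S : Stack) return Boolean is
      begin
         return S.Count = 0;
      end Is_Empty;

      --  Two stacks are equal when they hold the same live elements;
      --  slots above Count are ignored.
      function "=" (L, R : Stack) return Boolean is
      begin
         return L.Count = R.Count
           and then L.Items (1 .. L.Count) = R.Items (1 .. R.Count);
      end "=";
   end Int_Stacks;

   use Int_Stacks;

   --  Runtime checks of the stack axioms for one stack S and item X.
   procedure Check_Axioms (S : Stack; X : Integer) is
   begin
      pragma Assert (Top (Push (S, X)) = X);
      pragma Assert (Pop (Push (S, X)) = S);
      pragma Assert (not Is_Empty (Push (S, X)));
   end Check_Axioms;

   S : Stack := Empty;
begin
   pragma Assert (Is_Empty (Empty));
   for X in 1 .. 10 loop
      Check_Axioms (S, X);
      S := Push (S, X * X);
   end loop;
   Put_Line ("All axiom checks passed; Top (S) =" & Integer'Image (Top (S)));
end Stack_Axioms;
```

A failed check raises Ada.Assertions.Assertion_Error at the point of violation. This shows an implementation failing one of its axioms during testing, but it proves nothing about inputs that are never exercised.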

Limitations of current techniques. Efforts to use abstract data types have also revealed some limitations. In some cases problems are not comfortably cast as data types, or the necessary functionality is not readily expressed using the specification techniques now available. In other cases, the problem requires a set of definitions that are clearly very similar. Yet these definitions cannot be expressed by systematic instantiation or invocation of a data type definition, even using generic definitions.

A number of familiar, well-structured program organizations do not fit precisely into the abstract data type paradigm. These include, for example:

- filters and shells in the Unix spirit [13];
- object-oriented systems [14];
- production systems;
- table-driven interpreters;
- state-transition systems;
- interactive programs in which the command syntax dominates the specification.

These organizations are unquestionably useful and potentially as well understood as abstract data types. There is every reason to believe that similarly precise formal models can be developed for them.

Facilities have been developed over the past several years for defining routines and modules whose parameters may be generic (that is, of types that cannot be manipulated in the language). Even so, there has been little exploration of the generality of generic definitions. Part of the problem has been the lack of facilities for specifying the precise dependence of the definition on its generic parameters. One specific example of a complex generic definition has been written and verified [5]: an algorithmic transformation that can be applied to a wide variety of problems.
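In Ada, the generic formal part is where a definition's dependence on its generic parameters is stated. It states that dependence only as signatures, not as behavioral specifications. The following minimal sketch assumes a current Ada compiler. It shows a hypothetical Exchange_Sort generic, the same algorithm as the Figure 7 sort, which requires three parameters:

- an assignable element type;
- an array type over it;
- an ordering function "<".

The generic is instantiated three times. All names are illustrative.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Generic_Sort_Demo is

   --  The formal part lists everything the definition depends on:
   --  an assignable element type, an array type over it, and an ordering.
   generic
      type Element is private;
      type Vector is array (Positive range <>) of Element;
      with function "<" (Left, Right : Element) return Boolean is <>;
   procedure Exchange_Sort (V : in out Vector);

   procedure Exchange_Sort (V : in out Vector) is
      Q : Element;
   begin
      for I in V'Range loop
         for J in I + 1 .. V'Last loop
            if V (J) < V (I) then
               Q := V (I);
               V (I) := V (J);
               V (J) := Q;
            end if;
         end loop;
      end loop;
   end Exchange_Sort;

   type Int_Vector is array (Positive range <>) of Integer;
   subtype Name is String (1 .. 8);
   type Name_Vector is array (Positive range <>) of Name;

   --  "<" defaults (via "is <>") to the visible operator for the element type.
   procedure Sort_Up    is new Exchange_Sort (Integer, Int_Vector);
   procedure Sort_Down  is new Exchange_Sort (Integer, Int_Vector, ">");
   procedure Sort_Names is new Exchange_Sort (Name, Name_Vector);

   A : Int_Vector  := (5, 3, 9, 1);
   B : Int_Vector  := A;
   N : Name_Vector := ("Jones   ", "Adams   ", "Baker   ");
begin
   Sort_Up (A);
   Sort_Down (B);
   Sort_Names (N);
   for I in A'Range loop
      Put (Integer'Image (A (I)));
   end loop;
   New_Line;
   for I in B'Range loop
      Put (Integer'Image (B (I)));
   end loop;
   New_Line;
   for I in N'Range loop
      Put_Line (N (I));
   end loop;
end Generic_Sort_Demo;
```

Passing ">" for "<" reverses the order without touching the generic body. However, nothing in the formal part states that "<" must be a strict ordering, which is the kind of dependence the paragraph above says current facilities cannot express.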

The language investigations described above, together with other research projects, have addressed questions of functional specification in considerable detail. That is, they provide formal notations for making assertions about the effects that operators have on program values. These notations include input-output predicates, abstract models, and algebraic axioms. In many cases, the specifications of a system cannot be reduced to formal assertions; in these cases we resort to testing to increase our confidence in the program [15]. In other situations, moreover, a programmer is concerned with properties other than pure functional correctness. Such properties include time and space requirements, memory access patterns, reliability, synchronization, and process independence. These have not been addressed by the data type research.

A specification methodology that addresses these properties must have two important characteristics. First, it must be possible for the programmer to make and verify assertions about the properties, rather than simply analyzing the program text to derive exact values or complete specifications. This is analogous to our approach to functional specifications. We don't attempt to formally derive the mathematical function defined by a program. Rather, we specify certain properties of the computation that are important and must be preserved. Second, the specification methodology should avoid adding a new conceptual framework for each new class of properties. This implies that mechanisms for dealing with new properties should be compatible with the mechanisms already used for functional correctness.

Acknowledgments

This research was sponsored by the National Science Foundation under Grant MCS77-03883 and by the Defense Advanced Research Projects Agency, ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under contract F33615-78-C-1551. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the National Science Foundation, DARPA, or the US government.

References

1. D. E. Knuth, "Fundamental Algorithms," The Art of Computer Programming, 2nd ed., Vol. 1, Addison-Wesley, 1973.
2. P. Naur and B. Randell (eds.), "Software Engineering," NATO, 1969, report on conference sponsored by the NATO Science Committee, Garmisch, Germany.
3. J. N. Buxton and B. Randell (eds.), "Software Engineering Techniques," NATO, 1970, report on conference sponsored by the NATO Science Committee, Rome, Italy.
4. S. A. Schuman (ed.), "Proceedings of the International Symposium on Extensible Languages," ACM SIGPLAN Notices, Vol. 6, Dec. 1971.
5. J. L. Bentley and M. Shaw, "An Alphard Specification of a Correct and Efficient Transformation on Data Structures," Proc. IEEE Conf. Spec. Reliable Software, Apr. 1979, pp. 222-237.
6. R. T. Yeh and P. Zave, "Specifying Software Requirements," Proc. IEEE, Vol. 68, No. 9, Sept. 1980.
7. M. Shaw and W. A. Wulf, "Toward Relaxing Assumptions in Languages and Their Implementations," SIGPLAN Notices, Vol. 13, No. 3, Mar. 1980, pp. 45-61.
8. F. DeRemer and H. H. Kron, "Programming-in-the-Large vs. Programming-in-the-Small," IEEE Trans. Software Eng., Vol. SE-2, No. 2, June 1976.
9. A. K. Jones and B. H. Liskov, "An Access Control Facility for Programming Languages," MIT memo 137, MIT Computation Structures Group and Carnegie-Mellon University, 1976.
10. J. H. Morris, "Protection in Programming Languages," Comm. ACM, Vol. 16, Jan. 1973.
11. K. L. Bowles, Microcomputer Problem Solving Using Pascal, Springer-Verlag, 1977.
12. Department of Defense, Requirements for Ada Programming Support Environments: Stoneman, 1980.
13. B. W. Kernighan and P. J. Plauger, Software Tools, Addison-Wesley, 1976.
14. B. J. Cox, "Message/Object Programming: An Evolutionary Change in Programming Technology," IEEE Software, Vol. 1, No. 1, Jan. 1984, pp. 50-61.
15. J. B. Goodenough and C. L. McGowan, "Software Quality Assurance: Testing and Validation," Proc. IEEE, Vol. 68, No. 9, Sept. 1980.

Mary Shaw is currently an associate professor of computer science at Carnegie-Mellon, where she has been on the faculty since 1971. Her primary research interests are in programming systems and software engineering, particularly abstraction techniques and language tools for developing and evaluating software. She received a BA in math from Rice University in 1965 and a PhD in computer science from Carnegie-Mellon University in 1972. Shaw is a senior member of the IEEE, and a member of the ACM, IEEE-CS, and the New York Academy of Sciences. Her address is Computer Science Department, Carnegie-Mellon University, Schenley Park, Pittsburgh, PA 15213.
