Editor’s Corner Some Thoughts on Automatic Code Generation

Robert L. Glass

More and more often I hear well-known IS people say that eventually, automatic generation of code is going to do away with the tasks of the systems developer. I have read it in the papers and textbooks they write; I hear it in the hallways, and even sometimes from the podiums at leading conferences. Often it is used as a justification for shifting the focus of the field of IS from the technology of systems development to the management of information technology.

I would like to say two things about that belief.

1. All of the constituencies in the IS field would welcome such a phenomenon. Certainly the CEO and CIO would be pleased. But so would the programmer/analyst. I’ll explain why I believe that in a few moments.

2. The phenomenon is extremely unlikely to happen.

Let me begin by explaining why I think this notion of large-scale automatic generation of code is unlikely to happen, and then I’ll get back to why programmers would be pleased if it did. First of all, as a software practitioner of long standing, it is clear to me that we do not yet know, and I believe are unlikely ever to know, enough to allow the automatic generation of code to happen. Consider what such a generator would need to know in order to do its job. Let us assume that we are talking about the automatic generation of code from user-provided specifications (a textbook I examined last night stated that as its definition of the concept). The generator would have to know the following.

1. Enough about the application domain to translate the problem specification into high-level design.

2. Enough about the application and implementation domains to translate that high-level design into detailed design.

3. Enough further about the implementation domain to translate detailed design into code.

J. SYSTEMS SOFTWARE 1997; 37:1-3. © 1997 by Elsevier Science Inc., 655 Avenue of the Americas, New York, NY 10010

Let us consider each of these tasks in turn. Imagine, if you will, that some sort of expertise is needed to perform each of these tasks (I do not want to say “expert system,” because there is no need to steer implementation decisions in this discussion). The expertise for task 1 is that of the domain specialist, the end-user of the eventual software. That domain specialist could be skilled in an enormous variety of disciplines: tax accounting, nuclear physics, sales and marketing, or flight dynamics, to name just a few. Do we really imagine that one (or a few) generators could be built with sufficient skills to handle that diversity of applications? Do we really imagine that, even for a single complex application domain, we can reduce the necessary knowledge to the mechanistic form needed by a generator?

The expertise for task 2 is that of the designer. Most would agree that design demands more creativity than any other software task. Do we really imagine that a generator will have sufficient creative capabilities to design a solution to a complex problem?

The expertise for task 3 is that of the coder. Certainly this level is where the greatest promise for automated generation of code might lie. And yet even here, there is complexity to the task. Often designs are completed at a level different from the level of the expertise of the coder who is to complete them. Closing the gap between the provided design and the necessary design is still a task requiring some creativity. And, for problems and programs of any magnitude, writing that code is still a significant and complicated effort. It may indeed be possible to automate task 3, and in fact, it has been done in certain cases. But for all cases? That eventuality still seems at a considerable intellectual distance from where we presently are.

Three very different sets of expertise, each nontrivial: that’s the requirement for an automated code generator. Does anyone who understands software really believe that research into automatic generation of code is going to overcome those obstacles?

It’s not just my voice crying in this wilderness. Top computer scientist David Parnas said the same thing in his famous “Star Wars” paper a few years back. Top software engineer Fred Brooks said the same thing in his classic “No Silver Bullet” paper. Charles Rich and Richard Waters, who have actively done research in the field of automated generation of code, said the same thing in their “Cocktail Party Myth” paper. People who really understand the technology of software development are extremely dubious that this sort of thing is going to happen.

Let me tell you a story out of my past that bears on this subject. I think it serves as a warning about the claims of those researchers who DO say they’ve achieved automatic generation of code.

Nearly 25 years ago now, when I was working as a software practitioner at the aerospace company Aerojet-General, we heard persistent rumors that another company had found a way to automatically generate COBOL code (COBOL was a very new language at the time!) from design documentation. That was worth checking out. Because Aerojet was quite a forward-looking and even innovative company, the management of my systems programming group authorized us to go to that company to see what was happening.

Here is what we discovered: the programmers wrote their designs as flow charts in the form of COBOL statements with boxes around them and arrows connecting them. Then the automated code generator simply removed the boxes and arrows, and voila! COBOL code.
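To make the point concrete, here is a toy sketch of such a "generator." Everything in it is hypothetical: the text flowchart notation, the function name `strip_flowchart`, and the three COBOL statements are invented for illustration and are not a reconstruction of the tool described above. It only shows how little work is left to do when the "design" already consists of finished statements.

```python
def strip_flowchart(chart: str) -> list[str]:
    """Recover statements from a text flowchart by discarding boxes and arrows.

    The notation (boxes drawn with +, -, | and arrows drawn with | and v)
    is an invented stand-in for a drawn flow chart.
    """
    statements = []
    for line in chart.splitlines():
        text = line.strip()
        # Box borders ("+-----+") and arrow pieces ("|", "v") carry no code.
        if not text or set(text) <= set("+-") or text in {"|", "v"}:
            continue
        # A box body looks like "| STATEMENT |": keep only the statement.
        statements.append(text.strip("|").strip())
    return statements


# Hypothetical "design": COBOL statements with boxes around them.
CHART = """
+---------------------------+
| MOVE ZERO TO TOTAL.       |
+---------------------------+
             |
             v
+---------------------------+
| ADD AMOUNT TO TOTAL.      |
+---------------------------+
             |
             v
+---------------------------+
| DISPLAY TOTAL.            |
+---------------------------+
"""

print("\n".join(strip_flowchart(CHART)))
```

The function contains no knowledge of the application, the design, or COBOL itself; all of that was supplied by the programmer who filled in the boxes.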

When I look at the computer science literature on automatic generation of code, still perplexed at how these brilliant researchers have managed to overcome the three complex tasks I’ve identified above, all too often I see the same sleight of hand. Instead of automatically generating code from user-level specifications, which is what the paper tends to imply, all too often the paper shows a form of specification that is awfully close to detailed design as the source material from which code is automatically generated. For example, the problem of choice for automatic code generation is often some sort of stack or queue. Well, stacks and queues do not usually lie in the domain of the problem; they are almost always an artifact of the solution. Thus, the stack/queue specification is usually very much at the level of the code to be generated, just as the COBOL design, above, was actually COBOL code in (mild) disguise.

Now, with this in mind, what DOES lie in the future of automatic code generation? I believe that we can see the answer to that question forming already. There ARE successful automated code generators in 1997, of course. But if you look at them carefully, they are always for relatively simple problems in very well-understood application domains, most often for the domain I characterize as “Report Generation from a Data Base.” What has happened is that we understand tasks 1, 2, and 3 (above) so well for this domain that we CAN construct code automatically from specifications.
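A minimal sketch of what generation looks like in such a domain follows. It assumes a report specification with three fields (`table`, `group_by`, `measure`); that specification format, the function name `generate_report_sql`, and the sample `sales` data are all hypothetical, chosen only for illustration. The generated SQL is run against an in-memory SQLite database to show that it is usable as-is.

```python
import sqlite3


def generate_report_sql(spec: dict) -> str:
    """Generate a grouped-total SQL query from a declarative report spec.

    The spec format is an assumption made for this sketch. Identifiers are
    interpolated directly, so the spec must come from a trusted source.
    """
    return (
        f"SELECT {spec['group_by']}, SUM({spec['measure']}) AS total "
        f"FROM {spec['table']} "
        f"GROUP BY {spec['group_by']} "
        f"ORDER BY {spec['group_by']}"
    )


# Hypothetical user-level specification: "total sales amount per region".
SPEC = {"table": "sales", "group_by": "region", "measure": "amount"}

# Small in-memory database standing in for the real data base.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.0), ("West", 80.5), ("East", 30.0)],
)

query = generate_report_sql(SPEC)
print(query)
for region, total in conn.execute(query):
    print(f"{region}: {total:.2f}")
```

The generator is a few lines long only because the domain fixes almost everything in advance: every report is a grouping and a total over one table. Nothing in it would carry over to a problem whose design is not already known.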

Now let’s extrapolate that. The history of automatic generation of code will slowly extend into those domains that have become so well understood that all three steps CAN be automated. I wouldn’t begin to predict what those domains will be. But I suspect it will happen in enterprises where variations of the same problem are solved over and over again. Over the years, for example, companies that build compilers (and the researchers who understand the compiler-writing domain) have found that they can automate certain parts of the compiler-writing job. Interestingly, and with implications for any forecasting here, there are still portions of the compiler-writing task that have never been automated. But other portions have been automated for 30 years.

There is one other thing I would add to the extrapolation in the preceding paragraph. No matter what the domain, it is unlikely that complete automatic generation of code will move past small and relatively simple applications. There is too much complexity in a significant program, even at the detailed coding level, for an automatic generator to have the wisdom it would need to perform the necessary tasks.

That brings me back to the issue that I put off at the very beginning of this article. Remember that I said even programmers would be pleased if automatic generation of code became feasible. The reason is this: automatic generation of code would take responsibility for small/simple problems in well-understood domains. Now consider the psychological makeup of most programmers: they are driven by challenge and change and don’t like repetitive tasks. That kind of programmer would be delighted to use a tool that took care of the mundane part of his or her job, leaving the programmer free to concentrate on more complex and interesting problems.

In other words, the enemy of automated generation of code is not the stubborn, backward programmer, as some seem to think. The enemy of automated generation of code is and will remain the complexity of the technical task. And no amount of papers and textbooks and hallway conversations and podium presentations to the contrary will change that fundamental fact.

This editorial was originally published in the column “Through a Glass, Darkly,” by Robert L. Glass, in Data Base, the publication of ACM’s SIGMIS, and is used with permission.
