[ieee 2011 18th working conference on reverse engineering (wcre) - limerick, ireland...

Renovation by Machine–Assisted Program Transformation in Production Reporting and Integration

Sava Mintchev

Baring Asset Management, 155 Bishopsgate, London EC2M 3XY, UK


Abstract—In corporate IT, subject areas like Production Reporting and Enterprise Application Integration are routinely considered in isolation. Needs are often met by purchasing separate product suites or packages, which can be incompatible, and contain unused overlapping functionality. In this paper we discuss our experience of applying a more holistic approach. We look at how purchased software can be extended in–house with the help of a program transformation technique, and can then be utilised in a service–oriented architecture for the purposes of information retrieval, data and process integration. By reusing software components for reporting and integration purposes, we have been able to realise savings in all phases of the software lifecycle.

Keywords-Program transformation; SQR; production reporting; enterprise application integration; service architecture

I. INTRODUCTION

Most organisations rely on commercial off–the–shelf software to support their business. A “buy not build” approach makes perfect sense for commodity products which have thousands or millions of users, and have alternatives provided by a number of competing vendors. Such products are well tested and supported, and have a clear development roadmap.

Problems usually arise when buying highly specialised products with few users and few other clients. While nominally off–the–shelf, such software tends to be on the borderline of outsourced bespoke development. It almost invariably involves a higher test effort, and may have an uncertain future. Further difficulties occur when a “best of breed” application procurement approach is pursued to cover all required functionality. Packages from different vendors (involving different technologies, APIs, data models) have to be integrated into a coherent, operational IT environment.

In this paper we discuss aspects of our experience of reengineering for the enhancement and integration of third–party and in–house systems over the past ten years. The work has been carried out at Baring Asset Management (BAM)1, a global firm providing investment management services in developed and emerging markets on behalf of institutional, retail and private clients worldwide. The company operates from 10 countries, and has around 100 investment professionals, covering equity, bond and alternative asset classes.

1 Baring Asset Management is a subsidiary of MassMutual, a leading diversified financial services organisation.

The software which has been reengineered is written in SQR (Structured Query Reporter)2. SQR can be described as a full–blown programming language with built–in relational data access and reporting capabilities.

A. Related Work

A number of papers ([1], [8]) discuss the renovation of software in COBOL, a language to which the rather less known SQR bears some resemblance. A different approach to a similar problem (plain text report output conversion into XML) is applied in [5] for the purpose of automated regression testing in the context of a migration project. A tool–supported method of wrapping legacy code into web services is presented in [6].

II. MACHINE–ASSISTED PROGRAM TRANSFORMATION

A. Requirements

A fund management business like BAM relies heavily on its investment accounting system(s). Such a system records all security and cash transactions for client accounts and in-house funds, and produces historic valuations of holdings. It has a central place in the back office, and provides data for virtually all front and middle office functions: investment decision support, order generation and management, mandate restriction checking, trade confirmation, compliance monitoring, risk analysis etc. It also supports other back office functions: reconciliation with custodian records, performance measurement, fee calculation, etc. In short, the system needs to be well integrated, directly or indirectly, with numerous other applications and services.

Our main investment accounting system is also a source of regular client valuation packs and other reports. As delivered by the vendor, the system utilises SQR for reporting. The roughly 200 SQR reporting programs supplied with our investment accounting system were written in the mid 1990s, and produced (printed) plain text output.

2 SQR has a long history, dating back to the 1980s and spanning a series of acquisitions. The language now lives on in Oracle’s Hyperion Production Reporting, and is also embedded in Oracle’s PeopleSoft suite [7].

2011 18th Working Conference on Reverse Engineering

1095-1350/11 $26.00 © 2011 IEEE

DOI 10.1109/WCRE.2011.57


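[Figure 1 (diagram). A Java program, SQRTransform, takes an .SQR report program and a .map file as input, and produces a transformed .SQR program and a new .map file. The .map file has two columns, “SQR print args” and “field_name”. In the generated .map file the entries “$face_amt (0, col4)” and “$descr (0, col3)” both have the field name “trash”; in the edited .map file they have the field names “nominal” and “description” respectively. The statement “print $face_amt (0, col4)” in the original program is followed, in the transformed program, by the statement “write 1 '<nominal>' $face_amt ... '</nominal>'”.]

Fig. 1. SQR program transformation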

Stated simply, the requirement posed to IT in 2000 was to create a subset of 30 reports in electronic document formats. The main need was for printable Excel output, i.e. spreadsheets which have a proper layout when printed, and can therefore fully replace the printed report packs. To give some sense of the size of this task: the 30 reports amounted to over 55,000 lines of SQR code.

B. Transforming SQR programs

In order to meet the requirement, we chose to write a program which parses and transforms SQR code. In summary, SQR programs undergo a machine–assisted transformation, which extends these programs with new functionality.

The approach is illustrated in a simplified form in Fig. 1. The SQR transformation program (written in Java) takes an SQR program and an optional .map file as input, and produces a transformed (extended) SQR program and an updated .map file as output. The original SQR report programs contain print statements, e.g. “print $face_amt (0, col4)”. For each such statement, the transformed SQR program contains an additional corresponding “write” statement. When the transformed SQR program is executed, the print statements produce report output (as in the original SQR program), while the write statements produce an XML data file.

The .map file specifies the XML element tag (field name) to be used for each print statement. The SQR transformer reads the .map file (if supplied), and creates a new version of the .map file, in which all unspecified field names are set to “trash”. The person who applies the program transformation can then replace “trash” with meaningful names, and rerun the transformation.
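The print-to-write step described above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions, not the transformer used in the project: the class name SqrTransformSketch, the regular expression (which recognises only print statements of the form shown in Fig. 1) and the in-memory map are all hypothetical, and the generated write statement imitates the shape shown in Fig. 1 while leaving out whatever the figure elides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqrTransformSketch {
    // Hypothetical, simplified pattern: "print $variable (position)".
    // The real SQR print command has many more forms.
    private static final Pattern PRINT =
        Pattern.compile("^\\s*print\\s+(\\$\\w+)\\s*(\\([^)]*\\))\\s*$");

    /**
     * Copies every line and appends a write statement after each
     * recognised print statement. Print arguments with no entry in
     * the map are added to it with the placeholder tag "trash".
     */
    static List<String> transform(List<String> sqrLines, Map<String, String> map) {
        List<String> out = new ArrayList<>();
        for (String line : sqrLines) {
            out.add(line); // the original statement is always kept
            Matcher m = PRINT.matcher(line);
            if (m.matches()) {
                String variable = m.group(1);
                String key = variable + " " + m.group(2); // the "SQR print args" column
                String tag = map.computeIfAbsent(key, k -> "trash");
                out.add("write 1 '<" + tag + ">' " + variable + " '</" + tag + ">'");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("$face_amt (0, col4)", "nominal"); // already named by hand
        List<String> program = List.of(
            "print $face_amt (0, col4)",
            "print $descr (0, col3)");
        transform(program, map).forEach(System.out::println);
        System.out.println(map); // the updated map now holds a "trash" entry for $descr
    }
}
```

Running the sketch a second time, after replacing “trash” in the map with a meaningful name, mirrors the edit-and-rerun cycle described in the text.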

Subsequent changes to the original SQR program (for example, as delivered in a new version of the investment accounting system) can be incorporated by re–running the transformer. Unless there have been changes to the layout (e.g. print statement arguments), no manual manipulation would be required. If changes in the original SQR affect report layout (e.g. if a new column is added), the .map file would need updating, by replacing any newly generated “trash” tags. In practice, most SQR source changes have tended to affect business logic but not layout, leaving the mapping unchanged.

Fig. 2. Grammar for output XML files

report contents :– <sqr-program> prog name </sqr-program> element+

element :– <page-heading lines="linesInHeading"> data element+ </page-heading>
    | <theme-heading type="string"> data element+ </theme-heading>
    | <data-row> data element+ comment element* </data-row>
    | <total-row type="string"> data element+ comment element* </total-row>
    | position statement

data element :– <field name> coord edit stamp* field contents </field name>
    | position statement

coord :– <coord y="lineNo" x="columnNo" w="width"/>

position statement :– <next-listing skipLines="int" neeLines="int"/>
    | <position y="lineNo" x="columnNo" w="width"/>
    | <new-page/>

edit stamp :– <edit-stamp user id="string" timestamp="datetime"> field contents </edit-stamp>

comment element :– <user-comment> edit stamp+ </user-comment>
    | <error-message> edit stamp+ </error-message>
    | <warning-message> edit stamp+ </warning-message>

prog name, field name, field contents :– string

linesInHeading, lineNo, columnNo, width :– int

The full transformation is somewhat more complicated than shown in Fig. 1, and it also applies to other SQR commands besides print statements: positioning commands, page and section headings, procedure declarations. The corresponding XML element tags for these commands can also be added to .map files as appropriate.

When executed, the transformed (extended) SQR programs produce XML output. An illustrative grammar of the output is given in Fig. 2. Logical rows of data are represented by <data-row> elements. Fields within a row are represented by <field name> elements, where all field name tags are specified in the .map file from Fig. 1. Note that, because each report would have its own set of <field name> tags, a family of XML schema definitions is required, rather than a single schema for all reports.

Rows containing section headings and totals are represented by <theme-heading> and <total-row> elements respectively; headings repeated at the top of each page are enclosed in <page-heading> elements.

The XML output file also contains enough positioning information (coord and position statement elements) to allow the original report layout to be reproduced.
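To make the structure of the output concrete, the following Java sketch reads the fields of each <data-row> from a small XML document using the standard DOM API. It is an illustration only: the class name ReportXmlReaderSketch, the enclosing <report> root element (Fig. 2 does not name a root) and the sample field names are assumptions, and nested <edit-stamp> content is simply folded into the field text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReportXmlReaderSketch {
    // Children of a data row that are not report fields (see Fig. 2).
    private static final Set<String> NON_FIELD_TAGS = Set.of(
        "next-listing", "position", "new-page",
        "user-comment", "error-message", "warning-message");

    /** Returns one field-name-to-text map per <data-row> element. */
    static List<Map<String, String>> readDataRows(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse report XML", e);
        }
        List<Map<String, String>> result = new ArrayList<>();
        NodeList rows = doc.getElementsByTagName("data-row");
        for (int i = 0; i < rows.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = rows.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) continue;
                if (NON_FIELD_TAGS.contains(child.getNodeName())) continue;
                // The empty <coord/> child contributes no text.
                fields.put(child.getNodeName(), child.getTextContent().trim());
            }
            result.add(fields);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<report><sqr-program>valuation</sqr-program>"
            + "<data-row>"
            + "<description><coord y=\"1\" x=\"10\" w=\"30\"/>UK Gilt</description>"
            + "<nominal><coord y=\"1\" x=\"40\" w=\"12\"/>1000</nominal>"
            + "</data-row></report>";
        System.out.println(readDataRows(xml));
    }
}
```

Because the field tags differ from report to report, the reader treats every unrecognised child element of a row as a field rather than relying on a fixed schema.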


III. USING TRANSFORMED PROGRAMS

A. Use in reporting

The XML files produced by transformed (extended) SQR programs are processed in a number of ways. We have implemented all processing in a framework of Java classes.

• Layout and formatting – generic (report–independent) classes to lay out the XML data in several formats: printable Excel, Word, CSV, HTML.

• Report editor – generic (report–independent) GUI for editing selected fields on reports, and for inserting comments. The editor updates the XML files, and creates new <edit-stamp> and <user-comment> elements.

• Report generator – generic classes for creating XML data files from other sources of data (e.g. database; messages / files in different formats; Java objects).

• Report definitions – contain meta–data facilitating the above functions. For example, the meta–data for each field name includes positioning in different formats (e.g. Excel column); labels for creating HTML links between fields in different reports; data type, format pattern for the report editor and generator, etc.

• Automated checks – generic and report–specific classes automating a number of quality control checks which had previously been performed manually on printed report packs. The check classes create <error-message> and <warning-message> elements.

The MS Office formatting implementation automates Excel and Word on Windows, using a code generation approach from [4]. A second, multi–platform implementation uses the Apache POI Java Excel library [3].

The Report Generator classes extend the framework beyond its initial intended use for processing the XML output of transformed SQR programs. The generator allows new reports to be created using SQL, metadata, and Java code where necessary. The XML output is processed by the other parts of the framework, just like the output of transformed SQR programs. As well as for new reports, the generator has been used to rewrite the last remaining legacy COBOL programs as part of a reverse engineering and re-implementation project.

This framework of core components has been utilised in a line of several products, in the context of different business systems:

• GUI application – using the output of transformed SQR programs for reporting to institutional clients from our main investment accounting system. In production since 2001;

• Web application (J2EE) – using the report generator for multi–language reporting to retail clients and agents. In production since 2003;

• Server–side applications – using the report generator for batch reporting in two different systems under Windows and HP-UX. In production since 2005 and 2006 respectively.

B. Use in data integration

As discussed in Sect. II-A, our investment accounting system needs to be connected with a number of other systems and services. Transformed SQR programs from Sect. II-B create structured XML output, and can therefore be used as data producers for integration between systems. The same applies to the report generator. Where possible, such reuse has been very beneficial, saving development and test effort. Furthermore, business requirements specifications for data feeds can be phrased in terms of report output, and no additional analysis or reverse engineering effort is needed to trace the derivation of source data.

Data can be transported between producers and consumers in files or messages. At the consumer end, it can be processed in various ways. In some cases, e.g. when populating a staging table, data warehouse, or reporting database, it is possible to simply insert the content of the XML file into a database table. To this end, report definition metadata from Sect. III-A can optionally specify a target database table, and a target column name for each field. A generic metadata–driven component then loads XML files into a database.
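As an illustration of how such a metadata–driven load could work, the Java sketch below builds a parameterised INSERT statement for one data row from a field-to-column mapping. It is a sketch under stated assumptions rather than the component described above: the class name XmlLoaderSketch, the table name STG_VALUATION and the column names are hypothetical, and the statement is only constructed, not executed against a database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class XmlLoaderSketch {
    /**
     * Builds a parameterised INSERT for the fields of one data row.
     * Fields without a target column in the metadata are skipped.
     * The bind values are appended to paramsOut in column order.
     */
    static String buildInsert(String table,
                              Map<String, String> fieldToColumn,
                              Map<String, String> row,
                              List<String> paramsOut) {
        List<String> columns = new ArrayList<>();
        for (Map.Entry<String, String> field : row.entrySet()) {
            String column = fieldToColumn.get(field.getKey());
            if (column == null) continue; // no target column, e.g. a "trash" field
            columns.add(column);
            paramsOut.add(field.getValue());
        }
        String placeholders =
            String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
            + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Hypothetical report definition metadata: field name -> target column.
        Map<String, String> fieldToColumn = new LinkedHashMap<>();
        fieldToColumn.put("description", "SEC_DESCR");
        fieldToColumn.put("nominal", "FACE_AMT");

        // One data row, as it might be read from the XML file.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("description", "UK Gilt");
        row.put("nominal", "1000");

        List<String> params = new ArrayList<>();
        System.out.println(buildInsert("STG_VALUATION", fieldToColumn, row, params));
        System.out.println(params);
    }
}
```

The resulting statement and values could then be handed to a JDBC PreparedStatement; keeping the mapping in metadata means a new feed needs no report–specific loading code.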

When using transformed SQR programs as data sources for integration, we have sometimes found it necessary to include a few additional items which are not shown on the original report. We achieve this by manually adding print commands to the original program, before applying the transformation. The additions are enclosed in a special type of comment, so that they are ignored by the SQR compiler / interpreter, but not by our SQR transformer.
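The idea of a comment that is invisible to SQR but visible to the transformer can be sketched as follows. SQR comments begin with an exclamation mark; the marker “!xml:” used here is purely hypothetical (the text does not say what the special comment looks like), as are the class name HiddenPrintSketch and the variable and column names in the example.

```java
public class HiddenPrintSketch {
    // Hypothetical marker; the actual form of the special comment is not documented here.
    static final String MARKER = "!xml:";

    /**
     * Returns the line as the transformer would see it: a specially
     * marked comment is unwrapped into the statement it hides, while
     * every other line (including ordinary comments) is left unchanged.
     */
    static String asSeenByTransformer(String line) {
        String trimmed = line.trim();
        if (trimmed.startsWith(MARKER)) {
            return trimmed.substring(MARKER.length()).trim();
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(asSeenByTransformer("!xml: print $isin (0, col9)"));
        System.out.println(asSeenByTransformer("! an ordinary comment"));
        System.out.println(asSeenByTransformer("print $descr (0, col3)"));
    }
}
```

With this convention the vendor's program still compiles and prints exactly as before, while the transformer can emit an extra field for each hidden print statement.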

C. Use in Services and Business Processes

Production reporting and integration capabilities at BAM have been enhanced further since the adoption in 2008 of the webMethods Business Process Management (BPM) suite from Software AG. The suite contains a message broker, a container for (web) services (Integration Server), and a BPEL–compatible business process execution engine.

We have wrapped the SQR interpreter as a service in webMethods Integration Server, thus enabling different clients to execute SQR programs. Such a service is particularly useful for executing transformed (extended) SQR programs from Sect. II-B. Similarly, some processing components from Sect. III-A and III-B have been exposed as services.

An executable business process utilising these and other services has been created. It provides automation and workflow support for client valuation production, and has been in regular use since mid–2010.

IV. ANALYSIS OF THE OUTCOME

A. Achievements and Benefits

The code statistics in Tables I and II give an idea of the size of the project, and (albeit indirectly) of the effort involved.

TABLE I. TRANSFORMATION–RELATED CODE STATISTICS

Language        Code type                       Files   Lines (thousands)
SQR             Programs pre–transformation     61      99
SQR             Programs post–transformation    61      118
Java            Parser for SQR                  8       1.9
Java            Transformer for SQR             5       1.5
tab–delimited   Map files                       61      4.5

TABLE II. ENHANCED REPORTING CODE STATISTICS

Language        Code type                       Files   Lines (thousands)
Java            Layout and formatting           34      8.7
Java            Report editor GUI               19      8.5
Java            Report generator                10      1.9
Java            Automated checks                17      2.8
Java            Utility classes (io, xml)       25      6.3

It can be observed from Table I that the new code implementing the SQR source code transformation (Java code plus map files) has about 12 times fewer lines than the original SQR report programs. The new code implementing enhanced reporting functionality from Table II is also relatively small compared to the original SQR code. Crude as such a comparison may be, it provides some evidence that the machine–supported transformation approach is likely to have been more cost–efficient than a complete re-implementation of the SQR programs. Furthermore, the new code has not required an understanding of the business logic in the original programs, nor a reverse engineering of their specifications. The creation of the map files used in the transformation is a largely mechanistic process. Changes to the original SQR programs can be incorporated by rerunning the transformation, often with little or no update to map files. Since the original project go-live in early 2001, some 180 changes (checked–in revisions) have been made to the SQR programs, all of which have been applied by rerunning the SQR transformation.

B. Performance of transformed programs

Unlike the original SQR programs, transformed (extended) programs create additional XML output files, and therefore take longer to run. Table III shows the run time of a Portfolio Valuation program3. The bulk of the total elapsed run time of these programs is due to database access and processing on the data server. Consequently, the total run time overhead of transformed programs due to XML output creation is quite small (2.1%).

3 The programs were run on a development server under HP-UX, creating a 200+ page report for 8 portfolios, and XML output containing over 36 thousand elements.

TABLE III. RUN–TIME PERFORMANCE OF TRANSFORMED SQR PROGRAMS

Time (sec)      Original prog        Transformed prog     Overhead
                Mean     Std dev     Mean     Std dev     %
Total elapsed   39.54    0.73        40.37    0.62        2.1
User            5.16     0.11        5.64     0.15        9.4
System          1.71     0.04        1.97     0.05        15

C. Problems and limitations

The work described in this paper has been carried out in the context of a particular company, with the aim of meeting specific requirements in a cost–efficient way. The transformation–based reengineering approach has certain limitations which have necessitated manual restructuring in some cases. In particular, the parser for SQR used for implementing the transformation in Fig. 1 is based on a partial grammar of the SQR language. As a consequence, a small change was required to one of our SQR programs before the machine–assisted transformation could be applied.

Post–transformation, SQR programs grow about 20% longer on average. This became an issue in the case of one report program, which hit an internal limit of the SQR compiler/interpreter that was not configurable. To overcome this, we had to manually restructure the report program before applying the transformation to it. An editor script was created so that the changes could be reapplied to future versions of the original SQR report program.

Overall, the need for manual restructuring has been sufficiently rare and narrow in scope to be acceptable from a pragmatic point of view.

We have not created XML schemata for the output of our extended SQR programs. Instead, the Report definition metadata from Sect. III-A drives all subsequent processing. In principle there is no reason why we could not generate schemata from the report definitions.

D. Choice of approach

The requirement for modernised reporting could have been met in different ways, for example by:

1) Writing code to parse and transform the text output of the original reports;

2) Manual “cloning” and modification of the original programs delivered by the vendor;

3) A complete rewrite in a preferred language;

4) Programmatic transformation or translation of SQR source code (the approach which was chosen).

Our reasoning in choosing an approach was along the following lines. Option 1 is akin to screen–scraping; it was not advisable or practical because of the complex and variable layout of some of the reports. Option 2 was undesirable not only because of code duplication, but also because the clones of third–party programs would have to be maintained in line with new revisions from the vendor. A software configuration management system like Perforce would help merge our codeline with the vendor’s, but the manual effort remains. This approach had been used previously for modest layout changes to a couple of the report programs, and may have been appropriate for a smaller number of reports. In addition to modifying the SQR programs, we would still have had to implement somehow the required new functionality for enhanced formatting. Had it been available, an XML generation library like [10] could have provided a possible way forward.

Option 3 was considered prohibitively expensive, involving substantial effort for analysis, specification, development, and testing; it is reliant on a good understanding and documentation (which we lacked) of the internals of a third–party system beyond published interfaces. Furthermore, there were technical limitations: our investment accounting system was delivered with a custom build of the SQR compiler / interpreter, with application libraries statically linked. We would not have been able to link those libraries to other software, e.g. a different reporting or integration tool.

Option 4 was initially regarded with considerable scepticism, since there was no experience of program transformation within the organisation. But there was enough experience of the other approaches, and an appreciation of the difficulties associated with those, to put the program transformation option in context, and make it a possible choice. Some prototyping, comparative estimates and extensive discussions proved enough to convince decision makers in IT and the business to give this option a go.

With Option 4 we have had to understand the operational semantics of SQR print and positioning commands from the language documentation, and “reverse engineer” some missing details. This was arguably a smaller exercise than reverse engineering the specification of each report (for Option 3).

If we had to solve the same renovation problem today, would we adopt a different approach? One big difference is that now we have the complete source code of the investment accounting system. We have gone through a code adoption process, and have a much better understanding of the system overall. The technical limitation associated with Option 3 no longer applies, because we can link application libraries directly to other tools / languages besides SQR. Even so, we find it cost–efficient to maintain existing SQR reports and reapply the transformation when changes are made.

V. CONCLUSION

The program transformation approach has enabled BAM to modernise our client reporting without a significant re-implementation cost, and to meet continuously evolving business requirements.

It can be argued that a program transformation–based approach would yield better results if modern reengineering tools are applied, instead of using a general–purpose programming language such as Java. Since the work discussed in this paper was started, there has been significant progress in the theory and practice of reverse engineering and reengineering, with an emphasis on language–independent techniques [1], [2]. It is appropriate to consider whether the correct choice today would be to use a tool like the Meta–environment [9]. With support for generalised LR parsing and conditional term rewriting, such an environment should make program transformation a more widely applicable method for software engineering.

The main contribution of our work is to demonstrate practically that such an approach can be effective in the context of a medium–sized company in a non–IT industry. A sizeable legacy of code in SQR and other similar languages exists, to which this renovation approach is appropriate. The approach can be applied by an organisation to source code which is available, but is not necessarily controlled or maintained by that organisation.

REFERENCES

[1] A. van Deursen, P. Klint, and C. Verhoef. Research issues in software renovation. In J.-P. Finance, editor, Fundamental Approaches to Software Engineering (FASE ’99), volume 1577 of Lecture Notes in Computer Science, pages 1–21. Springer-Verlag, 1999.

[2] M. Di Penta, M. Neteler, G. Antoniol, and E. Merlo. A language-independent software renovation framework. Journal of Systems and Software, 77(3):225–240, 2005.

[3] The Apache Software Foundation. Apache POI - the Java API for Microsoft Documents. http://poi.apache.org.

[4] S. Mintchev and V. Getov. Automatic binding of native scientific libraries to Java. In Proceedings of ISCOPE, pages 129–136, 1997. Springer LNCS 1343.

[5] H.M. Sneed. Reengineering reports. In Proceedings of the 11th Working Conference on Reverse Engineering, pages 17–26. IEEE Computer Society, 2004.

[6] H.M. Sneed. Integrating legacy software into a service oriented architecture. In Proceedings of the Conference on Software Maintenance and Reengineering, pages 3–14. IEEE Computer Society, 2006.

[7] SparkPath Technologies, Inc. SQR Programming Language Technical Information. http://www.sqr-info.com.

[8] M. van den Brand, A. Sellink, and C. Verhoef. Generation of components for software renovation factories from context-free grammars. Science of Computer Programming, 36(2-3):209–266, 2000.

[9] M.G.J. van den Brand, M. Bruntink, G.R. Economopoulos, H.A. de Jong, P. Klint, T. Kooiker, T. van der Storm, and J.J. Vinju. Using the meta-environment for maintenance and renovation. In 11th European Conference on Software Maintenance and Reengineering (CSMR ’07), pages 331–332. IEEE, 2007.

[10] David Vandiver. SQR2XML: Library of functions to create Excel documents from SQR and PeopleCode. http://sourceforge.net/projects/sqr2xml.
