[ieee 2011 18th working conference on reverse engineering (wcre) - limerick, ireland...
Renovation by Machine–Assisted Program Transformation in Production Reporting and
Integration
Sava Mintchev
Baring Asset Management, 155 Bishopsgate, London EC2M 3XY, UK
Abstract: In corporate IT, subject areas like Production Reporting and Enterprise Application Integration are routinely considered in isolation. Needs are often met by purchasing separate product suites or packages, which can be incompatible, and contain unused overlapping functionality. In this paper we discuss our experience of applying a more holistic approach. We look at how purchased software can be extended in–house with the help of a program transformation technique, and can then be utilised in a service–oriented architecture for the purposes of information retrieval, data and process integration. By reusing software components for reporting and integration purposes, we have been able to realise savings in all phases of the software lifecycle.
Keywords-Program transformation; SQR; production reporting; enterprise application integration; service architecture
I. INTRODUCTION
Most organisations rely on commercial off–the–shelf soft-
ware to support their business. A “buy not build” approach
makes perfect sense for commodity products which have
thousands or millions of users, and have alternatives provided
by a number of competing vendors. Such products are well
tested and supported, and have a clear development roadmap.
Problems usually arise when buying highly specialised
products with few users and other clients. While nominally
off–the–shelf, such software tends to be on the borderline of
outsourced bespoke development. It almost invariably involves
a higher test effort, and may have an uncertain future. Further
difficulties occur when a “best of breed” application procure-
ment approach is pursued to cover all required functionality.
Packages from different vendors — involving different tech-
nologies, APIs, data models — have to be integrated into a
coherent, operational IT environment.
In this paper we discuss aspects of our experience of
reengineering for the enhancement and integration of third–
party and in–house systems over the past ten years. The work
has been carried out at Baring Asset Management (BAM)1
– a global firm providing investment management services in
developed and emerging markets on behalf of institutional,
retail and private clients worldwide. The company operates
from 10 countries, and has around 100 investment professionals,
covering equity, bond and alternative asset classes.
1 Baring Asset Management is a subsidiary of MassMutual, a leading diversified financial services organisation.
The software which has been reengineered is written in
SQR (Structured Query Reporter)2. SQR can be described as
a full–blown programming language with built–in relational
data access and reporting capabilities.
A. Related Work
A number of papers ([1], [8]) discuss the renovation of
software in COBOL - a language to which the rather less
known SQR bears some resemblance. A different approach
to a similar problem – plain text report output conversion
into XML – is applied in [5] for the purpose of automated
regression testing in the context of a migration project. A tool–
supported method of wrapping legacy code into web services
is presented in [6].
II. MACHINE–ASSISTED PROGRAM TRANSFORMATION
A. Requirements
A fund management business like BAM relies heavily on
its investment accounting system(s). Such a system records
all security and cash transactions for client accounts and in-
house funds, and produces historic valuations of holdings. It
has a central place in the back office, and provides data
for virtually all front and middle office functions: investment
decision support, order generation and management, mandate
restriction checking, trade confirmation, compliance moni-
toring, risk analysis etc. It also supports other back office
functions: reconciliation with custodian records, performance
measurement, fee calculation, etc. In short, the system needs to
be well integrated, directly or indirectly, with numerous other
applications and services.
Our main investment accounting system is also a source
of regular client valuation packs and other reports. As deliv-
ered by the vendor, the system utilises SQR for reporting.
The roughly 200 SQR reporting programs supplied with our
investment accounting system were written in the mid 1990s,
and produced (printed) plain text output.
2 SQR has a long history dating back to the 1980s through a series of acquisitions. The language now lives on in Oracle’s Hyperion Production Reporting, and is also embedded in Oracle’s PeopleSoft suite [7].
2011 18th Working Conference on Reverse Engineering
1095-1350/11 $26.00 © 2011 IEEE
DOI 10.1109/WCRE.2011.57
Fig. 1 (diagram, summarised): the Java SQRTransform program reads a .SQR report program and a .map file, and writes a transformed .SQR program and an updated .map file.
.SQR report program:
    ...
    print $face_amt (0, col4)
    ...
.map file with unspecified field names (columns: SQR print args, field_name):
    $face_amt (0, col4)    trash
    $descr (0, col3)       trash
.map file with field names filled in:
    $face_amt (0, col4)    nominal
    $descr (0, col3)       description
Transformed .SQR program:
    ...
    print $face_amt (0, col4)
    write 1 '<nominal>' $face_amt ... '</nominal>'
    ...
Fig. 1. SQR program transformation
Stated simply, the requirement posed to IT in 2000 was
to create a sub–set of 30 reports in electronic document
formats. The main need was for printable Excel output - i.e.
spreadsheets which have a proper layout when printed, and
can therefore fully replace the printed report packs. To give
some sense of the size of this task: the 30 reports amounted
to over 55,000 lines of SQR code.
B. Transforming SQR programs
To meet the requirement, we chose to write
a program that parses and transforms SQR code. In
summary, SQR programs undergo a machine–assisted transfor-
mation, which extends these programs with new functionality.
The approach is illustrated in a simplified form in Fig. 1.
The SQR transformation program (written in Java) takes an
SQR program and an optional .map file as input, and produces
a transformed (extended) SQR program, and an updated .map
file as output. The original SQR report programs contain
print statements, e.g. “print $face_amt (0, col4)”.
For each such statement, the transformed SQR program con-
tains an additional corresponding “write” statement. When the
transformed SQR program is executed, the print statements
produce report output (as in the original SQR program), while
the write statements produce an XML data file.
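The step described above can be sketched in Java as follows. This is a minimal illustration, not the transformer described in this paper: the class name PrintToWriteSketch, the regular expression, and the restriction to the simple form "print $var (row, col)" are assumptions. The generated write statement mirrors the simplified form shown in Fig. 1, which elides further content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: adds an XML-producing write statement after each simple print statement. */
final class PrintToWriteSketch {

    // Matches only the simple form "print $var (row, col)"; real SQR print syntax is richer.
    private static final Pattern PRINT =
            Pattern.compile("^\\s*print\\s+(\\$\\w+)\\s+(\\(.*\\))\\s*$");

    /**
     * @param sqrLines   lines of the original SQR program
     * @param fieldNames map from print arguments, e.g. "$face_amt (0, col4)", to an XML tag
     */
    static List<String> transform(List<String> sqrLines, Map<String, String> fieldNames) {
        List<String> out = new ArrayList<>();
        for (String line : sqrLines) {
            out.add(line); // the original statement is always kept
            Matcher m = PRINT.matcher(line);
            if (m.matches()) {
                String variable = m.group(1);
                String printArgs = variable + " " + m.group(2);
                // Unmapped print statements get the placeholder tag.
                String tag = fieldNames.getOrDefault(printArgs, "trash");
                out.add("write 1 '<" + tag + ">' " + variable + " '</" + tag + ">'");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> program = List.of("print $face_amt (0, col4)");
        Map<String, String> map = Map.of("$face_amt (0, col4)", "nominal");
        transform(program, map).forEach(System.out::println);
    }
}
```

A line-based regular expression is used here only for brevity; the paper's transformer is built on a parser for SQR.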
The .map file specifies the XML element tag (field name)
to be used for each print statement. The SQR transformer
reads the .map file (if supplied), and creates a new version
of the .map file, in which all unspecified field names are set
to “trash”. The person who applies the program transformation
can then replace “trash” with meaningful names, and rerun the
transformation.
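The regeneration of the .map file might look roughly like the sketch below. It assumes a two-column, tab-delimited file keyed by the print arguments. The class and method names are hypothetical, and keying on the print arguments alone is a simplification, since identical print statements at different places in a program would share one entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of .map file handling: unspecified field names default to "trash". */
final class MapFileSketch {

    static final String PLACEHOLDER = "trash";

    /** Parses tab-delimited lines "printArgs<TAB>fieldName" into an ordered map. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t", 2);
            if (cols.length == 2) {
                map.put(cols[0].trim(), cols[1].trim());
            }
        }
        return map;
    }

    /** One entry per print statement found; names not yet chosen get the placeholder. */
    static Map<String, String> update(Map<String, String> existing,
                                      List<String> printArgsInProgram) {
        Map<String, String> updated = new LinkedHashMap<>();
        for (String printArgs : printArgsInProgram) {
            updated.put(printArgs, existing.getOrDefault(printArgs, PLACEHOLDER));
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> existing = parse(List.of("$face_amt (0, col4)\tnominal"));
        Map<String, String> updated =
                update(existing, List.of("$face_amt (0, col4)", "$descr (0, col3)"));
        updated.forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Names already chosen survive a rerun, so only newly introduced print statements need attention.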
Subsequent changes to the original SQR program (for
example, as delivered in a new version of the investment
accounting system) can be incorporated by re–running the
transformer. Unless there have been changes to the layout (e.g.
print statement arguments), no manual manipulation would be
required. If changes in the original SQR affect report layout
(e.g. if a new column is added), the .map file would need
updating, by replacing any newly generated “trash” tags. In
practice, most SQR source changes have tended to affect
business logic but not layout, leaving the mapping unchanged.
Fig. 2. Grammar for output XML files
report contents :– <sqr-program> prog name </sqr-program> element+
element :– <page-heading lines="linesInHeading"> data element+ </page-heading>
    | <theme-heading type="string"> data element+ </theme-heading>
    | <data-row> data element+ comment element∗ </data-row>
    | <total-row type="string"> data element+ comment element∗ </total-row>
    | position statement
data element :– <field name> coord edit stamp∗ field contents </field name>
    | position statement
coord :– <coord y="lineNo" x="columnNo" w="width"/>
position statement :– <next-listing skipLines="int" neeLines="int"/>
    | <position y="lineNo" x="columnNo" w="width"/>
    | <new-page/>
edit stamp :– <edit-stamp user id="string" timestamp="datetime"> field contents </edit-stamp>
comment element :– <user-comment> edit stamp+ </user-comment>
    | <error-message> edit stamp+ </error-message>
    | <warning-message> edit stamp+ </warning-message>
prog name, field name, field contents :– string
linesInHeading, lineNo, columnNo, width :– int
The full transformation is somewhat more complicated than
shown in Fig. 1, and it also applies to other SQR commands
besides print statements: positioning commands, page and
section headings, procedure declarations. The corresponding
XML element tags for these commands can also be added to
.map files as appropriate.
When executed, the transformed (extended) SQR programs
produce XML output. An illustrative grammar of the output
is given in Fig. 2. Logical rows of data are represented by
<data-row> elements. Fields within a row are represented
by <field name> elements, where all field name tags are
specified in the .map file from Fig. 1. Note that, because each
report has its own set of <field name> tags, a family
of XML schema definitions is required, rather than a single
schema for all reports.
Rows containing section headings and totals are represented
by <theme-heading> and <total-row> elements respec-
tively; headings repeated at the top of each page are enclosed
in <page-heading> elements.
The XML output file also contains enough positioning
information (coord and position statement elements) to allow
reproducing the original report layout.
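To illustrate how coord elements could drive layout reproduction, the following sketch renders the fields of one <data-row> onto a plain text line. It is an assumption-laden example rather than the framework's code: the class LayoutSketch is hypothetical, x is assumed to be a 1-based column, and edit stamps and position statements are ignored.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: places each field of a data row at the column given by its coord. */
final class LayoutSketch {

    static String renderRow(String dataRowXml, int lineWidth) {
        try {
            Element row = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(dataRowXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            char[] line = " ".repeat(lineWidth).toCharArray();
            NodeList children = row.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (!(child instanceof Element)) {
                    continue; // skip whitespace between elements
                }
                Element field = (Element) child;
                NodeList coords = field.getElementsByTagName("coord");
                if (coords.getLength() == 0) {
                    continue; // e.g. a position statement, not handled in this sketch
                }
                Element coord = (Element) coords.item(0);
                int x = Integer.parseInt(coord.getAttribute("x")); // assumed 1-based column
                int w = Integer.parseInt(coord.getAttribute("w"));
                String text = field.getTextContent().trim();
                // Copy at most w characters, clipped to the line width.
                for (int k = 0; k < Math.min(w, text.length()); k++) {
                    int pos = x - 1 + k;
                    if (pos >= 0 && pos < lineWidth) {
                        line[pos] = text.charAt(k);
                    }
                }
            }
            return new String(line).stripTrailing();
        } catch (Exception e) {
            throw new IllegalStateException("Could not render row", e);
        }
    }
}
```

A fuller renderer would also use the y attribute and the position statements to place rows on pages.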
III. USING TRANSFORMED PROGRAMS
A. Use in reporting
The XML files produced by transformed (extended) SQR
programs are processed in a number of ways. We have
implemented all processing in a framework of Java classes.
• Layout and formatting - generic (report–independent)
classes to lay out the XML data in several formats:
printable Excel, Word, CSV, HTML.
• Report editor - generic (report–independent) GUI for
editing selected fields on reports, and for inserting com-
ments. The editor updates the XML files, and creates new
<edit-stamp> and <user-comment> elements.
• Report generator - generic classes for creating XML
data files from other sources of data (e.g. database;
messages / files in different formats; Java objects).
• Report definitions - contain meta–data facilitating the
above functions. For example, the meta–data for each
field name includes positioning in different formats (e.g.
Excel column); labels for creating HTML links between
fields in different reports; data type, format pattern for
the report editor and generator, etc.
• Automated checks - generic and report–specific classes
automating a number of quality control checks which had
previously been performed manually on printed report
packs. The check classes create <error-message> and
<warning-message> elements.
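As an illustration of report–independent formatting driven by report definitions, the sketch below writes one data row as a CSV line. The column order stands in for the meta–data of a report definition; the class CsvLayoutSketch and its methods are hypothetical, and the quoting follows common CSV practice rather than anything specified in this paper.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of a generic CSV formatter driven by report definition metadata. */
final class CsvLayoutSketch {

    /** Quotes a value if it contains a comma, a double quote or a line break. */
    static String escape(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /**
     * @param columnOrder field names in output order, as a report definition might list them
     * @param row         field name to field contents for one data row
     */
    static String toCsvLine(List<String> columnOrder, Map<String, String> row) {
        return columnOrder.stream()
                .map(name -> escape(row.getOrDefault(name, ""))) // missing fields stay empty
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(
                List.of("description", "nominal"),
                Map.of("description", "UK Gilt", "nominal", "1,000")));
    }
}
```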
The formatting in MS Office automates Excel and Word
on Windows using a code generation approach from [4]. A
second, multi–platform implementation uses the Apache POI
Java Excel library [3].
The Report Generator classes extend the framework beyond
its initial intended use for processing the XML output of
transformed SQR programs. The generator allows new reports
to be created using SQL, metadata, and Java code where
necessary. The XML output is processed by the other parts
of the framework, just like the output of transformed SQR
programs. As well as in new reports, the generator has been
used to rewrite the last remaining legacy COBOL programs as
part of a reverse engineering and re-implementation project.
This framework of core components has been utilised in a
line of several products, in the context of different business
systems:
• GUI application – using the output of transformed SQR
programs for reporting to institutional clients from our
main investment accounting system. In production since
2001;
• Web application (J2EE) – using the report generator for
multi–language reporting to retail clients and agents. In
production since 2003;
• Server–side applications – using the report generator for
batch reporting in two different systems under Windows
and HP-UX. In production since 2005 and 2006 respec-
tively.
B. Use in data integration
As discussed in Sect. II-A, our investment accounting sys-
tem needs to be connected with a number of other systems and
services. Transformed SQR programs from Sect. II-B create
structured XML output, and can therefore be used as data
producers for integration between systems. The same applies
to the report generator. Where possible, such reuse has been
very beneficial, saving development and test effort. Further-
more, business requirements specifications for data feeds can
be phrased in terms of report output, and no additional analysis
or reverse engineering effort is needed to trace the derivation
of source data.
Data can be transported between producers and consumers
in files or messages. At the consumer end, it can be
processed in various ways. In some cases, e.g. when populating
a staging table, data warehouse, or reporting database, it is
possible to simply insert the content of the XML file into
a database table. To this end, report definition metadata from
Sect. III-A can optionally specify a target database table, and a
target column name for each field. A generic metadata–driven
component then loads XML files into a database.
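A metadata–driven load of this kind could be sketched as follows. The class LoadSketch, the table name and the column names used in the usage note are hypothetical; the sketch only builds a parameterised INSERT statement and the matching parameter list, and leaves execution (for example through a JDBC PreparedStatement) out of scope.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derives an INSERT statement from field-to-column metadata. */
final class LoadSketch {

    /**
     * Builds a parameterised INSERT for the mapped columns. Table and column names are
     * assumed to come from trusted report definition metadata, not from user input.
     */
    static String insertSql(String table, Map<String, String> fieldToColumn) {
        String columns = String.join(", ", fieldToColumn.values());
        String marks = String.join(", ", Collections.nCopies(fieldToColumn.size(), "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + marks + ")";
    }

    /** Orders one row's field contents to match the placeholders; absent fields become null. */
    static List<String> parameters(Map<String, String> fieldToColumn, Map<String, String> row) {
        List<String> params = new ArrayList<>();
        for (String field : fieldToColumn.keySet()) {
            params.add(row.get(field));
        }
        return params;
    }
}
```

With an ordered mapping of nominal to FACE_AMT and description to DESCR, and a target table STG_VALUATION, insertSql would yield a two-column INSERT with two placeholders. The mapping must have a stable iteration order (for instance a LinkedHashMap) so that columns and parameters line up.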
When using transformed SQR programs as data sources for
integration, we have sometimes found it necessary to include
a few additional items which are not shown on the original
report. We achieve this by manually adding print commands
to the original program, before applying the transformation.
The additions are enclosed in a special type of comment, so
that they are ignored by the SQR compiler / interpreter, but
not by our SQR transformer.
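The paper does not give the form of the special comment. Purely as an illustration, the sketch below assumes a marker "!xml:" (SQR comments begin with "!") and shows how a transformer might recover the hidden statement from such a line; both the marker and the class HiddenPrintSketch are hypothetical.

```java
/** Hypothetical sketch: recognises print commands hidden inside specially marked comments. */
final class HiddenPrintSketch {

    // Assumed marker; the SQR interpreter would treat the whole line as a comment.
    static final String MARKER = "!xml:";

    /** Returns the hidden statement if the line is a marked comment, otherwise null. */
    static String hiddenStatement(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith(MARKER)) {
            return null;
        }
        return trimmed.substring(MARKER.length()).trim();
    }

    public static void main(String[] args) {
        System.out.println(hiddenStatement("!xml: print $isin (0, col9)"));
        System.out.println(hiddenStatement("print $face_amt (0, col4)"));
    }
}
```

For a recovered statement, a transformer could emit only the write statement, so that the extra item reaches the XML output without appearing on the printed report.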
C. Use in Services and Business Processes
Production reporting and integration capabilities at BAM
have been enhanced further since the adoption in 2008 of
the webMethods Business Process Management (BPM) suite
from Software AG. The suite contains a message broker, a
container for (web) services (Integration Server), and a BPEL–
compatible business process execution engine.
We have wrapped the SQR interpreter as a service in
webMethods Integration Server, thus enabling different clients
to execute SQR programs. Such a service is particularly
useful for executing transformed (extended) SQR programs
from Sect. II-B. Similarly, some processing components from
Sect. III-A and III-B have been exposed as services.
An executable business process utilising these and other services
has been created. It provides automation and workflow
support for client valuation production, and has been in
regular use since mid–2010.
IV. ANALYSIS OF THE OUTCOME
A. Achievements and Benefits
The code statistics in Tables I and II give an idea of the size
of the project, and (albeit indirectly) of the effort involved.
It can be observed from Table I that the number of lines in
new Java code for implementing the SQR source code trans-
formation, including map files, is about 12 times smaller than
the number of lines in the original SQR report programs.
TABLE I. TRANSFORMATION–RELATED CODE STATISTICS
Language        Code type                      Files   Lines (thousands)
SQR             Programs pre–transformation    61      99
SQR             Programs post–transformation   61      118
Java            Parser for SQR                 8       1.9
Java            Transformer for SQR            5       1.5
tab–delimited   Map files                      61      4.5
TABLE II. ENHANCED REPORTING CODE STATISTICS
Language        Code type                      Files   Lines (thousands)
Java            Layout and formatting          34      8.7
Java            Report editor GUI              19      8.5
Java            Report generator               10      1.9
Java            Automated checks               17      2.8
Java            Utility classes (io, xml)      25      6.3
The new code implementing enhanced reporting functionality from
Table II is also relatively small compared to the original SQR
code. Crude as such a comparison may be, it provides some
evidence that the machine–supported transformation approach
is likely to have been more cost–efficient than a complete
re-implementation of the SQR programs. Furthermore, the
new code has not required an understanding of the business
logic in the original programs, nor a reverse engineering of
their specifications. The creation of the map files used in the
transformation is a largely mechanistic process. Changes to
the original SQR programs can be incorporated by rerunning
the transformation, often with little or no update to map files.
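The size comparison drawn from Table I can be checked with a short calculation, taking the Java parser, the Java transformer and the map files together as the new transformation code (all figures in thousands of lines). The second ratio gives the growth of the SQR programs after transformation.

```latex
\[
\frac{99}{1.9 + 1.5 + 4.5} = \frac{99}{7.9} \approx 12.5,
\qquad
\frac{118}{99} \approx 1.19
\]
```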
Since the original project go-live in early 2001, some 180
changes (checked–in revisions) have been made to the SQR
programs, all of which have been applied by rerunning the
SQR transformation.
B. Performance of transformed programs
Unlike the original SQR programs, transformed (extended)
programs create additional XML output files, and therefore
take longer to run. Table III shows the run time of a Portfolio
Valuation program3. The bulk of total elapsed run time of
these programs is due to database access and processing on
the data server. Consequently, the total run time overhead of
transformed programs due to XML output creation is quite
small (2.1%).
3 The programs were run on a development server under HP-UX, creating a 200+ page report for 8 portfolios, and XML output containing over 36 thousand elements.
TABLE III. RUN–TIME PERFORMANCE OF TRANSFORMED SQR PROGRAMS
Time (sec)      Original prog        Transformed prog     Overhead (%)
                Mean     Std dev     Mean     Std dev
Total elapsed   39.54    0.73        40.37    0.62        2.1
User            5.16     0.11        5.64     0.15        9.4
System          1.71     0.04        1.97     0.05        15
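The overhead figures in Table III are consistent with the relative difference of the mean run times, as the calculation below shows for the total elapsed, user and system times in turn. Using the rounded means, the user time works out at about 9.3% rather than the reported 9.4%; presumably the reported value was derived from unrounded means, although this is an assumption.

```latex
\[
\text{overhead} =
\frac{\bar{t}_{\text{transformed}} - \bar{t}_{\text{original}}}{\bar{t}_{\text{original}}},
\qquad
\frac{40.37 - 39.54}{39.54} \approx 0.021,\quad
\frac{5.64 - 5.16}{5.16} \approx 0.093,\quad
\frac{1.97 - 1.71}{1.71} \approx 0.152
\]
```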
C. Problems and limitations
The work described in this paper has been carried out
in the context of a particular company, with the aim of
meeting specific requirements in a cost–efficient way. The
transformation–based reengineering approach has certain lim-
itations which have necessitated manual restructuring in some
cases. In particular, the parser for SQR used for implementing
the transformation in Fig. 1 is based on a partial grammar
of the SQR language. As a consequence, a small change was
required to one of our SQR programs before the machine–
assisted transformation could be applied.
Post–transformation, SQR programs grow about 20% longer
on average. This became an issue in the case of one re-
port program, which hit an internal limit of the SQR com-
piler/interpreter that was not configurable. To overcome this,
we had to manually restructure the report program before
applying the transformation to it. An editor script was created
so that the changes could be reapplied to future versions of
the original SQR report program.
Overall, the need for manual restructuring has been suf-
ficiently rare and narrow in scope, and has therefore been
acceptable from a pragmatic point of view.
We have not created XML schemata for the output of
our extended SQR programs. Instead the Report definition
metadata from Sect. III-A drives all subsequent processing.
In principle there is no reason why we could not generate
schemata from the report definitions.
D. Choice of approach
The requirement for modernised reporting could have been
met in different ways, for example by:
1) Writing code to parse and transform the text output of
the original reports;
2) Manual “cloning” and modification of the original pro-
grams delivered by the vendor;
3) A complete rewrite in a preferred language;
4) Programmatic transformation or translation of SQR
source code (the approach which was chosen).
Our reasoning in choosing an approach was along the
following lines. Option 1 is akin to screen–scraping; it was
not advisable or practical because of the complex and variable
layout of some of the reports. Option 2 was undesirable not
only because of code duplication, but also because the clones
of third–party programs would have to be maintained in line
with new revisions from the vendor. A software configuration
management system like Perforce would help merge our
codeline with the vendor’s, but the manual effort remains.
This approach had been used previously for modest layout
changes to a couple of the report programs. This option may
have been appropriate for a smaller number of reports. In
addition to modifying the SQR programs, we would still have
had to implement somehow the required new functionality
for enhanced formatting. Had it been available, an XML
generation library like [10] could have provided a possible
way forward.
Option 3 was considered prohibitively expensive, involving
substantial effort for analysis, specification, development, and
testing; it is reliant on a good understanding and documenta-
tion (which we lacked) of the internals of a third–party system
beyond published interfaces. Furthermore, there were technical
limitations: our investment accounting system was delivered
with a custom build of the SQR compiler / interpreter, with
application libraries statically linked. We would not have been
able to link those libraries to other software, e.g. a different
reporting or integration tool.
Option 4 was initially regarded with considerable scepticism,
since there was no experience of program
transformation within the organisation. But there was enough
experience of the other approaches, and an appreciation of the
difficulties associated with those, to put the program transfor-
mation option in context, and make it a possible choice. Some
prototyping, comparative estimates and extensive discussions
proved enough to convince decision makers in IT and the
business to give this option a go.
With Option 4 we have had to understand the operational
semantics of SQR print and positioning commands from the
language documentation, and “reverse engineer” some missing
details. This was arguably a smaller exercise than reverse
engineering the specification of each report (for Option 3).
If we had to solve the same renovation problem today,
would we adopt a different approach? One big difference is
that now we have the complete source code of the investment
accounting system. We have gone through a code adoption
process, and have a much better understanding of the system
overall. The technical limitation associated with Option 3
no longer applies, because we can link application libraries
directly to other tools / languages besides SQR. Even so, we
find it cost–efficient to maintain existing SQR reports and
reapply the transformation when changes are made.
V. CONCLUSION
The program transformation approach has enabled BAM
to modernise our client reporting without a significant re-
implementation cost, and to meet continuously evolving busi-
ness requirements.
It can be argued that a program transformation–based ap-
proach would yield better results if modern reengineering tools
are applied, instead of using a general–purpose programming
language such as Java. Since the work discussed in this paper
was started, there has been significant progress in the theory
and practice of reverse engineering and reengineering, with an
emphasis on language–independent techniques – [1], [2]. It is
appropriate to consider whether the correct choice today would
be to use a tool like the Meta–environment [9]. With support
for generalised LR parsing and conditional term rewriting,
such an environment should make program transformation a
more widely applicable method for software engineering.
The main contribution of our work is to demonstrate prac-
tically that such an approach can be effective in the context
of a medium–sized company in a non–IT industry. A sizeable
legacy of code in SQR and other similar languages exists, to
which this renovation approach is appropriate. The approach
can be applied by an organisation to source code which is
available, but is not necessarily controlled or maintained by
that organisation.
REFERENCES
[1] A. van Deursen, P. Klint, and C. Verhoef. Research issues in software renovation. In J.-P. Finance, editor, Fundamental Approaches to Software Engineering (FASE ’99), volume 1577 of Lecture Notes in Computer Science, pages 1–21. Springer-Verlag, 1999.
[2] M. Di Penta, M. Neteler, G. Antoniol, and E. Merlo. A language-independent software renovation framework. Journal of Systems and Software, 77(3):225–240, 2005.
[3] The Apache Software Foundation. Apache POI - the Java API for Microsoft Documents. http://poi.apache.org.
[4] S. Mintchev and V. Getov. Automatic binding of native scientific libraries to Java. In Proceedings of ISCOPE, pages 129–136, 1997. Springer LNCS 1343.
[5] H.M. Sneed. Reengineering reports. In Proceedings of the 11th Working Conference on Reverse Engineering, pages 17–26. IEEE Computer Society, 2004.
[6] H.M. Sneed. Integrating legacy software into a service oriented architecture. In Proceedings of the Conference on Software Maintenance and Reengineering, pages 3–14. IEEE Computer Society, 2006.
[7] SparkPath Technologies, Inc. SQR Programming Language Technical Information. http://www.sqr-info.com.
[8] M. Van Den Brand, A. Sellink, and C. Verhoef. Generation of components for software renovation factories from context-free grammars. Science of Computer Programming, 36(2-3):209–266, 2000.
[9] MGJ van den Brand, M. Bruntink, GR Economopoulos, HA de Jong, P. Klint, T. Kooiker, T. van der Storm, and JJ Vinju. Using the meta-environment for maintenance and renovation. In 11th European Conference on Software Maintenance and Reengineering, CSMR ’07, pages 331–332. IEEE, 2007.
[10] David Vandiver. SQR2XML: Library of functions to create Excel documents from SQR and PeopleCode. http://sourceforge.net/projects/sqr2xml.