The University of Texas at Austin
The Center for Advanced Research In Software Engineering (ARISE)
Reengineering of Large-Scale Polylingual Systems
Mark Grechanik, Dewayne E. Perry, and Don Batory
Polylingual Systems
Polylingual systems consist of interoperating programs (or COTS components) that are written in two or more languages or are run on two or more platforms
Native type system is the type system of a host language in which a program is written
A program written in a host language interoperates with a program based on a Foreign Type System (FTS)
[Figure: programs Pn and Pk interoperating]
Examples of Polylingual Systems
A C++ program and an EJB interoperate
[Figure: programs P(C++) and P(Java) interoperating]
A C# program and a Python program interoperate
[Figure: programs P(C#) and P(Python) interoperating]
Large-Scale Polylingual Systems
[Figure: graph of interoperating programs P1, P2, P3, P4, …, Pn]
Polylingual systems can be represented as graphs of interoperating programs
    Circles mean programs
    Arrows mean interoperating APIs
For a clique with n programs, the complexity of APIs used to interoperate programs is O(n^2)
We need a scalable approach for designing, implementing, and maintaining large-scale polylingual systems!
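To make the O(n^2) growth concrete, here is a minimal Python sketch that counts the interoperation links in a clique of n programs. It assumes one API per unordered pair of programs (the slide does not say whether links are directed), and the function name `pairwise_api_count` is hypothetical.

```python
from itertools import combinations


def pairwise_api_count(n: int) -> int:
    """Number of program pairs in a clique of n programs: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Cross-check the closed form against an explicit enumeration of pairs.
programs = [f"P{i}" for i in range(1, 6)]
links = list(combinations(programs, 2))
assert len(links) == pairwise_api_count(len(programs))

for n in (2, 5, 10, 100):
    print(n, pairwise_api_count(n))
```

Going from 10 to 100 programs multiplies the number of pairwise links by more than a hundred, which is the scaling problem the slide points at.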
Assumptions
Reflection is available for all platforms
The cost of reflection is insignificant
    Hardware is powerful and cheap
    The cost of network communications outweighs the cost of reflection by an order of magnitude
Polylingual systems are based on recursive type systems
Core Abstraction
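Int n = R["CEO"]["CTO"]["Geeks"]
[Figure: tree rooted at CEO with children CFO and CTO; below them are the nodes Test and Geeks, with attribute nodes Name, Bonus and Name, Salary. The path CEO, CTO, Geeks corresponds to the expression above.]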
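The chained-index expression can be imitated in a few lines of Python. This is only a sketch of the idea, not the ROOF implementation: the class name `ReificationOperator` is borrowed from a later slide, the nested-dict representation is an assumption, and the numbers in the sample data are invented for illustration.

```python
class ReificationOperator:
    """Hypothetical stand-in for R: wraps one node of a nested graph."""

    def __init__(self, node):
        self._node = node

    def __getitem__(self, name):
        # Descend to the child element called `name`.
        child = self._node[name]
        # Wrap inner nodes so indexing can be chained; return leaves as values.
        return ReificationOperator(child) if isinstance(child, dict) else child


# Illustrative data only; the numbers are not from the slides.
org = {"CEO": {"CFO": {}, "CTO": {"Test": 3, "Geeks": 12}}}

R = ReificationOperator(org)
n = R["CEO"]["CTO"]["Geeks"]
print(n)  # 12
```

Each `[...]` step returns another operator until a leaf is reached, so the same navigation syntax works at any depth of the graph.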
Operations On Reification Operators
Copy: Creates a copy of an element or attribute and adds it to its new location. All properties of an element or an attribute are cloned, including all nested elements.
Move: Identical to the copy operation except for the automatic removal of the original element or attribute upon completion of copying.
Add: Appends elements and attributes under a given path.
Remove: Removes elements and attributes from the given path. If a removed element contains nested elements, then the entire branch of the graph under the removed element is deleted.
Relational: Compares graphs and their elements with constants, variables, or other graphs.
Logic set: Computes various logic set operations such as intersection, union, cartesian product, complement, and difference.
Composition: Composes two reification operators.
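The first four operations above can be sketched over a nested-dict graph in Python. The function names, the list-of-names path representation, and the sample data are all assumptions made for illustration; the relational, logic set, and composition operations are not covered here.

```python
import copy


def _resolve(graph, path):
    """Return the node reached by following `path` (a list of names)."""
    node = graph
    for name in path:
        node = node[name]
    return node


def add(graph, path, name, value):
    """Add: append an element or attribute under the given path."""
    _resolve(graph, path)[name] = value


def remove(graph, path, name):
    """Remove: delete an element together with the whole branch under it."""
    del _resolve(graph, path)[name]


def copy_element(graph, src_path, name, dst_path):
    """Copy: deep-clone an element, nested elements included, to a new location."""
    _resolve(graph, dst_path)[name] = copy.deepcopy(_resolve(graph, src_path)[name])


def move(graph, src_path, name, dst_path):
    """Move: copy, then remove the original once copying has completed."""
    copy_element(graph, src_path, name, dst_path)
    remove(graph, src_path, name)


# Illustrative data only; the values are not from the slides.
org = {"CEO": {"CFO": {}, "CTO": {"Geeks": {"Salary": 100000.0}}}}
add(org, ["CEO", "CTO"], "Test", {"Bonus": 500.0})
copy_element(org, ["CEO", "CTO"], "Geeks", ["CEO", "CFO"])
move(org, ["CEO", "CTO"], "Test", ["CEO", "CFO"])
remove(org, ["CEO", "CTO"], "Geeks")
print(org)
```

Using a deep copy keeps the cloned branch independent of the original, which matches the description of Copy as cloning all nested elements.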
Our Solution: Reification Object-Oriented Framework (ROOF)
Basic idea: each component in a polylingual system is represented as a graph of objects, and a uniform set of APIs is provided to navigate and manipulate these objects
We use the generality of graphs to develop a language- and platform-independent solution for polylingual systems
Reification Object-Oriented Framework
    Reify objects from an FTS to the host language
    Remote objects become first-class objects
    Reification is based on reflection
    ROOF hides all the complexity that programmers have to deal with today
Bird's-Eye View of the ROOF
[Figure: layered diagram showing the platforms CORBA, .NET, XML, HTML, and DBMS; the Reification Object-Oriented Framework (ROOF); and the Foreign Object Reification Language (FOREL)]
Reification Mechanism
C++ Program
    …
    String s;
    s = R["H2"]["B"]["FONT"];
    …
HTML Parser
    <H2><B><FONT size="2">Hello World!</FONT></B></H2>
Reification Mechanism
Reification operator R: from HTML to C++
from: HTML Parser
    <H2><B><FONT size="2">Hello World!</FONT></B></H2>
to: C++ Program
    …
    String s;
    s = R["H2"]["B"]["FONT"];
    …
Reification Mechanism
Reification operator R: from HTML to C++
HTML Parser
    <H2><B><FONT size="2">Hello World!</FONT></B></H2>
[Figure: R follows the path H2, B, FONT in the parsed HTML and delivers the value Hello World! to s]
C++ Program
    …
    String s;
    s = R["H2"]["B"]["FONT"];
    …
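The HTML walk-through can be approximated in Python with the standard `html.parser` module. This is a sketch under stated assumptions, not the ROOF mechanism itself: the `GraphBuilder` class, the `#text` and `@attribute` key conventions, and the `R` wrapper are all hypothetical, and the sketch only handles well-formed input with unique child tags.

```python
from html.parser import HTMLParser


class GraphBuilder(HTMLParser):
    """Builds a nested dict: one dict per element, text stored under '#text'."""

    def __init__(self):
        super().__init__()
        self.root = {}
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case; the slides use upper case.
        node = {"@" + key: value for key, value in attrs}
        self._stack[-1][tag.upper()] = node
        self._stack.append(node)

    def handle_endtag(self, tag):
        self._stack.pop()

    def handle_data(self, data):
        if data.strip():
            self._stack[-1]["#text"] = data.strip()


class R:
    """Hypothetical reification operator over the parsed graph."""

    def __init__(self, node):
        self._node = node

    def __getitem__(self, name):
        return R(self._node[name])

    def __str__(self):
        return self._node.get("#text", "")


builder = GraphBuilder()
builder.feed('<H2><B><FONT size="2">Hello World!</FONT></B></H2>')
builder.close()

s = str(R(builder.root)["H2"]["B"]["FONT"])
print(s)  # Hello World!
```

The host program never touches the parser API directly; it only navigates the graph, which is the point the slides make about hiding the foreign type system.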
Reification Mechanism
C# Program
    …
    String s;
    s = R["JCls"]["GetString"];
    …
Java Virtual Machine
    class JCls {
        String GetString() {
            return (new String("Hello World!"));
        }
    }
Reification Mechanism
Reification operator R: from Java to C#
from: Java Virtual Machine
    class JCls {
        String GetString() {
            return (new String("Hello World!"));
        }
    }
to: C# Program
    …
    String s;
    s = R["JCls"]["GetString"];
    …
Reification Mechanism
Reification operator R: from Java to C#
Java Virtual Machine
    class JCls {
        String GetString() {
            return (new String("Hello World!"));
        }
    }
[Figure: R follows the path JCls, GetString in the Java Virtual Machine and delivers the value Hello World! to s]
C# Program
    …
    String s;
    s = R["JCls"]["GetString"];
    …
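Since the slides state that reification is based on reflection, the Java example can be mimicked in a single Python process using `getattr`. This is a sketch only: the Python `JCls` class stands in for the Java class, the `R` wrapper is hypothetical, and nothing here crosses a real language or virtual machine boundary as ROOF is described as doing.

```python
class JCls:
    """Python stand-in for the Java class on the slide."""

    def GetString(self):
        return "Hello World!"


class R:
    """Hypothetical reification operator that resolves names by reflection."""

    def __init__(self, namespace):
        self._namespace = namespace

    def __getitem__(self, name):
        # Look the name up reflectively: dict lookup for the top-level
        # namespace, getattr for members of an object.
        if isinstance(self._namespace, dict):
            target = self._namespace[name]
        else:
            target = getattr(self._namespace, name)
        if isinstance(target, type):
            return R(target())   # instantiate the class, keep navigating
        if callable(target):
            return target()      # invoke a zero-argument method
        return target


s = R({"JCls": JCls})["JCls"]["GetString"]
print(s)  # Hello World!
```

The caller names the class and the method as strings, and the operator decides at run time whether to instantiate, invoke, or return a value.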
Properties of the ROOF
Our solution does not introduce
    Additional type systems
    Hard-to-learn API
    Special constraints that affect programmer’s decisions to share objects
ROOF allows programmers to
    Avoid using any naming mechanisms
    Type check foreign objects at compile time
    Other reasons
FORTRESS
We exploit properties of FOREL-based code to recover high-level design of polylingual systems with a high degree of automation
Our solution is FOReign Types Reverse Engineering Semantic System (FORTRESS)
    Normalize code to conform to FOREL grammar
    Analyze FOREL-based code using program analysis techniques (CFA and DFA)
    Infer schemas that describe FTS models and operations executed against them
FORTRESS Process
[Figure: process diagram with the stages Normalized code, Compiler Front End, Program Analysis, Schema Inference, Visualization Engine, and GUI]
FTS RE Algorithm
1) Parse the source code and build an AST
2) Build a control flow graph
3) Build a data flow graph
4) For each branch in the control flow graph do
    a) Detect reachability of statements accessing and manipulating reified types
    b) Create schema definitions from reified types
    c) Translate operations on reified type instances to operations of the schema definition elements
    d) Output the schema and operations on its instances
5) End For
[Figure labels: the steps are grouped into the phases Program Analysis, Schema Inference, and Output Generation]
Schema Inference
SELECT u.Name, c.Course FROM User u, Courses c WHERE u.ID = c.ID;
    Two tables: User and Courses
    Attributes Name and ID in User table
    Attributes Course and ID in Courses table
    Declaration of attribute ID in both tables is the same or compatible
Schema Inference
[Figure: table User with attributes Name and ID; table Courses with attributes Course and ID]
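The inference from the SELECT statement can be reproduced with a small Python sketch. It is deliberately narrow: it assumes a single aliased FROM clause of the form `Table alias, Table alias` and relies on regular expressions rather than a real SQL parser; the function name `infer_schema` is hypothetical.

```python
import re


def infer_schema(query: str) -> dict:
    """Infer {table: set(attributes)} from a simple aliased SELECT statement."""
    # Take the FROM clause up to WHERE (or the end of the statement).
    from_clause = re.search(
        r"\bFROM\b(.*?)(?:\bWHERE\b|;|$)", query, re.I | re.S
    ).group(1)
    # Each entry looks like "Table alias".
    alias_to_table = {}
    for entry in from_clause.split(","):
        table, alias = entry.split()
        alias_to_table[alias] = table
    schema = {table: set() for table in alias_to_table.values()}
    # Every "alias.Attribute" reference names an attribute of that alias's table.
    for alias, attribute in re.findall(r"\b(\w+)\.(\w+)\b", query):
        if alias in alias_to_table:
            schema[alias_to_table[alias]].add(attribute)
    return schema


query = "SELECT u.Name, c.Course FROM User u, Courses c WHERE u.ID = c.ID;"
print(infer_schema(query))
```

The result lists Name and ID for User and Course and ID for Courses, the same schema the slide reads off the query by hand. Inferring that the two ID declarations are compatible would need the join condition as well, which this sketch does not analyze.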
Schema Inference in FTSs
ReificationOperator R;
float var = 100000.0;
R["CEO"]["CTO"]("Salary") = var;
What can we infer from this statement?
    The structure of a branch of the data flow
    Composite type CEO of some FTS
    Attribute Salary of type CTO
    The type of this attribute and a value that it is assigned in this branch
Schema Inference in FORTS
R["CEO"]["CTO"]("Salary") = var;
[Figure: inferred schema in which CEO contains CTO, and CTO has the attribute Salary]
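A rough Python sketch of this inference step follows. It pattern-matches the three statements from the slide with regular expressions instead of building the AST, control flow graph, and data flow graph that FORTRESS is described as using, so it stands in for the idea only. The names `STATEMENT`, `DECLARATION`, and `infer` are hypothetical, and the input uses straight quotes.

```python
import re

STATEMENT = re.compile(
    r'(?P<op>\w+)'                      # reification operator variable, e.g. R
    r'(?P<path>(?:\["\w+"\])+)'         # one or more ["Element"] steps
    r'\("(?P<attr>\w+)"\)'              # ("Attribute")
    r'\s*=\s*(?P<rhs>\w+)\s*;'          # = variable;
)
DECLARATION = re.compile(
    r'\b(?P<type>\w+)\s+(?P<name>\w+)\s*=\s*(?P<value>[^;]+);'
)


def infer(code: str) -> dict:
    """Infer the element path, attribute, and attribute type from one assignment."""
    # Collect declared variables so the right-hand side can be typed.
    variables = {
        m["name"]: (m["type"], m["value"].strip())
        for m in DECLARATION.finditer(code)
    }
    stmt = STATEMENT.search(code)
    path = re.findall(r'\["(\w+)"\]', stmt["path"])
    var_type, value = variables[stmt["rhs"]]
    return {"path": path, "attribute": stmt["attr"], "type": var_type, "value": value}


code = 'ReificationOperator R; float var = 100000.0; R["CEO"]["CTO"]("Salary") = var;'
print(infer(code))
```

The output records the path CEO, CTO, the attribute Salary, its type float, and the assigned value, which are the items the preceding slide says can be inferred from the statement.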
Synergy
Program analysis and the schema inference engine are a powerful combination
    Create the schemas that reflect the semistructured data operated on by the code
    Relate different FTSs by analyzing a single FTS program
    Create high-level design by relating actions to schemas rather than variables and functions
Output Generation
Outputs
    Schemas describing FTSs
    Instructions, in readable format, that manipulate instances of schemas
Visualization Tool
    Presents a single high-level view of FTSs
    Models program execution and visualizes its aspects
FORTRESS Architecture
[Figure: FORTRESS components: FOREL code, Compiler Front End, Control Flow Analyzer, Data Flow Analyzer, Schema Inference Engine, and Visualization Driver, with a GUI showing the AST. Screen labels: "Elapsed time: 2 mins 27 sec" and "Navigate to node".]
Conclusion
We show how the ROOF serves as the underlying mechanism enabling the verification of large-scale polylingual systems
    Reduces the complexity from O(n^2) to O(1)
    Provides a uniform API for graph navigation and manipulation with precise semantics assigned to operations
    Enables an effective reverse engineering process
    Removes the pain associated with understanding legacy software
No existing solution addresses this problem