Reengineering of Large-Scale Polylingual Systems
Mark Grechanik, Dewayne E. Perry, and Don Batory
The Center for Advanced Research In Software Engineering (ARISE)
The University of Texas at Austin



Page 2: Polylingual Systems

Polylingual systems consist of interoperating programs (or COTS components) that are written in two or more languages or run on two or more platforms.
  The native type system is the type system of the host language in which a program is written.
  A program written in a host language interoperates with a program based on a Foreign Type System (FTS).

[Figure: three diagrams, each showing a pair of programs Pn and Pk]

Page 3: Examples of Polylingual Systems

A C++ program and an EJB interoperate
[Figure: three diagrams, each showing programs P_C++ and P_Java]

A C# program and a Python program interoperate
[Figure: three diagrams, each showing programs P_C# and P_Python]

Page 4: Large-Scale Polylingual Systems

[Figure: graph of interoperating programs P1, P2, P3, P4, ..., Pn]

Polylingual systems can be represented as graphs of interoperating programs.
  Circles represent programs.
  Arrows represent interoperating APIs.

For a clique of n programs, the complexity of the APIs used to interoperate the programs is O(n^2).

We need a scalable approach for designing, implementing, and maintaining large-scale polylingual systems!
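
A minimal sketch of where the O(n^2) comes from, assuming every pair of programs in the clique needs its own interoperation API. The helper name `pairwise_links` is hypothetical and does not come from the slides.

```python
from itertools import combinations


def pairwise_links(programs):
    """Return every unordered pair of programs that must interoperate in a clique."""
    return list(combinations(programs, 2))


programs = ["P1", "P2", "P3", "P4", "P5"]
links = pairwise_links(programs)
n = len(programs)

# A clique of n programs has n * (n - 1) / 2 pairs, which grows as O(n^2).
print(len(links), n * (n - 1) // 2)
```

Doubling the number of programs roughly quadruples the number of pairs, which is the scaling problem the slide points at.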

Page 5: Assumptions

Reflection is available for all platforms

The cost of reflection is insignificant
  Hardware is powerful and cheap
  The cost of network communications outweighs the cost of reflection by an order of magnitude

Polylingual systems are based on recursive type systems

Page 6: Core Abstraction

int n = R["CEO"]["CTO"]["Geeks"];

[Figure: tree of elements CEO, CFO, CTO, Test, and Geeks, with attribute pairs Name/Bonus and Name/Salary; the path CEO, CTO, Geeks is shown alongside]
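
The chained-bracket navigation can be sketched in Python as follows. This is an illustration under assumptions, not the ROOF implementation: the class body, the placement of Test under CTO, and the head counts are all made up for the example.

```python
class ReificationOperator:
    """Hypothetical stand-in for the slide's R: wraps one node of a tree of named elements."""

    def __init__(self, name, children=None, value=None):
        self.name = name
        self.children = {c.name: c for c in (children or [])}
        self.value = value

    def __getitem__(self, child_name):
        # R["CEO"] navigates one level down and returns another operator,
        # so lookups can be chained: R["CEO"]["CTO"]["Geeks"].
        return self.children[child_name]


# Assumed example data: element names from the slide, values invented.
geeks = ReificationOperator("Geeks", value=12)
test = ReificationOperator("Test", value=3)
cto = ReificationOperator("CTO", children=[test, geeks])
cfo = ReificationOperator("CFO")
ceo = ReificationOperator("CEO", children=[cfo, cto])
R = ReificationOperator("root", children=[ceo])

n = R["CEO"]["CTO"]["Geeks"].value
print(n)
```

Because each lookup returns another operator, the same navigation syntax works at any depth of the tree.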

Page 7: Operations On Reification Operators

Copy: Creates a copy of an element or attribute and adds it to its new location. All properties of the element or attribute are cloned, including all nested elements.

Move: Identical to the copy operation, except that the original element or attribute is removed automatically once copying completes.

Add: Appends elements and attributes under a given path.

Remove: Removes elements and attributes from the given path. If a removed element contains nested elements, the entire branch of the graph under the removed element is deleted.

Relational: Compares graphs and their elements with constants, variables, or other graphs.

Logic set: Computes set operations such as intersection, union, Cartesian product, complement, and difference.

Composition: Composes two reification operators.
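
The four structural operations (copy, move, add, remove) can be sketched over a graph stored as nested dictionaries. This is a minimal sketch under that assumption; the function names and the list-of-names path format are hypothetical, and the example data is invented.

```python
import copy


def _resolve(graph, path):
    """Follow a list of element names from the root and return the node reached."""
    node = graph
    for name in path:
        node = node[name]
    return node


def add(graph, path, name, value=None):
    """Add: append a new element (an empty branch by default) under the given path."""
    _resolve(graph, path)[name] = {} if value is None else value


def remove(graph, path):
    """Remove: delete the element at path together with everything nested under it."""
    del _resolve(graph, path[:-1])[path[-1]]


def copy_element(graph, src, dst):
    """Copy: deep-clone the element at src, nested elements included, and place it under dst."""
    _resolve(graph, dst)[src[-1]] = copy.deepcopy(_resolve(graph, src))


def move(graph, src, dst):
    """Move: copy, then remove the original. Assumes dst is not the parent of src."""
    copy_element(graph, src, dst)
    remove(graph, src)


org = {"CEO": {"CFO": {}, "CTO": {"Geeks": {"Name": "Ada"}}}}
add(org, ["CEO", "CTO"], "Test")
copy_element(org, ["CEO", "CTO", "Geeks"], ["CEO", "CFO"])
move(org, ["CEO", "CTO", "Test"], ["CEO", "CFO"])
remove(org, ["CEO", "CTO", "Geeks"])
print(org)
```

The deep copy matters: a shallow copy would leave the clone sharing nested elements with the original, so a later change to one would leak into the other.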

Page 8: Our Solution: Reification Object-Oriented Framework (ROOF)

Basic idea: each component in a polylingual system is represented as a graph of objects, and a uniform set of APIs is provided to navigate and manipulate these objects

We use the generality of graphs to develop a language- and platform-independent solution for polylingual systems

Reification Object-Oriented Framework
  Reify objects from an FTS to the host language
  Remote objects become first-class objects
  Reification is based on reflection
  ROOF hides all the complexity that programmers have to deal with today
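
As a rough illustration of reification based on reflection, the sketch below turns an ordinary Python object into a graph of dictionaries by inspecting its attributes at run time. It only shows the idea within one language; the `reify` function and the `Ceo` and `Cto` classes are assumptions made for the example, not ROOF code.

```python
def reify(obj, _depth=0, _max_depth=5):
    """Turn an object into a graph of plain dicts using reflection (vars())."""
    if _depth >= _max_depth or not hasattr(obj, "__dict__"):
        return obj  # leaf: a primitive value, or the recursion limit was reached
    return {name: reify(value, _depth + 1, _max_depth)
            for name, value in vars(obj).items()}


class Cto:
    """Assumed example type with a single attribute."""

    def __init__(self):
        self.Salary = 100000.0


class Ceo:
    """Assumed example type that nests a Cto."""

    def __init__(self):
        self.CTO = Cto()


graph = reify(Ceo())
print(graph["CTO"]["Salary"])
```

The caller never names the `Ceo` or `Cto` classes when navigating the result; it only walks the graph, which is the property the slide relies on.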

Page 9: Bird's-Eye View of the ROOF

CORBA .Net XML HTML DBMS

Reification Object-Oriented Framework (ROOF)

Foreign Object Reification Language (FOREL)

Pages 10-14: Reification Mechanism (from HTML to C++)

C++ Program:
…
String s;
s = R["H2"]["B"]["FONT"];
…

HTML Parser:
<H2> <B> <FONT size="2"> Hello World! </FONT> </B></H2>

[Figure, built up over five slides: the reification operator R, labeled with HTML and C++, goes "from" the HTML Parser "to" the C++ Program; the final slide shows the path H2, B, FONT, the value "Hello World!", and the variable s]
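
The HTML case can be approximated in Python with the standard-library `html.parser` module. This is a sketch under assumptions: the `Node` and `TreeBuilder` classes and the `reify_html` function are hypothetical names, the host language is Python rather than C++, and the input is assumed to be well-formed.

```python
from html.parser import HTMLParser


class Node:
    """One reified HTML element: a tag name, child elements, and accumulated text."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []
        self.text = ""

    def __getitem__(self, tag):
        # Return the first child with this tag; HTMLParser lower-cases tag names.
        for child in self.children:
            if child.tag == tag.lower():
                return child
        raise KeyError(tag)

    def __str__(self):
        return self.text.strip()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from start-tag, end-tag, and text events."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self._stack[-1].children.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Well-formed input is assumed, so the end tag closes the current node.
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        self._stack[-1].text += data


def reify_html(source):
    """Parse HTML source and return the root of the reified tree."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


R = reify_html('<H2> <B> <FONT size="2"> Hello World! </FONT> </B></H2>')
s = str(R["H2"]["B"]["FONT"])
print(s)  # Hello World!
```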

Pages 15-19: Reification Mechanism (from Java to C#)

C# Program:
…
String s;
s = R["JCls"]["GetString"];
…

Java Virtual Machine:
class JCls {
    String GetString() {
        return( new String( "Hello World!" ) );
    }
}

[Figure, built up over five slides: the reification operator R, labeled with Java and C#, goes "from" the Java Virtual Machine "to" the C# Program; the final slide shows the path JCls, GetString, the value "Hello World!", and the variable s]
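
The method-call case can be sketched with reflective lookup by name. In this sketch a Python class stands in for the Java class, so no real JVM bridge is involved; `MethodReifier` and `_BoundInstance` are hypothetical names, and the no-argument call convention is an assumption.

```python
class JCls:
    """Python stand-in for the Java class on the slide."""

    def GetString(self):
        return "Hello World!"


class _BoundInstance:
    """Wraps an instance so that indexing by method name invokes that method."""

    def __init__(self, instance):
        self._instance = instance

    def __getitem__(self, method_name):
        method = getattr(self._instance, method_name)  # reflective lookup by name
        return method()                                # invoke with no arguments


class MethodReifier:
    """Resolves R[class_name][method_name] by reflection and returns the call's result."""

    def __init__(self, classes):
        self._classes = {cls.__name__: cls for cls in classes}

    def __getitem__(self, class_name):
        instance = self._classes[class_name]()  # instantiate the foreign class
        return _BoundInstance(instance)


R = MethodReifier([JCls])
s = R["JCls"]["GetString"]
print(s)  # Hello World!
```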

Page 20: Properties of the ROOF

Our solution does not introduce
  Additional type systems
  Hard-to-learn APIs
  Special constraints that affect programmers' decisions to share objects

ROOF allows programmers to
  Avoid using any naming mechanisms
  Type check foreign objects at compile time
  Other reasons

Page 21: FORTRESS

We exploit properties of FOREL-based code to recover the high-level design of polylingual systems with a high degree of automation

Our solution is the FOReign Types Reverse Engineering Semantic System (FORTRESS)
  Normalize code to conform to the FOREL grammar
  Analyze FOREL-based code using program analysis techniques: control flow analysis (CFA) and data flow analysis (DFA)
  Infer schemas that describe FTS models and the operations executed against them

Page 22: FORTRESS Process

[Figure: process diagram with the boxes Normalized code, Compiler Front end, Program Analysis, Schema Inference, Visualization Engine, and GUI]

Page 23: FTS RE Algorithm

1) Parse the source code and build an AST

2) Build a control flow graph

3) Build a data flow graph

4) For each branch in the control flow graph do
   a) Detect reachability of statements accessing and manipulating reified types
   b) Create schema definitions from reified types
   c) Translate operations on reified type instances to operations of the schema definition elements
   d) Output the schema and operations on its instances

5) End For

[Figure labels alongside the steps: Program Analysis, Schema Inference, Output Generation]
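
A heavily simplified sketch of the algorithm, assuming Python 3.9 or later. It analyzes Python source rather than FOREL, handles only assignments to bracket chains rooted at a variable named R, and skips the control flow and data flow graphs (steps 2, 3, and 4a) entirely. The function names are hypothetical.

```python
import ast


def subscript_path(node):
    """Return ['CEO', 'CTO', ...] for R['CEO']['CTO']..., or None if not rooted at R."""
    path = []
    while isinstance(node, ast.Subscript):
        key = node.slice
        if not (isinstance(key, ast.Constant) and isinstance(key.value, str)):
            return None
        path.append(key.value)
        node = node.value
    if isinstance(node, ast.Name) and node.id == "R":
        return list(reversed(path))
    return None


def infer_schema(source):
    """Steps 1 and 4b-4d in miniature: parse, then fold every reified path into a tree."""
    schema = {}
    operations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                path = subscript_path(target)
                if path:
                    level = schema
                    for name in path:
                        level = level.setdefault(name, {})
                    operations.append(("assign", path))
    return schema, operations


source = 'R["CEO"]["CTO"]["Salary"] = 100000.0\nR["CEO"]["CFO"]["Bonus"] = 5000.0\n'
schema, operations = infer_schema(source)
print(schema)
print(operations)
```

A fuller version would run this per branch of the control flow graph, so that statements on unreachable branches do not contribute to the schema.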

Page 24: Schema Inference

SELECT u.Name, c.Course FROM User u, Courses c WHERE u.ID = c.ID;

  Two tables: User and Courses
  Attributes Name and ID in the User table
  Attributes Course and ID in the Courses table
  The declarations of attribute ID in both tables are the same or compatible

Page 25: Schema Inference

[Figure: inferred schema with two tables, User (Name, ID) and Courses (Course, ID)]
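
The inference from the SELECT statement can be sketched with regular expressions. This is a minimal sketch that assumes a single simple query in which every table in the FROM clause has an alias and every column is written as alias.column; `infer_sql_schema` is a hypothetical name, and a real tool would use a proper SQL parser.

```python
import re


def infer_sql_schema(query):
    """Infer {table: set(columns)} from a simple SELECT with aliased tables."""
    match = re.search(r"FROM\s+(.*?)(?:\s+WHERE\s+|;|$)", query, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no FROM clause found")
    # Map each alias to its table name, e.g. "User u" gives {"u": "User"}.
    aliases = {}
    for item in match.group(1).split(","):
        table, alias = item.split()
        aliases[alias] = table
    schema = {table: set() for table in aliases.values()}
    # Every alias.column reference anywhere in the query contributes an attribute.
    for alias, column in re.findall(r"\b(\w+)\.(\w+)\b", query):
        if alias in aliases:
            schema[aliases[alias]].add(column)
    return schema


query = "SELECT u.Name, c.Course FROM User u, Courses c WHERE u.ID = c.ID;"
print(infer_sql_schema(query))
```

The join condition u.ID = c.ID is what justifies the further conclusion that the two ID attributes have the same or compatible declarations; the sketch records the attributes but does not model their types.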

Page 26: Schema Inference in FTSs

ReificationOperator R;
float var = 100000.0;
R["CEO"]["CTO"]("Salary") = var;

What can we infer from this statement?
  The structure of a branch of the data flow
  Composite type CEO of some FTS
  Attribute Salary of type CTO
  The type of this attribute and the value that it is assigned in this branch

Page 27: Schema Inference in FORTS

R["CEO"]["CTO"]("Salary") = var;

[Figure: inferred schema with the elements CEO and CTO and the attribute Salary, shown next to the path CEO, CTO, Salary]
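
The facts listed for this statement can be extracted mechanically. The sketch below is an assumption-laden illustration, not the FORTRESS implementation: it treats square brackets as element navigation and parentheses as attribute access, recognizes only the declaration types float, int, and string, and uses regular expressions where a real tool would use the compiler front end.

```python
import re

# Assumed declaration form: <type> <name> = <value>;
DECL = re.compile(r"\b(float|int|string)\s+(\w+)\s*=\s*([^;]+);")
# Assumed statement form: R["A"]["B"]("Attr") = variable;
ASSIGN = re.compile(r'R((?:\["\w+"\])+)\("(\w+)"\)\s*=\s*(\w+)\s*;')


def infer_fts_schema(code):
    """Return one fact per assignment: element path, attribute, declared type, value."""
    variables = {name: (type_, value.strip()) for type_, name, value in DECL.findall(code)}
    facts = []
    for elements, attribute, var in ASSIGN.findall(code):
        path = re.findall(r'\["(\w+)"\]', elements)
        type_, value = variables.get(var, (None, None))
        facts.append({"path": path, "attribute": attribute, "type": type_, "value": value})
    return facts


code = '''
ReificationOperator R;
float var = 100000.0;
R["CEO"]["CTO"]("Salary") = var;
'''
print(infer_fts_schema(code))
```

The attribute's type comes from the declaration of the assigned variable, which is why the data flow analysis step has to run before schema inference.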

Page 28: Synergy

Program analysis combined with a schema inference engine is a powerful combination
  Create the schemas that reflect the semistructured data operated on by the code
  Relate different FTSs by analyzing a single FTS program
  Create a high-level design by relating actions to schemas rather than to variables and functions

[Figure: diagram with nodes labeled I, J, and Q]

Page 29: Output Generation

Outputs
  Schemas describing FTSs
  Instructions, in a readable format, that manipulate instances of schemas

Visualization Tool
  Presents a single high-level view of FTSs
  Models program execution and visualizes its aspects

Page 30: FORTRESS Architecture

[Figure: architecture diagram of FORTRESS with the components FOREL code, Compiler Front end, AST, Control Flow Analyzer, Data Flow Analyzer, Schema Inference Engine, Visualization Driver, and GUI; the GUI mock-up shows "Elapsed time: 2mins 27 sec" and "Navigate to node"]

Page 31: Conclusion

We show how the ROOF serves as the underlying mechanism enabling the verification of large-scale polylingual systems
  Reduces the complexity from O(n^2) to 1
  Provides a uniform API for graph navigation and manipulation, with precise semantics assigned to operations

Enables an effective reverse engineering process

Removes the pain associated with understanding legacy software

No existing solution addresses this problem