Comprehending Web Applications by a Clustering Based Approach


Upload: porfirio-tramontana

Post on 24-May-2015


DESCRIPTION

The number and the complexity of web applications are increasing dramatically to satisfy market demand, and the need for effective approaches to comprehending them is growing accordingly. Recently, some reverse engineering methods and tools have been proposed to support the comprehension of a web application; the information recovered by these tools is usually rendered in graphical representations. However, graphical representations become progressively less useful for large-scale applications and do not adequately support their comprehension. In this paper, to overcome this limitation, we propose an approach based on a clustering method for decomposing a web application (WA) into groups of highly functionally related components. The approach is based on the definition of a coupling measure between interconnected components of the WA that takes into account both the typology and the topology of the connections. The coupling measure is exploited by a clustering algorithm that produces a hierarchy of clusterings. This hierarchy supports a structured approach to comprehending the web application. The approach has been experimented with on medium-sized web applications and produced interesting and encouraging results.

TRANSCRIPT

Page 1: Comprehending Web Applications by a Clustering Based Approach

IWPC 2002, Paris, France

Comprehending Web Applications by a Clustering Based Approach

Anna Rita Fasolino

G. A. Di Lucca, F. Pace, P. Tramontana, U. De Carlini

Dipartimento di Informatica e Sistemistica University of Naples Federico II, Italy

Page 2

Web Applications (WA): problems and open issues

• The pressing market demand for web applications
  – WAs developed in a very short time, with no regard for software engineering principles

• The continuously changing needs of the evolving application domain
  – WAs frequently and rapidly modified with ad hoc approaches, causing low-quality software, disordered architecture, and inadequate and incomplete documentation

• The growing complexity of WA technologies
  – From static web sites, to sites providing client-side interaction, to web applications with dynamic content

Page 3

Managing existing Web Applications

• Due to the large number of employed technologies, understanding, maintaining and evolving a dynamic application is a complex task …

• Reverse engineering methods and techniques have been proposed for…
  – Analyzing the functional behavior of an existing WA
  – Reconstructing the architecture of the WA
  – Capturing and reusing the design of the application
  – Modeling static and dynamic views by UML diagrams (use cases, sequence and class diagrams)
  – …

Page 4

Current approaches for Web Application reverse engineering

• Based on graphical representations of the web application
  – Valuable approach for analyzing relatively small applications, but…
  – Less useful for coping with large-scale applications.

• A possible solution:
  – Factoring the graphical representation into smaller cohesive parts using clustering techniques.

• An open issue:
  – Adapting traditional clustering approaches to the WA area.

Page 5

Applying traditional Clustering to the WAs

• Approaches based on file name analysis
  – Ineffective with applications whose source code has been automatically generated or written without any coherent file naming convention

• Approaches based on directory file analysis
  – Grounded on the hypothesis that the directory organization mirrors the functional one, but …

• Pattern-driven clustering approaches
  – Requiring the identification of common structures in web applications

Page 6

Applying traditional Clustering to the WAs

• Approaches based on dependence or dominance graphs
  – Requiring acyclic interconnection graphs! Inapplicable because of backward links towards home/index pages

• Approaches exploiting quality measures of a clustering
  – The optimal clustering is obtained by searching a space of possible graph partitionings. What kind of WA graph should be considered? And what quality measure?

Page 7

A new method for clustering WAs

• Goal: Grouping software components of the WA into meaningful (highly cohesive) and independent (loosely coupled) clusters.

• Three questions have been addressed:
  – Definition of a model of a WA representing relevant components and relationships.
  – Definition of a metric for expressing the degree of coupling of interconnected components.
  – Selection of a clustering algorithm.

Page 8

1. The conceptual model of a WA

[Figure: UML class diagram of the WA conceptual model. Classes: Web Object, Web Page, Client Page, Server Page, Client Page with Frame, Client Module. Associations: link, submit, redirect, build, load in frame, include (0..n and 0..1 multiplicities).]

• Components: Client pages, server pages, client page with frames, client modules, web objects.

• Relationships: Link, submit, redirect, build, load_in_frame, include.

• Each WA will be modeled by a WAG (WA Connection Graph).
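A minimal Python sketch of how a WAG could be held in memory, assuming nodes are identified by file names and that each directed edge carries the list of relationship kinds found between two components. The class name `WAG`, the string labels for component and relationship kinds, and the choice to count distinct neighbours for fan-in and fan-out are assumptions for illustration, not the WARE tool's data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Component and relationship kinds listed on the slide; the string labels are hypothetical.
COMPONENT_KINDS = {"client_page", "server_page", "client_page_with_frame",
                   "client_module", "web_object"}
RELATION_KINDS = {"link", "submit", "redirect", "build", "load_in_frame", "include"}


@dataclass
class WAG:
    """Hypothetical WA Connection Graph: typed nodes plus typed, directed edges."""
    kinds: dict = field(default_factory=dict)  # node name -> component kind
    # (src, dst) -> list of relation kinds; parallel edges (e.g. link + submit) are kept
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, name, kind):
        if kind not in COMPONENT_KINDS:
            raise ValueError(f"unknown component kind: {kind}")
        self.kinds[name] = kind

    def add_edge(self, src, dst, relation):
        if relation not in RELATION_KINDS:
            raise ValueError(f"unknown relation: {relation}")
        if src not in self.kinds or dst not in self.kinds:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(src, dst)].append(relation)

    def fan_out(self, node):
        # Assumption: fan-out counts distinct target nodes, not parallel edges.
        return len({d for (s, d) in self.edges if s == node})

    def fan_in(self, node):
        # Assumption: fan-in counts distinct source nodes, not parallel edges.
        return len({s for (s, d) in self.edges if d == node})
```

For example, a server page that builds a client page becomes two nodes joined by a `build` edge; keeping them as separate nodes is what lets Rule 1 merge them later.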

Page 9

2. A measure of coupling between WA components

• Heuristic approach:
  – Coupling between two nodes in the WAG will depend on both the typology and the topology of the edges.
  – Two different weighting strategies.

• Typology:
  – Different weights will be assigned to build, link, redirect and submit edges.
    • Build: the greatest weight.
    • Redirect: greater weight than Link edges.
    • Submit: greater weight than Link and Redirect edges.

• $w_{RL} = w_R / w_L$, $w_{SL} = w_S / w_L$, with $1 < w_{RL} < w_{SL}$.
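The constraints above can be checked mechanically. The sketch below, assuming $w_L > 0$, derives the redirect and submit weights from the two ratios and rejects values that violate $1 < w_{RL} < w_{SL}$. The slide only says build edges get the greatest weight, so the `build_factor` multiplier is a hypothetical choice; include and load-in-frame edges are not given a typology weight on the slide and are left out.

```python
def typology_weights(w_l, w_rl, w_sl, build_factor=2.0):
    """Per-relation typology weights derived from w_L and the ratios on the slide.

    w_l:  weight of a link edge (w_L), assumed positive
    w_rl: ratio w_R / w_L;  w_sl: ratio w_S / w_L;  the slide requires 1 < w_RL < w_SL
    build_factor: hypothetical multiplier that makes build the greatest weight
    """
    if w_l <= 0:
        raise ValueError("w_L must be positive")
    if not 1 < w_rl < w_sl:
        raise ValueError("the constraint 1 < w_RL < w_SL is violated")
    if build_factor <= 1:
        raise ValueError("build must get the greatest weight")
    w_r = w_rl * w_l          # redirect: heavier than link
    w_s = w_sl * w_l          # submit: heavier than link and redirect
    w_b = build_factor * w_s  # build: heavier than everything else
    return {"link": w_l, "redirect": w_r, "submit": w_s, "build": w_b}
```

With `typology_weights(1.0, 2.0, 3.0)` the link, redirect, submit and build weights are 1, 2, 3 and 6.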

Page 10

A measure of coupling between WA components

• Topology:
  – The degree of coupling of two nodes A and B is considered stronger when A reaches only B than when A reaches both B and other nodes.
  – A new weighting strategy:
    • Edges from a node will be weighted ($w^{OUT}$) according to the fan-out of the node (the greater the fan-out, the smaller the weight).
    • Edges towards a node will be weighted ($w^{IN}$) according to the fan-in of the node (the greater the fan-in, the smaller the weight).

• The coupling measure:

$C_{A,B} = C_{AB} + C_{BA}$, where $C_{AB}$ depends on the weighted edges from A to B and $C_{BA}$ on the weighted edges from B to A.
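The slides do not give the exact formulas for $w^{OUT}$, $w^{IN}$ or $C_{AB}$, so the following is only a sketch under explicit assumptions: $w^{OUT}(A) = 1/\text{fan-out}(A)$, $w^{IN}(B) = 1/\text{fan-in}(B)$, and $C_{AB}$ is the summed typology weight of the edges from A to B multiplied by both factors. It does reproduce the intended effect: A's contribution to B shrinks as A reaches more nodes.

```python
from collections import defaultdict


def coupling(edges, weights, a, b):
    """Sketch of C_{A,B} = C_AB + C_BA over a WAG given as {(src, dst): [relation, ...]}.

    Assumptions (not from the slides): w_OUT(X) = 1 / fan-out(X), w_IN(Y) = 1 / fan-in(Y),
    fan-in/fan-out count distinct neighbours, and C_XY = typology weight * w_OUT(X) * w_IN(Y).
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for s, d in edges:
        succ[s].add(d)
        pred[d].add(s)

    def directed(x, y):
        rels = edges.get((x, y), [])
        if not rels:
            return 0.0
        typology = sum(weights.get(r, 0.0) for r in rels)  # unweighted relations add 0
        return typology * (1.0 / len(succ[x])) * (1.0 / len(pred[y]))

    return directed(a, b) + directed(b, a)
```

For instance, with `edges = {("A", "B"): ["link"], ("A", "C"): ["link"], ("B", "A"): ["submit"]}` and `weights = {"link": 1.0, "submit": 3.0}`, `coupling(edges, weights, "A", "B")` evaluates to 0.5 + 3.0 = 3.5: A's link to B is halved because A also reaches C. Removing the A→C edge would raise the A→B part to 1.0.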

Page 11

3. The clustering algorithm …

• Agglomerative hierarchical clustering algorithm:
  – Iteratively gathers the graph nodes into new, larger clusters, starting from an initial clustering in which each cluster includes a single WA component.

[Figure: from the initial clustering to the final clustering]

Page 12

Four clustering rules

• At each iteration, a new clustering is obtained by applying four clustering rules:

• Rule 1: the cluster containing a built client page will be merged with the cluster containing the server page that builds it;

• Rule 2: if and only if all the pages referenced by the <frame> tags of a client page with frame belong to the same cluster, the cluster including the page with frame will be merged with the cluster including the referenced pages;

• Rule 3: if and only if all the client pages (server pages) including the same client module (server page) belong to the same cluster, the cluster comprising the including pages will be merged with the cluster including the client module (server page);

• Rule 4: the pair of clusters with the maximum coupling value will be gathered into a new cluster.
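Rules 1 to 3 are guarded merges, and their guards are easy to get wrong. A sketch, assuming the current clustering is kept as a dict from component name to cluster id (an illustrative representation, not the authors'), shows Rule 1 as an unconditional merge and Rules 2 and 3 as the same "if and only if all members share one cluster" check applied with different roles. The function names are hypothetical; Rule 4 needs the coupling values and is not shown here.

```python
def relabel(cluster_of, keep, absorb):
    """Move every node of cluster `absorb` into cluster `keep` (in place)."""
    for node, c in cluster_of.items():
        if c == absorb:
            cluster_of[node] = keep


def apply_rule_1(cluster_of, built_page, server_page):
    """Rule 1: always merge a built client page with the server page that builds it."""
    relabel(cluster_of, cluster_of[server_page], cluster_of[built_page])


def merge_if_unanimous(cluster_of, anchor, group):
    """Rules 2 and 3: merge `anchor`'s cluster with the cluster holding `group`
    if and only if every node of `group` already lies in one cluster.

    Rule 2: anchor = client page with frame, group = pages referenced by its <frame> tags.
    Rule 3: anchor = included client module (or server page), group = pages including it.
    Returns True when the condition holds (the clusters are, or now are, merged).
    """
    targets = {cluster_of[n] for n in group}
    if len(targets) != 1:
        return False  # condition fails (or group is empty): leave the clustering unchanged
    target = targets.pop()
    if target != cluster_of[anchor]:
        relabel(cluster_of, cluster_of[anchor], target)
    return True
```

If a module is included by pages that still sit in two different clusters, `merge_if_unanimous` returns False and changes nothing; the rule may fire in a later iteration once Rule 4 has brought those pages together.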

Page 13

The clustering algorithm in PDL

1. begin with n clusters, each containing one WA component;
2. define the wL, wRL and wSL values;
3. for each cluster containing a built client page component, apply rule R1;
4. while (there is at least one pair of connected clusters) do
       for each cluster containing a client page with frame component, apply rule R2;
       for each cluster containing a client module or an included server page component, apply rule R3;
       for each cluster c, and for each x, compute w_x^OUT(c) and w_x^IN(c);
       for each pair of clusters A and B, compute the coupling C_A,B between them;
       apply rule R4;
   od
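The PDL can be turned into runnable Python under several assumptions, so treat this as a simplified sketch rather than the authors' implementation: rules R2 and R3 are left out, the loop considers only pairs of clusters joined by at least one edge, cluster-level fan-out and fan-in count distinct neighbouring clusters, and $C_{AB}$ uses a hypothetical $1/\text{fan-out}$, $1/\text{fan-in}$ scaling. The returned history records, for every merge, the coupling value at which it happened, that is, the height of that merge in the hierarchy.

```python
from collections import defaultdict
from itertools import combinations


def cluster_wa(nodes, edges, weights, built_by):
    """Simplified sketch of the PDL loop: rule R1 once, then rule R4 until no pair
    of connected clusters remains (rules R2 and R3 are omitted here).

    nodes:    component names
    edges:    {(src, dst): [relation, ...]}   (hypothetical WAG encoding)
    weights:  {relation: typology weight}     (relations without a weight count 0)
    built_by: {built_client_page: server_page_that_builds_it}
    Returns (history, clusters): history lists (coupling, cluster_a, cluster_b) merges.
    """
    clusters = [frozenset([n]) for n in nodes]  # step 1: one component per cluster

    def find(node):
        return next(c for c in clusters if node in c)

    def merge(a, b):
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)

    for page, server in built_by.items():  # step 3: rule R1
        a, b = find(page), find(server)
        if a != b:
            merge(a, b)

    history = []
    while True:  # step 4: while at least one pair of connected clusters exists
        # Aggregate component-level edges into typed weights between clusters.
        typ = defaultdict(float)
        for (s, d), rels in edges.items():
            cs, cd = find(s), find(d)
            if cs != cd:
                typ[(cs, cd)] += sum(weights.get(r, 0.0) for r in rels)
        if not typ:
            break
        succ, pred = defaultdict(set), defaultdict(set)
        for cs, cd in typ:
            succ[cs].add(cd)
            pred[cd].add(cs)

        def directed(x, y):
            # Assumed form of C_XY: typology weight scaled by 1/fan-out(X) and 1/fan-in(Y).
            if (x, y) not in typ:
                return 0.0
            return typ[(x, y)] / (len(succ[x]) * len(pred[y]))

        # Rule R4: merge the connected pair with the maximum C_{A,B} = C_AB + C_BA.
        height, a, b = max(
            ((directed(a, b) + directed(b, a), a, b)
             for a, b in combinations(clusters, 2)
             if (a, b) in typ or (b, a) in typ),
            key=lambda t: t[0],
        )
        merge(a, b)
        history.append((height, a, b))
    return history, clusters
```

On a toy WA where `index` submits to `login_asp` (which builds `login_out`) and links to `news_asp` (which builds `news_out`) and to `about`, both built pages link back to `index`, and the weights are link = 1, submit = 3, build = 6, Rule 1 first pairs each server page with its built page; Rule 4 then merges `index` with the login cluster (coupling 1.5), then with the news cluster (1.5), and finally with `about` (1.0).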

Page 14

How can the hierarchy of clustering be pruned?

• An approach based on a quality metric.

• A good clustering will include clusters with high intra-connectivity and low inter-connectivity.
  – Intra-connectivity expresses the degree of cohesion between items of the same cluster:
    • a weighted mean of a cluster's inner edges (values in [0, 1]).
  – Inter-connectivity expresses the degree of coupling between items of two different clusters:
    • a weighted mean of the edges between clusters (values in [0, 1]).

• The Quality of Clustering metric, QoC: $QoC = \text{IntraConnectivity} - \text{InterConnectivity}$ (values in [-1, 1]).
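The slide does not spell out the weighting, so this sketch borrows the unweighted intra- and inter-connectivity definitions used by the MQ measure of the Bunch clustering tool, which have the same ranges: a cluster with $N_i$ nodes and $\mu_i$ inner edges has intra-connectivity $\mu_i / N_i^2$, two clusters with $\varepsilon_{ij}$ edges between them have inter-connectivity $\varepsilon_{ij} / (2 N_i N_j)$, and each term is averaged over clusters or cluster pairs. Using weighted edges instead would follow the same shape.

```python
def qoc(clusters, edges):
    """Sketch of QoC = IntraConnectivity - InterConnectivity (result in [-1, 1]).

    clusters: list of non-empty sets of node names forming a partition
    edges:    iterable of directed (src, dst) pairs; unweighted in this sketch
    """
    owner = {n: i for i, c in enumerate(clusters) for n in c}
    intra = [0] * len(clusters)
    inter = {}
    for s, d in set(edges):  # duplicate edges are counted once
        i, j = owner[s], owner[d]
        if i == j:
            intra[i] += 1
        else:
            key = (min(i, j), max(i, j))
            inter[key] = inter.get(key, 0) + 1
    k = len(clusters)
    # Mean intra-connectivity: inner edges over the N_i^2 possible ordered pairs.
    intra_conn = sum(intra[i] / len(c) ** 2 for i, c in enumerate(clusters)) / k
    if k == 1:
        return intra_conn
    # Mean inter-connectivity over all k(k-1)/2 cluster pairs: edges over 2*N_i*N_j.
    inter_conn = sum(n / (2 * len(clusters[i]) * len(clusters[j]))
                     for (i, j), n in inter.items()) / (k * (k - 1) / 2)
    return intra_conn - inter_conn
```

For clusters `{a, b}` and `{c, d}` with edges a→b, b→a, c→d and b→c, the intra part is (2/4 + 1/4)/2 = 0.375, the inter part is 1/(2·2·2) = 0.125, and QoC = 0.25.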

Page 15

The choice of the hierarchy cut-height

• The QoC determines the quality of a clustering as the trade-off between inter-connectivity and intra-connectivity.
  – It rewards the creation of highly cohesive clusters and penalizes excessive coupling between clusters.

• The clustering exhibiting the maximum QoC is a candidate for the best partition of the WA components.
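Picking the candidate then amounts to scoring every level of the hierarchy and keeping the best one. The helper names below are hypothetical; the merge history is assumed to be a list of (coupling, cluster_a, cluster_b) triples, and `score` can be any QoC function over a partition.

```python
def hierarchy_levels(nodes, merges):
    """Replay (coupling, cluster_a, cluster_b) merges, returning every partition from
    the initial clustering (all singletons) to the final one."""
    part = [frozenset([n]) for n in nodes]
    levels = [part]
    for _coupling, a, b in merges:
        part = [c for c in part if c not in (a, b)] + [a | b]
        levels.append(part)
    return levels


def best_level(levels, score):
    """Index, partition and score of the level maximizing `score` (e.g. a QoC function).
    Ties go to the earliest level in the hierarchy."""
    scores = [score(p) for p in levels]
    i = max(range(len(levels)), key=scores.__getitem__)
    return i, levels[i], scores[i]
```

Because `max` returns the first maximal index, ties favour the finer clustering, which keeps clusters small for the subsequent Concept Assignment Process; preferring the coarser one would only require scanning the levels in reverse.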

Page 16

Using clustering during WA comprehension processes

• A structured approach:
  – Static analysis of the WA and production of the WA Connection Graph;
  – Execute clustering;
  – Find the Cmax clustering with the maximum QoC value;
  – Submit the Cmax clustering to a Concept Assignment Process (CAP).

• An integrated tool platform supporting the process:
  – The reverse engineering tool WARE, for:
    • Executing static analysis of the WA and producing the WAG
    • Automatic clustering and search for the best clustering
    • Supporting the software engineer during the CAP

Page 17

A validation experiment

• Goal of the experiment:
  – Assessing the effectiveness of the clustering approach in supporting WA comprehension.

• Experimental procedure:
  – Several WAs were analyzed with the clustering technique.
  – Software engineers (unfamiliar with the WAs) carried out the CAP and distinguished Valid from Invalid clusters.
  – Valid clusters: those whose items actually implemented one function.
  – Invalid clusters were classified as spurious, divisible, or incomplete:
    • Spurious (whose items show a low degree of cohesion)
    • Divisible (whose items can be split into smaller cohesive clusters)
    • Incomplete (not including all the items necessary to implement a function)

Page 18

A case study

• A WA for managing undergraduate course activities:
  – Providing course information, allowing student registration to the course or to exam sessions, downloading or uploading teaching material, and managing the teacher's course agenda.
  – Implemented using HTML, ASP, VBScript and JavaScript technologies, with an MS Access database.
  – Composed of 107 source files arranged in one directory (about 500 Kbytes in size).
  – Development documentation included UML use case diagrams and a textual description of the WA functions.

Page 19

Results from the Static analysis of the WA

• The inventory produced by the WARE tool:

Component type                 # Detected
Server page                    76
Client static page             23
Client built page              75
Submit operation               49
Anchor (hypertextual link)     45
Redirect operation             8
Include operation              57
Load in Frame operation        4

Page 20

The WA Connection Graph

[Figure: the WA Connection Graph (174 nodes)]

Page 21

The WA clustering proposed by the tool

[Figure: the clustering exhibiting the maximum QoC (50 cluster nodes)]

Page 22

Results from the CAP

• The source code of the proposed clusters was analyzed in order to associate each cluster with a description of the function it implements:
  – 44 valid clusters
  – 6 incomplete clusters
  – 3 pairs of clusters could be gathered together into three new clusters

• Final result: 47 valid clusters.

• Cluster descriptions were compared against the development documentation:
  – Each valid cluster matched a use case!

• Effectiveness = # valid clusters / # proposed clusters = 94%
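The 94% figure can be checked from the counts on this slide and the 50 clusters proposed by the tool, reading the three gathered pairs as the six incomplete clusters:

```latex
\begin{aligned}
\#\text{proposed} &= 44 + 6 = 50,\\
\#\text{valid} &= 44 + 3 = 47,\\
\text{Effectiveness} &= \frac{\#\text{valid}}{\#\text{proposed}} = \frac{47}{50} = 0.94 = 94\%.
\end{aligned}
```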

Page 23

Lessons learned

• The cut-height problem of hierarchical clustering:
  – The QoC metric suggests a candidate clustering to be analyzed. For a given QoC value, the maximum coupling value C° represents the cut-height.

• In order to improve the effectiveness of the approach, further clusterings from the hierarchy can be taken into account.
  – The clustering with a cut-height greater than C° is likely to include smaller clusters.
  – The clustering with a cut-height less than C° is likely to include larger clusters.

• A heuristic approach:
  – Use a cut-height greater than C° if the considered clustering includes mostly spurious clusters.
  – Use a cut-height less than C° if the considered clustering includes mostly incomplete clusters.
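The heuristic can be sketched over a recorded merge history of (coupling, cluster_a, cluster_b) triples. Two assumptions are ours: the clustering for a cut-height h is obtained by replaying merges until the first one whose coupling is at most h (so the max-QoC clustering corresponds to h = C°), and "greater" or "less than C°" means moving to the nearest distinct merge coupling above or below it. The function names are hypothetical.

```python
def cut_at(nodes, merges, height):
    """Partition obtained by replaying (coupling, cluster_a, cluster_b) merges in order
    and stopping at the first merge whose coupling is <= height (the cut-height)."""
    part = [frozenset([n]) for n in nodes]
    for coupling, a, b in merges:
        if coupling <= height:
            break
        part = [c for c in part if c not in (a, b)] + [a | b]
    return part


def next_cut(merges, c0, verdict):
    """Heuristic from the slide, with a hypothetical step rule: move to the nearest
    distinct merge coupling above C° if the clusters are mostly spurious (smaller
    clusters), or below C° if they are mostly incomplete (larger clusters)."""
    heights = sorted({coupling for coupling, _, _ in merges})
    if verdict == "spurious":
        above = [h for h in heights if h > c0]
        return above[0] if above else c0
    if verdict == "incomplete":
        below = [h for h in heights if h < c0]
        return below[-1] if below else c0
    raise ValueError("verdict must be 'spurious' or 'incomplete'")
```

With merges at couplings 3.0, 2.0 and 1.0 and C° = 2.0, a mostly spurious verdict moves the cut to 3.0 (more, smaller clusters), while a mostly incomplete verdict moves it to 1.0 (fewer, larger clusters).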

Page 24

Conclusions

• The increasing diffusion and complexity of WAs oblige researchers to search for effective reverse engineering techniques for WAs.

• Clustering approaches can be used to reduce the size of the WA representation to be analyzed and to carry out comprehension processes more effectively.

• A clustering approach exploiting a coupling measure of WA components that considers both the typology and the topology of connections has been proposed and preliminarily validated.

Page 25

Future work

• A finer model of dependencies between WA components will be investigated.

• The data flow between components will contribute to the evaluation of the coupling.

• Experimenting with the clustering approach in the context of WA remodularization and reengineering.