
A Measurement-Based Approach for Detecting Design Problems in Object-Oriented Systems

M. J. Munro

Technical Report EFoCS-57-2005

Department of Computer and Information Sciences
University of Strathclyde

July 2004


Abstract

Refactoring is a reengineering process used to improve the design of a system by the application of a number of well-defined code-level transformations. A major recognised problem of refactoring is the identification of the locations at which these transformations should be applied, otherwise known as the detection of “bad smells”. Instead of relying on human intuition, Marinescu has proposed and evaluated a set of metrics for automatically detecting a number of these design flaws. This paper empirically evaluates the metrics for detecting data classes and god classes on three different systems: a simple hotel booking system, a design pattern refactoring tool and the public domain unit-testing tool JUnit. The results raise interesting questions regarding the accuracy of these metrics and their consistency when applied to a range of systems. The reasons for these differences are highlighted and suggestions are made to improve the robustness of these metrics.

Contents

1 Introduction
2 Refactoring
    2.1 Design Heuristics
    2.2 Design Flaws
    2.3 Bad Smells
    2.4 AntiPatterns
    2.5 Summary
3 Locating Design Problems
    3.1 Proposed Solution by Marinescu
    3.2 Focused Solution
    3.3 Hypothesis
        3.3.1 Design Problem’s Characteristics
        3.3.2 O-O Software Metrics
        3.3.3 Metric Interpretation
4 Design of Experiment
    4.1 Two Design Problems
        4.1.1 Data-Class
        4.1.2 God Class
    4.2 Characteristics
        4.2.1 Data-Class
        4.2.2 God-Class
    4.3 Choice of Metric sets
        4.3.1 Data-Class
        4.3.2 God-Class
    4.4 Filtering Mechanism
        4.4.1 Data-class
        4.4.2 God class
5 Data Collection
    5.1 Software Systems
    5.2 Metric Tool
        5.2.1 Metric Framework
        5.2.2 Metric Definitions
6 Manually Apply Metrics
    6.1 Data-Class
        6.1.1 HotelSystem
        6.1.2 myrfctr
        6.1.3 JUnit
    6.2 God-Class
        6.2.1 HotelSystem
        6.2.2 myrfctr
        6.2.3 JUnit
7 Refinements of Applying Metrics Manually
    7.1 Data Class
    7.2 God-Class
    7.3 Further Issues with Automatic Detection
8 Tool Application
    8.1 Data-Class
    8.2 God-Class
    8.3 TypEx
        8.3.1 Data-Class
        8.3.2 God-Class
9 Conclusions
    9.1 In Addition
    9.2 Current State of the Art
    9.3 Future work
Bibliography
A Definitions of Metrics Implemented into Eclipse Metric Framework
    A.1 Class Level
        A.1.1 Coupling Between Objects (CBO)
        A.1.2 Coupling Between Data Classes 1 (CBDC1)
        A.1.3 Coupling Between Data Classes 2 (CBDC2)
        A.1.4 Commented Lines Of Code (CLOC)
        A.1.5 Non-Commented Lines Of Code (NCLOC)
        A.1.6 Comment Density (CD)
        A.1.7 Depth of Inheritance Tree (DIT)
        A.1.8 Instance Variable Method Count (IVMC)
        A.1.9 Lines Of Code (LOC)
        A.1.10 Number Of Accessor Methods (NOAM)
        A.1.11 Number Of Classes (NOC)
        A.1.12 Number Of Class Constructors (NOCC)
        A.1.13 Number Of External Methods With Parameter List the Same as Instance Variable Types (NOEMWPLSIVT)
        A.1.14 Number Of Internal Methods With Parameter List the Same as Instance Variable Types (NOIMWPLSIVT)
        A.1.15 Number Of Instance Variables (NOIV)
        A.1.16 Number Of Methods (NOM)
        A.1.17 Number of Methods Added (NMA)
        A.1.18 Number of Methods Extending (NME)
        A.1.19 Number of Methods Overriding (NMO)
        A.1.20 Tight Class Cohesion (TCC)
        A.1.21 Weight Of Class (WOC)
        A.1.22 Weight Of Class - 2 (WOC2)
        A.1.23 Weighted Method Count (WMC)
    A.2 Method Level
        A.2.1 Accessor Method (ACCM)
        A.2.2 Commented Lines Of Code in a Method (CLOCM)
        A.2.3 Non-Commented Lines Of Code in a Method (NCLOCM)
        A.2.4 Comment Density - Method level (CDm)
        A.2.5 Lines Of Code in a Method (LOCM)
        A.2.6 McCabe Cyclomatic Complexity (V(G))
        A.2.7 Number Of Cases within Switch Statements (NOCSS)
        A.2.8 Number Of Parameters (NOP)
        A.2.9 Number Of Parameters Not Referred (NOPNR)
        A.2.10 Number Of Parameters Used (NOPU)
        A.2.11 Number Of Switch Statements (NOSS)
B Analysis with Developer of TypEx for a Data-Class
    B.1 False Positives
C Analysis with Developer of TypEx for a God-Class
    C.1 False-positives
D Glossary
    D.1 Design Flaw
    D.2 Design Heuristic
    D.3 Design Principle
    D.4 Design Rule
    D.5 Problem Detection

List of Figures

1 How design heuristics are used.
2 The Process to Identify Metrics for Automatic Detection of a Design Problem.
3 The Process to Extract Metric Measurements on Java Source Code.
4 BillItem.java.
5 Accessor Method Pseudo-code.

List of Tables

1 God class (behavioural form), related heuristics. [Rie96]
2 Data Class Metrics
3 God Class Metrics
4 Data Class Filters
5 God Class Filters
6 Data Class, hotelSystem results
7 Data Class, myrfctr results
8 Data Class, JUnit results
9 God Class, hotelSystem results
10 God Class, myrfctr results
11 God Class, JUnit results
12 Data Class Refined metrics and rules
13 Data Class, hotelSystem Refinement
14 God Class Refinements of metrics and rules
15 God Class, myrfctr refinements applied
16 Data-Class, TypEx.
17 God-Class, TypEx.


1 Introduction

The dominant software engineering process for developing software is maintenance. The structure of source code can be an attribute that significantly affects the level of maintenance effort, as developers spend more time trying to understand the design structure of a system than identifying a solution to integrate new code [BL71]. The recently proposed technique of refactoring presents a potential solution to this problem.

2 Refactoring

Refactoring changes the internal code structure of an Object-Oriented (O-O) system without affecting the overall behaviour of the system, in order to improve the quality of the design [Fow99]. Refactoring has a role to play in reverse engineering and re-engineering systems by making semantic-preserving transformations of code into a form that the software engineer finds easier to understand. Fowler describes 72 well-defined refactorings (code-level transformations), whose application has three distinct stages: identify where a refactoring should be applied, choose an appropriate refactoring as a solution, and apply the refactoring.

2.1 Design Heuristics

The first stage of refactoring relates to identifying problems in the design and using O-O design heuristics to help identify where to apply a refactoring. A design heuristic is a rule of thumb that tries to capture the experience of how developers identify design problems. Heuristics are not “hard-and-fast” rules and should be used as guidance for developers to resolve their design problems and produce a better designed system overall.

Dijkstra [Dij68] identified one of the earliest heuristics in a short letter to the editor of Communications of the ACM entitled “Go To Statement Considered Harmful”.


Figure 1: How design heuristics are used. [Diagram: Design Flaws, Bad Smells and AntiPatterns, each associated with one or more (1..*) Design Heuristics.]

The problem lies in the domain of procedural languages, where the whole program is within a single file and “GOTO” statements allow jumps to anywhere in the program. GOTO statements are useful in particular situations, for example to jump out of complex loop structures; however, they are more problematic than useful. A system becomes difficult to follow and understand when GOTO statements are present, because jumping to different locations in the program breaks up its logic. Dijkstra writes that “the go to statement as it stands is just too primitive; it is too much an invitation to make a mess of one’s program”.

A number of O-O design heuristics exist in the literature that try to capture the essence of good design, such as “O-O Design Heuristics” [Rie96] and the “Law of Demeter” [LH89]. A design heuristic is a suggestion of how to improve the design of a software system. Applying all possible design heuristics may inadvertently reduce the quality of a software design. Specific design problems are also described in the literature: “Design Flaws” [Mar02], “Bad Smells” [Fow99] and “AntiPatterns” [BMMM98]. Each of these design problems can be expressed as a number of design heuristics that correspond to the characteristics of the initial problem.

Figure 1 shows where design heuristics fit within design problems from the literature. Design flaws, bad smells and AntiPatterns all describe a design problem for which a number of design heuristics could be identified to capture the characteristics of the problem. Figure 1 identifies only design flaws, bad smells and AntiPatterns, as they are the main focus throughout the report; there are other kinds of design problem in the literature.

Each of the design problems identified in Figure 1 is described below in more detail to emphasise the main differences between them.

2.2 Design Flaws

Marinescu defines a design flaw as a structural characteristic of a design entity or design fragment that expresses a deviation from a given set of criteria typifying the high quality of a design [Mar02].

2.3 Bad Smells

Fowler defines 22 colloquially named bad smells [Fow99], where each design problem has a number of related refactorings that can be applied to a system to solve the design problem.

2.4 AntiPatterns

An AntiPattern is a literary form that describes a commonly occurring solution to a problem that generates decidedly negative consequences [BMMM98]. Each AntiPattern has a template outline comprising eight main parts, which helps to understand and solve the particular problem. These parts explain why the original problem exists, identify the main characteristics of the problem, show how the problem can be solved using refactoring steps, and give a demonstrated example of how the solution can be applied.

AntiPatterns identify common problems that software engineers introduce when producing software, including those caused by inexperience in applying a Gang Of Four (GOF) design pattern [GHJV94] in an appropriate context.


God Class - Behavioural Form

Heuristic   Description
1.1         Distribute system intelligence horizontally as uniformly as possible, that is, the top-level classes in a design should share the work uniformly.
1.2         Do not create god classes/objects in your system. Be very suspicious of a class whose name contains Driver, Manager, System, or Subsystem.
1.3         Beware of classes that have many accessor methods defined in their public interface. Having many implies that related data and behaviour are not being kept in one place.
1.4         Beware of classes that have too much non-communicating behaviour, that is, methods that operate on a proper subset of the data members of a class. God classes often exhibit much non-communicating behaviour.

Table 1: God class (behavioural form), related heuristics. [Rie96]

2.5 Summary

The god class problem described by Riel [Rie96] is used as a running example throughout the report to show how the proposed approach automatically locates a design problem within a Java system. Here the god class problem is used to highlight the differences between the terminologies identified in Figure 1. A god class in the behavioural form is described informally by Riel [Rie96] as “where developers attempt to capture the central control mechanism so prevalent in the action-oriented paradigm with their O-O design”. In addition, Riel identifies four O-O design heuristics that capture the characteristics of the problem, shown in Table 1.

Detecting any design problem in a system is not trivial; experience is a key factor in knowing what a problem area looks like. Even with the benefit of experience, identifying them in a large system by hand is an overwhelming task.

3 Locating Design Problems

Currently, locating a design problem requires manual inspection of the source-code, which quickly becomes infeasible as the size of the system increases. Other considerations identified by Bar [BC98] are that a system may be developed by different developers or teams, and that design problems can range over several subsystems and thus cannot be detected locally. Also, developers may not know exactly what to search for, even when using a set of heuristics. Manual inspection can therefore be considered inappropriate for locating violations of design heuristics in large systems.

Clearly, automatic identification of such problems is an appealing prospect. The problem this research aims to address is how to effectively support and guide the software engineer in identifying violations of design heuristics in O-O Java systems of significant size.

3.1 Proposed Solution by Marinescu

Marinescu [Mar02] has described a solution that automates the detection of design problems in source-code, to aid developers in identifying them in an O-O system. Marinescu refers to design problems as design flaws, and defines a process for searching O-O source-code for possible design problems, known as a detection strategy. A detection strategy combines a number of metrics, filters and a composition mechanism that best match a design problem’s characteristics.

Marinescu also defines detection strategies that relate to a specific design flaw. Defining a detection strategy involves a sequence of four steps: analysis of the problem, selection of metrics, detection of candidates and examination of candidates. First, an identified problem taken from the literature is analysed to quantify its informal description. Using the quantitative description, a selection of metrics is chosen that best matches the problem’s characteristics; this is where the detection strategy is expressed using the identified metrics. Detection of candidates measures systems using the defined detection strategy and the chosen metrics. The last stage examines the results that the detection strategy produced and decides whether refinements are required [Mar02].

Marinescu’s research classified a number of design-flaws “according to the granularity level of the design entity affected by each design-flaw” [Mar02]. Each classification level describes the problem level and identifies a number of design flaws that are related to this level.

Marinescu also points out that design problems are hard to model using a detection strategy, “as there are situations where a code fragment might be considered flawed in one case while in another case, a similar, mostly identical design fragment is justifiable and may not be considered a design problem” [Mar02].

3.2 Focused Solution

This work extends Marinescu’s study, with the main focus on the design problems identified by Fowler, known as bad smells. The reason for focusing on bad smells is that they are connected to a set of refactorings that, when applied to source code, can help to solve the design problem.

In addition, the metrics selected by Marinescu for a detection strategy are not justified against the characteristics of the design problem. Understanding the reasoning behind the chosen set of metrics within a detection strategy will help in appreciating the design problem and in identifying whether the metrics are best suited to the problem.

To achieve these goals, the first stage is to appreciate and understand the detection strategies Marinescu has described for 5 of the bad smell problems. Marinescu’s study will then be repeated with the same set of metrics and filtering mechanisms, to see whether the same results are achieved. In addition, each class in the system under investigation will be manually inspected to decide which design problems actually exist, to aid in locating any false positives.

3.3 Hypothesis

If the characteristics of a bad smell can be related to a set of software metrics, then by using a pre-defined set of thresholds to interpret the software metric results applied to Java source-code, the software engineer can be provided with significant guidance as to the location of the bad smell.

3.3.1 Design Problem’s Characteristics

All of the design problems identified in the literature [Fow99, Rie96, BMMM98] are presented in an essay style that describes the problem informally. This makes it difficult to identify them automatically. Originally these guidelines were meant to be followed by a human developer when creating a new design or analysing an existing one, rather than by an automatic tool detecting violations of design rules in a given system design [BC98]. A more formal definition of each design problem is required so that the important characteristics can be related to attributes which can be measured.

The unique characteristics corresponding to the informal description, and where possible the design heuristics relating to a design problem, are identified and then matched against a measurement technique that can be used to help automatically identify the problem in source code. There will not necessarily be a single technique that captures all the characteristics of a design problem.

For example, duplicated code is one of Fowler’s bad smells, which is described as “seeing the same code structure in more than one place” [Fow99]. This description lacks the detailed requirements needed to fully identify it in source code, and does not indicate, for example, what is meant by “seeing the same”. Does it mean identical code, or the same structure with names changed? However, the fields of clone detection and reduction focus on such issues [BMD+00] and can help towards identifying specific characteristics.

Marinescu’s detection strategies consist purely of software metrics and this study starts from his work, so his set of measurement techniques will be considered first.

3.3.2 O-O Software Metrics

Software metrics is a collective term used to describe the very wide range of activities concerned with measurement in software engineering [FN00]. For example, a simple metric counts the number of Lines Of Code (LOC) that exist in a system. Matching the specific attribute of a metric to a characteristic of a design heuristic can thus help towards building a model to guide the location of design problems within source-code. There may not be a perfect characteristic-to-metric match, so a number of metrics may be required to fulfil the criteria. In addition, metrics are only measurements, which need to be interpreted in order to indicate problem areas. Threshold bounds can be placed upon metric results, but the choice of such levels requires some justification.

A number of O-O software metrics are defined in the literature, which relate to specific attributes of a software design. Chidamber and Kemerer made one of the first definitions of O-O metrics, defining six metrics that cover the most fundamental design parts of a system: cohesion, complexity, coupling, depth of inheritance, number of children and the response for a class [CK94]. Bieman and Kang proposed a refined definition of a class’s cohesion that considers the relative number of methods connected by instance variables [BK95].

3.3.3 Metric Interpretation

An important task when choosing a metric is to interpret the results by identifying what they are attempting to measure. Metric thresholds are bounds aimed at capturing acceptable and potentially problematic values for a metric. Identifying metric thresholds is difficult and may require a number of refinements.

Marinescu uses a filtering mechanism to reduce the initial data set [Mar02]. This approach produces 3 concrete types of data filter: absolute, relative and statistical. An absolute filter has an upper and lower value taken from the literature that corresponds to a design problem. Relative filters are used when the characteristics of a design problem are not precise enough to place an absolute filter; for example, “methods of high complexity should be split” [Mar02] could consider the upper 10% of a metric’s results. The last type of filter, statistical, uses box-plots on the metric results to locate outliers as potential concerns.
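The three filter types can be illustrated with a minimal Python sketch. It is not Marinescu's implementation: the function names, the sample WMC values and the box-plot fence of Q3 + 1.5 × IQR are assumptions chosen for illustration only.

```python
from statistics import quantiles


def absolute_filter(values, lower=None, upper=None):
    """Keep classes whose metric value lies within fixed bounds."""
    return {name: v for name, v in values.items()
            if (lower is None or v >= lower) and (upper is None or v <= upper)}


def relative_filter(values, top_fraction=0.10):
    """Keep the top fraction of classes ranked by metric value (at least one)."""
    ranked = sorted(values, key=values.get, reverse=True)
    keep = max(1, round(len(ranked) * top_fraction))
    return {name: values[name] for name in ranked[:keep]}


def statistical_filter(values):
    """Keep upper outliers using the assumed box-plot fence Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(list(values.values()), n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return {name: v for name, v in values.items() if v > fence}


# Hypothetical WMC values for ten classes (not taken from the report).
wmc = {"A": 3, "B": 5, "C": 4, "D": 6, "E": 5,
       "F": 4, "G": 7, "H": 5, "I": 6, "J": 40}

print(absolute_filter(wmc, lower=7))
print(relative_filter(wmc))
print(statistical_filter(wmc))
```

On this sample the relative and statistical filters both isolate class J, while the absolute filter with a lower bound of 7 also keeps class G, which shows how much the result depends on the chosen bound.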

Benlarbi et al. [BEEGR00] carried out a study to identify whether a “practical application of O-O measures can predict which classes in a system contain a fault”. The study uses a cognitive theory which suggests that there are threshold effects for many O-O measures, in that “O-O classes are easy to understand as long as their complexity is below a threshold, whereas above this threshold understandability decreases rapidly” [BEEGR00]. The study is empirically tested on 2 C++ systems and focuses on a subset of the Chidamber and Kemerer metrics suite [CK94]. A number of thresholds corresponding to these measurements are taken from the literature. The results indicate that there are no threshold effects for any of the measures studied.

A starting point for this study is to use the same filtering mechanisms identified by Marinescu for each design problem. However, thresholds are not absolute and it is necessary to investigate how they may vary from system to system.

4 Design of Experiment

Marinescu’s study is repeated using two design problems, data-class and god class. The metrics and filtering mechanisms identified by Marinescu that correspond to these two problems are applied to a number of Java systems. Figure 2 summarises the process of automatically identifying a design problem in a system using metrics. A design problem has a number of vital characteristics taken from the informal description in the literature and any corresponding design heuristics. Each characteristic is matched against the metric or metrics that encapsulate it best. At present the selection of metrics for a chosen characteristic is taken straight from Marinescu’s work. Each metric is linked to respective thresholds, which help to interpret the measurement results. A large number of bounds placed on a particular metric may indicate that the metric is an incorrect choice for the corresponding characteristic.

Figure 2: The Process to Identify Metrics for Automatic Detection of a Design Problem. [Diagram: Design Flaw, Characteristics, Metrics and Bounds, linked in sequence by associations with multiplicities.]

4.1 Two Design Problems

This report focuses on the data-class and god class problems, as these are often indicators of fundamental weaknesses in an O-O design.

4.1.1 Data-Class

Fowler describes data-classes as “classes that have fields, getting and setting methods for fields, and nothing else” [Fow99]. The purpose of these classes is purely to hold data, which other classes manage. Such a class has no functionality, containing only instance variables, constructor methods and accessor methods that change or return the whole state of an instance variable. Other types of accessor method exist that return or change a partial state of an instance variable, but these will not be considered initially as they start to deviate from the initial problem. In addition, instance variables can be changed directly, without using a designated method, if their access modifier is “public”.

4.1.2 God Class

A god class is a class that “captures the centralised control of an O-O system ... leaving minor details to a collection of other classes” [Rie96], and that typically has a traditional procedural (or “action oriented”) structure in an O-O system. Brown et al. identified that the fundamental problem is that functionality is not distributed evenly amongst classes [BMMM98]. The design of this type of class can be considered to be a data driven design (DDD) [SP00], where classes are created first to store information, and then the responsibilities are considered and shared between classes. The overall design produces classes that are domineering, as the system’s responsibilities are not equally shared over all classes.

This artificial separation of data from its associated behaviour is a violation of one of the tenets of the O-O design philosophy. Riel’s heuristic for identifying Data Classes is based on the number of accessor methods in the class interface: “Beware of classes that have many accessor methods defined in their public interface” [Rie96].

4.2 Characteristics

The main characteristics of a data-class and a god class, taken from the literature descriptions above, are given below.

4.2.1 Data-Class

These are classes with limited functionality, which contain methods to change the state of a class through accessor-methods. A class that contains a combination of instance variables, constructors and accessor-methods and nothing else can be considered to be a data-class.

4.2.2 God-Class

A god class is a large class that implements much of the system’s functionality and manipulates data from primitive instance variables, or from objects held as instance variables that can be considered to be data-classes or are used like one to store information.

4.3 Choice of Metric sets

To help identify the metrics that best match the characteristics of a data-class and a god class, the set of metrics Marinescu identified is used [Mar02]; these are described in the next subsections.

Name                                  Description
WOC (Weight of Class)                 Number of non-accessor methods in a class divided by the total number of members of the interface. [Mar01]
NOPA (Number of Public Attributes)    The number of non-inherited attributes that belong to the interface of a class. [Mar01]
NOAM (Number of Accessor Methods)     The number of non-inherited accessor methods declared in the interface of a class. [Mar01]

Table 2: Data Class Metrics

4.3.1 Data-Class

Marinescu’s detection strategy for a data-class uses three metrics, described in Table 2: Weight Of Class (WOC), Number Of Public Attributes (NOPA) and Number Of Accessor Methods (NOAM), which try to encapsulate the characteristics of a data-class. Both NOPA and NOAM relate to how the state of a class can be changed, either directly through instance variables declared with public access or indirectly using accessor methods. The other metric, WOC, is the number of non-accessor methods divided by the total number of members of the class interface, so smaller values imply that a larger share of the interface consists of accessor methods and public attributes.
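A minimal Python sketch of the three data-class metrics follows. It assumes that the interface of a class has already been summarised as three counts and that "members of the interface" means public attributes plus public methods; the `ClassSummary` type and the example counts are hypothetical and are not part of Marinescu's definitions.

```python
from dataclasses import dataclass


@dataclass
class ClassSummary:
    """Hypothetical summary of the non-inherited public interface of one class."""
    public_attributes: int   # public fields
    accessor_methods: int    # public getters and setters
    other_methods: int       # remaining public (non-accessor) methods


def nopa(c: ClassSummary) -> int:
    """Number Of Public Attributes."""
    return c.public_attributes


def noam(c: ClassSummary) -> int:
    """Number Of Accessor Methods."""
    return c.accessor_methods


def woc(c: ClassSummary) -> float:
    """Weight Of Class: non-accessor methods over all interface members."""
    members = c.public_attributes + c.accessor_methods + c.other_methods
    # Returning 0.0 for an empty interface is a convention assumed here.
    return c.other_methods / members if members else 0.0


# Hypothetical classes: one dominated by accessors, one by behaviour.
holder = ClassSummary(public_attributes=0, accessor_methods=6, other_methods=1)
service = ClassSummary(public_attributes=0, accessor_methods=1, other_methods=9)

print(nopa(holder), noam(holder), woc(holder))
print(nopa(service), noam(service), woc(service))
```

The accessor-dominated class yields a WOC of 1/7 and the behaviour-dominated class a WOC of 0.9, matching the reading that low WOC values point towards data-classes.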

4.3.2 God-Class

Marinescu identified three metrics in the detection strategy for a god class that aim to capture its characteristics; these are described in Table 3. In general, to interpret these characteristics using measurements, the size, complexity and cohesion of a class and the number of data-classes it is coupled with can be considered.

The complexity of a class can be difficult to measure. For example, an algorithm may have complex behaviour but may be well documented and hence easy to understand. In comparison, a recursive method can be difficult to understand without a full example. Knowing how to distinguish between these kinds of complexity is difficult. A widely used metric that measures the complexity of any method implemented in a system is McCabe’s cyclomatic complexity metric, which counts the number of conditional statements [McC76].

Name                             Description
ATFD (Access To Foreign Data)    The number of external classes from which a given class accesses attributes, directly or via accessor methods. Inner classes and superclasses are not counted. [Mar01]
WMC (Weighted Method Count)      The sum of the static complexity of all methods in a class. [CK94]
TCC (Tight Class Cohesion)       The relative number of directly connected methods, where two methods are connected if they access a common instance variable. [BK95]

Table 3: God Class Metrics

Chidamber and Kemerer [CK94] extended McCabe’s complexity measure to incorporate the O-O programming language paradigm. Weighted Method Count (WMC) sums McCabe’s complexity for each method implemented in a class.
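As a rough Python sketch of this summation, the complexity of each method can be approximated as one plus the number of decision points found in its source text. The token list and the regular expression are assumptions for illustration: a real tool would work on a parsed syntax tree, and this version does not strip comments or string literals.

```python
import re

# Tokens treated as decision points (an assumed, approximate list for Java).
DECISION = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\||\?")


def cyclomatic_complexity(method_body: str) -> int:
    """Approximate McCabe complexity: decision points plus one."""
    return 1 + len(DECISION.findall(method_body))


def wmc(method_bodies) -> int:
    """Weighted Method Count: sum of the complexity of every method."""
    return sum(cyclomatic_complexity(body) for body in method_bodies)


# Hypothetical Java method bodies for a single class.
getter = "return price;"
total = ("double t = 0; for (Item i : items) { "
         "if (i.isTaxed() && i.price > 0) { t += i.price; } } return t;")

print(cyclomatic_complexity(getter))
print(cyclomatic_complexity(total))
print(wmc([getter, total]))
```

Under this approximation a class made up only of simple accessors has a WMC equal to its number of methods, so a high WMC can only come from methods that contain branching.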

A well-designed class is one whose members integrate successfully. This kind of software quality can be measured by considering a cohesion attribute. A class should be highly cohesive, meaning that it is difficult to split its functionality. Most cohesion metrics are based either on instance variable usage or on the sharing of instance variables through method integration [BB03]. Bieman and Kang define a cohesion metric for a class, Tight Class Cohesion (TCC), that measures the relative number of directly connected methods, where methods are considered to be connected when they use at least one common instance variable [BK95].
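The TCC definition above can be sketched in Python as the fraction of method pairs that share at least one instance variable. The input format (a mapping from method name to the set of instance variables it uses), the example classes and the value returned for classes with fewer than two methods are assumptions made for this illustration.

```python
from itertools import combinations


def tcc(method_vars) -> float:
    """Tight Class Cohesion: connected method pairs over all method pairs.

    method_vars maps each method name to the set of instance variables it uses.
    """
    pairs = list(combinations(method_vars, 2))
    if not pairs:
        return 0.0  # assumed convention for classes with fewer than two methods
    connected = sum(1 for a, b in pairs if method_vars[a] & method_vars[b])
    return connected / len(pairs)


# Hypothetical classes: one cohesive, one with unrelated responsibilities.
cohesive = {
    "getTotal": {"items", "tax"},
    "addItem": {"items"},
    "setTax": {"tax"},
}
scattered = {
    "print": {"printer"},
    "save": {"db"},
    "login": {"user"},
    "logout": {"user"},
}

print(tcc(cohesive))
print(tcc(scattered))
```

Two of the three method pairs in the first class are connected (TCC = 2/3), whereas only one of the six pairs in the second class is (TCC = 1/6).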

There are three levels which TCC can be calculated, herewe have only considered the cohesion of the current classesmethods and instance variables. Other calculations consider


inherited methods and instance variables. A low TCC value for a class can be interpreted as the class not being well formed and not encapsulating a single responsibility. A characteristic of a god class is that it takes on more than the average proportion of a system’s responsibility, which should be spread over a number of classes. Hence, a low TCC value could be interpreted as indicating a god class.
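The TCC calculation described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions: the input is a hypothetical map from method name to the set of instance variables that method uses, only the current class's own methods are considered, and a class with fewer than two methods is reported as undefined (compare the "*" entries in the god class result tables, where the calculation divides by zero). The class name TccSketch is invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.Set;

/** Illustrative sketch only: TCC = connected method pairs / all possible method pairs. */
final class TccSketch {

    static OptionalDouble tightClassCohesion(Map<String, Set<String>> fieldsUsedByMethod) {
        List<Set<String>> usage = new ArrayList<>(fieldsUsedByMethod.values());
        int n = usage.size();
        int possiblePairs = n * (n - 1) / 2;
        if (possiblePairs == 0) {
            // Fewer than two methods: the ratio would divide by zero.
            return OptionalDouble.empty();
        }
        int connectedPairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Two methods are connected when they share at least one instance variable.
                if (!Collections.disjoint(usage.get(i), usage.get(j))) {
                    connectedPairs++;
                }
            }
        }
        return OptionalDouble.of((double) connectedPairs / possiblePairs);
    }

    public static void main(String[] args) {
        // Two accessors that each touch a different field are not connected.
        Map<String, Set<String>> accessors = Map.of(
                "getDescription", Set.of("bDescription"),
                "getCost", Set.of("bCost"));
        System.out.println(tightClassCohesion(accessors).getAsDouble()); // prints 0.0
    }
}
```

Which methods are passed in (for instance, whether constructors are counted) is left to the caller in this sketch and changes the result, so it should be fixed before values are compared across classes.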

There is one other metric that Marinescu identifies to help locate a god class, Access To Foreign Data (ATFD), which is defined as “The number of external classes from which a given class accesses attributes, directly or via access-methods. Inner and super-classes are not included” [Mar01]. It is difficult to obtain an accurate result for ATFD because, in general, it is impossible to know statically which class a method is implemented in, as the method may be overridden. However, the systems to be analysed first are small, making it possible to calculate this measurement.

4.4 Filtering Mechanism

Marinescu defines filtering mechanisms to interpret the metric results relating to a detection strategy’s design problem. There are three types of data filter, which place either an absolute, relative or statistical bound on the results. An absolute measurement, such as identifying the top 10 classes of a metric result (e.g. D1), is very limiting as its value is easily distorted when applied to small systems. A relative measurement, such as the top 10%, takes into consideration the varying sizes of systems. The last type uses statistical box-plots to identify outliers within the results. The main concern with these rules is knowing the appropriate bound - only experience or empirical evaluation can help with this definition.
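The difference between an absolute and a relative filter can be sketched in Java as below. The class MetricFilterSketch and its methods are hypothetical names for this illustration, ties between equal metric values are broken arbitrarily, and the relative bound is rounded up to a whole number of classes; none of these choices is specified by Marinescu.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only: absolute (top N) versus relative (top N%) filters. */
final class MetricFilterSketch {

    /** Absolute filter: the names of the n classes with the highest metric value. */
    static List<String> topN(Map<String, Integer> metric, int n) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(metric.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()));
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : ranked.subList(0, Math.min(n, ranked.size()))) {
            names.add(entry.getKey());
        }
        return names;
    }

    /** Relative filter: the top percent of classes, rounded up to a whole class. */
    static List<String> topPercent(Map<String, Integer> metric, double percent) {
        int n = (int) Math.ceil(metric.size() * percent / 100.0);
        return topN(metric, n);
    }

    public static void main(String[] args) {
        // A deliberately tiny system: three classes with made-up metric values.
        Map<String, Integer> metric = Map.of("A", 9, "B", 3, "C", 1);
        System.out.println(topN(metric, 10));       // every class passes a "top 10" filter
        System.out.println(topPercent(metric, 10)); // only the highest-valued class passes
    }
}
```

The usage in main shows the distortion discussed above: in a system with fewer than ten classes, a "top 10" filter accepts every class, whereas the relative filter still discriminates.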

4.4.1 Data-class

Marinescu identifies a detection strategy for a data-class with metrics and filters that best interpret the results against the design problem’s characteristics. Throughout Marinescu’s work



Name | Rule
--- | ---
D1 [Mar01] | (WOC > 0 and WOC <= 0.33) and (((NOPA, top 10 classes) and NOPA >= 5) or ((NOAM, top 10 classes) and NOAM >= 3))
D2 [Mar01] | ((WOC <= Bottom 33%) and WOC < 0.33) and (NOPA > 5 or NOAM > 5)
D3 [Mar01] | (NOPA > 3 and NOPA >= top 10%) or (NOAM > 5 and NOAM >= top 10%)
Manual | Manual inspection of the source code

Table 4: Data Class Filters

three differing filters have been identified and are shown in Table 4. All three filters are considered at the start of this study to identify which thresholds best match the underlying problem. If a class’s metric results collectively fall within the corresponding thresholds, the class is identified as a data-class.

Marinescu does not describe what any of the filtering mechanisms are trying to capture, so an educated guess at the rationale behind the filters shown in Table 4 is offered here. The filters connected to WOC look for classes that have a higher number of accessor methods than normal methods; for example, filter D1 requires WOC values of at most 0.33. The NOPA filters include absolute bounds to identify classes that are vulnerable to their state being changed directly through instance variables. In addition, NOPA has a relative filter to identify the top 10% of classes with the highest values, as these are possible classes that hold data. The other filters are for the NOAM metric, and identify classes that have 5 accessor methods or have values in the top 10% throughout the system.

The filters individually do not really capture the characteristics of the design problem; however, a composition of the metrics and filters can. For example, a class may have high NOAM and NOPA results, meaning its state can easily be changed, but the class could be large, with a number of non-accessor methods resulting in a high WOC value.
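To make the composition of metrics and filters concrete, rules D1 and D3 from Table 4 can be written as simple predicates. The Java sketch below is an interpretation, not Marinescu's implementation: the class and parameter names are invented, and membership of the "top 10 classes" or "top 10%" is assumed to have been computed elsewhere and is passed in as a boolean.

```java
/** Illustrative sketch only: rules D1 and D3 from Table 4 as boolean predicates. */
final class DataClassRulesSketch {

    /** D1: low (but non-zero) WOC combined with an absolute "top 10 classes" filter. */
    static boolean d1(double woc, int nopa, boolean nopaInTop10, int noam, boolean noamInTop10) {
        boolean lowWoc = woc > 0 && woc <= 0.33;
        boolean exposesState = (nopaInTop10 && nopa >= 5) || (noamInTop10 && noam >= 3);
        return lowWoc && exposesState;
    }

    /** D3: NOPA or NOAM above a bound and within the top 10% of the system. WOC is ignored. */
    static boolean d3(int nopa, boolean nopaInTop10Percent, int noam, boolean noamInTop10Percent) {
        return (nopa > 3 && nopaInTop10Percent) || (noam > 5 && noamInTop10Percent);
    }

    public static void main(String[] args) {
        // Hypothetical class: WOC 0.29, no public attributes, five accessors in the top 10.
        System.out.println(d1(0.29, 0, false, 5, true)); // prints true
        // Hypothetical class: WOC 0.50 is outside the D1 band, so the rule cannot fire.
        System.out.println(d1(0.50, 0, false, 2, true)); // prints false
        // Hypothetical class: four public attributes and nine accessors, both in the top 10%.
        System.out.println(d3(4, true, 9, true));        // prints true
    }
}
```

Because D3 takes no WOC argument at all, a large class with many accessor methods can satisfy it regardless of how many other methods it has, which is the weakness the composed rule D1 avoids.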

In addition, a manual inspection of the source code for the



Name | Rule
--- | ---
G1 [Mar01] | ((ATFD, top 10 classes) and ATFD >= 3) and ((WMC, top 10 classes) and TCC <= 0.33)
G2 [Mar02] | (ATFD >= top 20% and ATFD > 4) and (WMC > 20 or TCC < 0.33)
Manual | Manual inspection of the source code

Table 5: God Class Filters

design problem is carried out; the result is used as a benchmark for a true positive.

4.4.2 god class

Table 5 identifies the filters applied to the metrics in order to identify potential god classes. These suggest there can be a maximum number of god classes in a system at any one time, based on the premise that a god class has control of a particular part of a system, and there can only be a limited number of controllers throughout a system. In this case manual inspection sought to identify classes that access a number of “lightweight” classes (that themselves could be Data Classes) and also contain lengthy methods that exhibit a high degree of computation and control.

Marinescu again does not describe what these filters and metrics are trying to capture. The filters for ATFD identify classes with the highest values in a system, meaning they are the most communicative with other classes. The filters for WMC identify the most complex classes in a system. The cohesion filters on TCC consider low cohesion values, as these mean that a class is doing a number of things that could easily be split up.

In addition, a manual inspection of the source code for the design problem is carried out; the result is used as a benchmark for a true positive.
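Rules G1 and G2 from Table 5 can be sketched as predicates in the same way. This is an interpretation under stated assumptions: the names are invented for the example, G1 is read as pairing the "top 10" filter on WMC with the TCC bound, and membership of the "top 10 classes" or "top 20%" is assumed to be precomputed and passed in as a boolean.

```java
/** Illustrative sketch only: rules G1 and G2 from Table 5 as boolean predicates. */
final class GodClassRulesSketch {

    /** G1: ATFD and WMC both within the top 10 classes, ATFD >= 3, and low cohesion. */
    static boolean g1(int atfd, boolean atfdInTop10, boolean wmcInTop10, double tcc) {
        return (atfdInTop10 && atfd >= 3) && (wmcInTop10 && tcc <= 0.33);
    }

    /** G2: ATFD within the top 20% and above 4, plus either high complexity or low cohesion. */
    static boolean g2(int atfd, boolean atfdInTop20Percent, int wmc, double tcc) {
        return (atfdInTop20Percent && atfd > 4) && (wmc > 20 || tcc < 0.33);
    }

    public static void main(String[] args) {
        // Hypothetical controller class: ATFD 9, WMC 67, TCC 0.22.
        System.out.println(g1(9, true, true, 0.22)); // prints true
        System.out.println(g2(9, true, 67, 0.22));   // prints true
        // Hypothetical small class: ATFD 3, WMC 9, TCC 0.00.
        System.out.println(g1(3, true, true, 0.00)); // prints true
        System.out.println(g2(3, false, 9, 0.00));   // prints false
    }
}
```

The second example shows how a small, simple class can satisfy G1 purely because a small system places almost every class in the "top 10", while the absolute bound ATFD > 4 in G2 rejects it.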


5 Data Collection

The sets of metrics discussed for a data-class and a god class are applied manually to systems developed in Java. Manual application of a metric can take many man-hours and be a complex and error-prone activity. For this reason, three small systems are chosen for the manual application of the metrics, which helps to minimise the complexity and to appreciate the implementation details. The calculated metric results are placed into a spreadsheet, where the corresponding filters can be applied and presented in a clear, informative manner.

An alternative approach to applying metrics to a system is to automate the process using a tool. Such a tool should have the functionality to apply and integrate new software measurement techniques on Java systems. Here software measurement is also referred to as software metrics, where metric in this instance relates to measurement and not to a mathematical metric space.

5.1 Software Systems

The metric sets are first applied manually to three small systems, all written in Java. Two of the systems were developed in-house at the University of Strathclyde and the other is an open-source project. The first is a hotel booking system that contains 13 classes with 1.5 KLOC. The second system is a design pattern refactoring tool with 26 classes and about 3 KLOC. The last system is JUnit [JUn03], an open-source testing framework for Java, that has 111 classes and just over 5 KLOC. JUnit was chosen as a control system on the assumption that, as the authors are respected in the field of software development and it has a large user base, it is likely to be well-designed. The role of the control system is primarily to check for the presence of false-positives.

Applying these sets of metrics to the three small systems mentioned above limits the analysis of the results, as they are not truly representative of real systems. Size is an important factor and the question arises whether these techniques hold


true when applied to larger systems. Manual application of the metrics to larger systems becomes infeasible.

A further, larger system is evaluated to test the scalability of the metrics and filters identified for the two design flaws, using the implemented metric tool. The system, known as “TypEx”, is written in Java and is another in-house system developed and evolved within a research group at the Department of Computer and Information Sciences at the University of Strathclyde.

5.2 Metric Tool

There are a number of software metric tools that can be used to apply metrics to source code; some of the more popular and easier to obtain are SDMetrics [SDM03], JMetric [JMe00], JDepend [JDe01], Together ControlCenter [Tog02] and the Eclipse metric framework plugin [Met02]. A problem with any of these metric tools is that they are stand-alone systems that implement a specific set of metrics for a particular programming language. Trying to apply all possible software metrics therefore becomes a challenge. An ideal solution is to have a centralised location where metrics are expressed and defined, normally known as a metric repository. In addition, it should be possible to apply any of the metrics from the repository in a language-independent manner.

IBM has developed an open-source Integrated Development Environment (IDE) called Eclipse [Ecl01]. Eclipse is written in Java, and most of its functionality comes from plugins. Plugins are easy to install and can be integrated into a working project. Eclipse has features that support the development of plugins: a user-friendly GUI for the XML file that contains the project’s settings, named “plugin.xml”, and a run-time workbench in which the plugin can be tested. Currently there is an active community developing plugins for varying software-engineering problems.

There is an open-source metric framework plugin [Met02] for Eclipse that has a number of metrics implemented in a repository. The metric framework can only apply the metrics


from the repository to Java systems, but it is well suited to implementing new metrics in the repository. This metric framework is used to implement the required metrics identified for this study.

5.2.1 Metric Framework

The Eclipse metric framework plugin [Met02] is an open-source project where a full current working version can be downloaded using Eclipse’s Concurrent Versions System (CVS) repository. The metric framework has the source code for all the metrics implemented in the repository. The default metrics within the repository do not totally match Marinescu’s set for a data-class and god class. However, the default metric implementations can help in developing and integrating new metrics into the repository. Developing the other metrics is also helped by their manual application to systems, which gives an appreciation and understanding of the specific requirements.

Adding a new metric to the framework repository requires the implementation of a class that encapsulates the metric’s functionality, and the plugin’s settings need to be updated in the “plugin.xml” file.

The framework interacts with other Eclipse plugins by, for instance, being able to integrate its functionality into the main Graphical User Interface (GUI). The framework exports the metric results into an interactive table where the user can choose to view the calculation at the project, package, class or method level. The table has a feature that enables the export of the metric calculations to an XML file format. The framework interacts with Eclipse’s task-view to show results that have reached a designated threshold defined in the plugin’s settings in the “plugin.xml” file.

Exporting all the metric results from an analysis of a system into an XML file allows the use of an in-house tool, developed within a research group at the University of Strathclyde, to parse the file into any required file format. The chosen format for the experiment is a spreadsheet.


[Figure 3 diagram labels: Source Code; Eclipse IDE with the metrics.sourceforge.org plugin; Task view warnings; Metrics table view; Export XML; Extractor; Iterator; Import into Excel]

Figure 3: The Process to Extract Metric Measurements on Java Source Code.

The process this experiment uses to automate the analysis of a system is shown in Figure 3. A system is analysed using the Eclipse metric framework and the results can be viewed either by extracting the metric results into a spreadsheet or by using the task-view to identify positive cases of the two design problems. However, a manual search back to a class’s source is required to make the final decision on whether the flaw exists or not.

5.2.2 Metric Definitions

The metrics that extend the Eclipse metric framework repository are outlined and expressed fully in Appendix A.

6 Manually Apply Metrics

The results of applying the experiment manually are reported in a number of tables for each system and design problem. Each of these tables shows the names of the system’s classes, the values of the set of metrics, and the corresponding rule results that interpret the metric values. The metric results are shown either as whole positive numbers or, where necessary, to two decimal places.

The original study by Marinescu identifies unique filters for each metric result for a specific design problem. A class’s metric results can be interpreted to automatically state whether


it matches the design problem characteristics, through using the filters. The results of applying the filters to the metric values are shown either as a “YES” or “NO” to represent positive or negative identification.

Marinescu identifies that inner classes are not considered as part of the calculation as they bear no real impact on the overall design of the system.

6.1 Data-Class

6.1.1 HotelSystem

Table 6 shows the results of the metrics and rules as applied to the Hotel System. It is noticeable that there are a number of cases where the filtered rules and manual inspection values are in disagreement. The rule results are mainly “NO”, but if we concentrate on the positive results there are some interesting observations that can be made.

The class BillItem is identified by manual inspection as a Data Class but is not picked up by any of the rules. BillItem is a small class that contains two instance variables and four methods (two constructors and two accessor methods). The code for the BillItem class can be seen in Figure 4. The calculation of WOC in Table 6 for BillItem divides the number of non-accessor methods by the number of methods in the class, which yields a value of 0.5. This value fails to trigger either of rules D1 and D2. Rule D3 also shows a negative result because the NOPA and NOAM results are not large enough for the thresholds of the rule and are also not in the top 10% of the overall metric results.

The description of the WOC metric by Marinescu is quite ambiguous, and trying to obtain an accurate calculation is difficult. In this interpretation of the WOC metric, constructors are included as part of the calculation. Hence, the result for BillItem is 0.5 as the class contains two constructors and two accessor methods.
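The interpretation of WOC used here can be written down as a short Java sketch. The class and method names are invented for this illustration; the only assumption carried over from the text is that constructors count as non-accessor methods.

```java
/** Illustrative sketch only: WOC with constructors counted as non-accessor methods. */
final class WocSketch {

    /** WOC = non-accessor methods (constructors included) / all methods in the class. */
    static double weightOfClass(int constructors, int accessors, int otherMethods) {
        int total = constructors + accessors + otherMethods;
        if (total == 0) {
            return Double.NaN; // no methods at all: the ratio is undefined
        }
        return (double) (constructors + otherMethods) / total;
    }

    public static void main(String[] args) {
        // BillItem: two constructors, two accessor methods, no other methods.
        System.out.println(weightOfClass(2, 2, 0)); // prints 0.5
    }
}
```

With two constructors and two accessors the ratio is 2/4 = 0.5, which lies above the 0.33 bound used by rules D1 and D2.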

The class Function in Table 6 is positively identified (incorrectly) by rule D1. The Function class contains one constructor method, four accessor methods and one method with



Table 6: Data Class, hotelSystem results

    final public class BillItem {
        private String bDescription;
        private double bCost;

        public BillItem() {
            bDescription = "None";
            bCost = 0.0;
        }

        public BillItem (String description, double cost) {
            bDescription = description;
            bCost = cost;
        }

        public String getDescription() {
            return bDescription;
        }

        public double getCost() {
            return bCost;
        }
    }

Figure 4: BillItem.java.



a number of condition statements, which gives a low WOC value and a high NOAM value. However, manual inspection of this class identifies that it is not a pure Data Class, because of the one method with a number of condition statements; hence this D1 result is a false positive.

The Guest class is correctly identified by rule D1 as being a Data Class. This class is defined with two constructors and five accessor methods, which produces a low WOC value (and consequently triggers rule D1). However, the class is relatively small and does not have enough accessor methods overall to trigger rule D2 or D3.

The Room class contains four constants and nine accessor methods. The D3 rule does not consider the overall balance of accessor methods to other methods in a class (in the way that the WOC metric does) and so D3 shows a positive result (because of its failure to make allowance for the eleven non-accessor methods in the class). Hence this result is a false positive.

6.1.2 myrfctr

The Refactory system results are shown in Table 7. These results only show false positives for the D3 rule, for the same reason that the Room class was falsely identified. Both the JavaProgram and Node classes have a number of accessor methods, and are thus flagged up by rule D3, but these methods are not considered in relation to the rest of the methods in the class.

A number of Data Classes have been identified manually (Abstraction, Bridge and PartialAbstraction) but missed by all of the filters. The structure of these classes is as follows: Abstraction contains one constructor and two accessor methods, Bridge contains one constructor and an accessor method, and PartialAbstraction contains one constructor and two accessor methods. The reason why the filters do not identify these classes is that they are small and have low NOAM and NOPA results, with borderline WOC results for both the Abstraction and PartialAbstraction classes, whereas the Bridge class has quite a large


Table 7: Data Class, myrfctr results

WOC value. The explanation for these WOC values is again that the calculation includes constructors.

6.1.3 JUnit

The control system, JUnit, shows one class (ProgressBar in the awtui package) as being a Data Class by rule D3, as it has a number of public constant variables but, interestingly enough, does not have any accessor methods. Rule D3 does not take into consideration any of the eleven non-accessor methods in the class and so this result is a false positive.

6.2 God-Class

6.2.1 HotelSystem

Table 9 shows the results of the God Class model applied to the Hotel system. There is one false positive and three true positives. The reason why the HotelDate class is a false positive is that the triggers for rule G1 include metric values for ATFD and WMC that are in the top 10 results. The problem here is that the system is small and has only 13 classes. HotelUI has positive results for both rules and is confirmed as a god



Classes | WOC2 | NOPA | NOAM | D1 | D2 | D3 | Manual
--- | --- | --- | --- | --- | --- | --- | ---
awtui.AboutDialog | * | 0 | 0 | NO | NO | NO | NO
awtui.Logo | 1.00 | 0 | 0 | NO | NO | NO | NO
awtui.ProgressBar | 1.00 | 4 | 0 | NO | NO | YES | NO
awtui.TestRunner | 0.94 | 0 | 2 | NO | NO | NO | NO
extentions.ActiveTestSuite | 1.00 | 0 | 0 | NO | NO | NO | NO
extentions.ExceptionTestCase | 1.00 | 0 | 0 | NO | NO | NO | NO
extentions.RepeatedTest | 1.00 | 0 | 0 | NO | NO | NO | NO
extentions.TestDecorator | 0.80 | 0 | 1 | NO | NO | NO | NO
extentions.TestSetup | 1.00 | 0 | 0 | NO | NO | NO | NO
framework.Assert | 1.00 | 0 | 0 | NO | NO | NO | NO
framework.AssertFailedError | 1.00 | 0 | 0 | NO | NO | NO | NO
framework.ComparisonFailure | 1.00 | 0 | 0 | NO | NO | NO | NO
framework.Protectable | 1.00 | 0 | 0 | NO | NO | NO | NO
framework.Test | 1.00 | 0 | 0 | NO | NO | NO | NO
framework.TestCase | 0.82 | 0 | 2 | NO | NO | NO | NO
framework.TestFailure | 0.67 | 0 | 2 | NO | NO | NO | NO
framework.TestListener | 1.00 | 0 | 0 | NO | NO | NO | NO
framework.TestResult | 0.71 | 0 | 5 | NO | NO | NO | NO
framework.TestSuit | 0.83 | 0 | 3 | NO | NO | NO | NO
runner.BaseTestRunner | 0.94 | 1 | 2 | NO | NO | NO | NO
runner.ClassPathTestCollector | 1.00 | 1 | 0 | NO | NO | NO | NO
runner.FailureDetailView | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.LoadingTestCollector | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.ReloadingTestSuiteLoader | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.SimpleTestCollector | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.Sorter | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.Sorter.Swapper | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.StandardTestSuiteLoader | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.TestCaseClassLoader | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.TestCollector | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.TestRunListener | 1.00 | 2 | 0 | NO | NO | NO | NO
runner.TestSuiteLoader | 1.00 | 0 | 0 | NO | NO | NO | NO
runner.Version | 1.00 | 0 | 0 | NO | NO | NO | NO
samples.money.Imoney | 1.00 | 0 | 0 | NO | NO | NO | NO
samples.money.Money | 0.85 | 0 | 2 | NO | NO | NO | NO
samples.money.MoneyBag | 1.00 | 0 | 0 | NO | NO | NO | NO
samples.money.MoneyTest | 1.00 | 0 | 0 | NO | NO | NO | NO
samples.AllTests | 1.00 | 0 | 0 | NO | NO | NO | NO
samples.SimpleTest | 1.00 | 0 | 0 | NO | NO | NO | NO
samples.VectorTest | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.AboutDialog | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.CounterPanel | 0.88 | 0 | 1 | NO | NO | NO | NO
swingui.DefaultFailureDetailView | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.DefaultFailureDetailView.StackTraceListModel | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.DefaultFailureDetailView.StackEntryRenderer | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.FailureRunlView | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.FailureRunlView.FailureListCellRenderer | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.ProgressBar | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.StatusLine | 1.00 | 2 | 0 | NO | NO | NO | NO
swingui.TestHierarchyRunView | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.TestRunContext | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.TestRunner | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.TestRunView | 0.80 | 0 | 1 | NO | NO | NO | NO
swingui.TestSelector | 0.92 | 0 | 1 | NO | NO | NO | NO
swingui.TestSelector.TestCellRenderer | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.TestSelector.KeySelectListener | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.TestSelector.ParallelSwapper | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.TestSelector.TestCellRenderer | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.TestSuitePanel | 0.89 | 0 | 1 | NO | NO | NO | NO
swingui.TestSuitePanel.TestTreeCellRenderer | 1.00 | 0 | 0 | NO | NO | NO | NO
swingui.TestTreeModel | 0.78 | 0 | 4 | NO | NO | NO | NO
textui.ResultPrinter | 0.94 | 0 | 1 | YES | YES | NO | NO
textui.TestRunner | 0.87 | 3 | 2 | YES | YES | NO | NO

Note: * invalid value; # represents an inner class value that is not part of a metric calculation

Table 8: Data Class, JUnit Results



Classes | ATFD | WMC | TCC | G1 | G2 | Manual
--- | --- | --- | --- | --- | --- | ---
Bill | 1 | 7 | 0.67 | NO | NO | NO
BillItem | 0 | 4 | 0.00 | NO | NO | NO
ConferenceRoom | 2 | 16 | 0.30 | NO | NO | NO
Delegate | 2 | 8 | 1.00 | NO | NO | NO
Function | 1 | 10 | 0.30 | NO | NO | NO
FunctionDate | 2 | 8 | 0.00 | NO | NO | NO
Guest | 1 | 7 | 0.00 | NO | NO | NO
Hotel | 2 | 81 | 0.51 | NO | NO | NO
HotelDate | 3 | 9 | 0.00 | YES | NO | NO
HotelSystem | 0 | 1 | * | NO | NO | NO
HotelUI | 9 | 67 | 0.22 | YES | YES | YES
Reservation | 1 | 15 | 0.52 | NO | NO | NO
Room | 1 | 44 | 0.18 | NO | NO | NO

Note: * invalid value as calculation divides by zero

Table 9: God Class, hotelSystem results

class by manual inspection. It is the main class that the user interacts with when using the system and as a result there are a number of methods with condition statements and method calls to other simpler classes.

6.2.2 myrfctr

The Refactory system results in Table 10 highlight two false positives, eight true positives and one failure to identify a true result. The reason why AbstractAccessGUI and AddImplementsLinkGUI show false positives for rule G1 is again that the system is small and the rule only requires the metric results for ATFD and WMC to fall within the top 10. JavaProgram has the highest WMC value and a low level of cohesion (TCC) but does not have high enough coupling (ATFD) to trigger rule G2. An interesting observation about the Refactory and RefactoryGUI classes is that, in addition to having near-identical metric values and both being positively identified correctly, they only differ by a few lines of code. This is indicative of a potential clone (identified by Fowler as the “Duplicate Code” bad smell) but is out of scope for this paper.


Classes | ATFD | WMC | TCC | G1 | G2 | Manual
--- | --- | --- | --- | --- | --- | ---
AbstractAccess | 0 | 1 | * | NO | NO | NO
AbstractAccessGUI | 3 | 10 | 0.00 | YES | NO | NO
Abstraction | 0 | 3 | 0.00 | NO | NO | NO
AbstractionGUI | 2 | 9 | 0.00 | NO | NO | NO
AbstractionHelpGUI | 3 | 4 | 0.00 | NO | NO | NO
AddImplementsLinkGUI | 3 | 10 | 0.00 | YES | NO | NO
Bridge | 0 | 2 | * | NO | NO | NO
BridgeGUI | 3 | 8 | 0.00 | NO | NO | NO
Constructor | 0 | 21 | 0.61 | NO | NO | NO
EncapsulateConstruction | 1 | 3 | * | NO | NO | NO
EncapsulateConstructionGUI | 3 | 8 | 0.00 | NO | NO | NO
FactoryMethod | 1 | 1 | * | NO | NO | NO
FactoryMethodGUI | 3 | 8 | 0.00 | NO | NO | NO
JavaFilter | 0 | 7 | 0.00 | NO | NO | NO
JavaProgram | 3 | 147 | 0.25 | YES | NO | YES
Method | 0 | 31 | 0.62 | NO | NO | NO
Node | 0 | 11 | 0.24 | NO | NO | NO
PartialAbstraction | 0 | 3 | 0.00 | NO | NO | NO
PartialAbstractionGUI | 3 | 9 | 0.00 | NO | NO | NO
Refactory | 8 | 27 | 0.13 | YES | YES | YES
RefactoryGUI | 8 | 29 | 0.13 | YES | YES | YES
StringCharacterIterator | 1 | 22 | 0.87 | NO | NO | NO
Test1 | 1 | 1 | * | NO | NO | NO
Test2 | 0 | 1 | * | NO | NO | NO
Wrapper | 2 | 6 | 0.33 | NO | NO | NO
WrapperGUI | 3 | 8 | 0.00 | NO | NO | NO

Note: * invalid value as calculation divides by zero

Table 10: God Class, myrfctr results

6.2.3 JUnit

The results of the application to JUnit reveal a number of false positives - primarily for reasons already discussed. One interesting result was the false identification of the TestRunner class as a god class. This class does have a number of relatively complex methods that interact with and control a number of other classes. However, the classes it interacts with are not themselves Data Classes and so TestRunner ought not to be considered as a god class.

7 Refinements of Applying Metrics Manually

In response to the issues raised in the previous section, a number of refinements to the metrics and rules that are used to identify both Data Classes and god classes are proposed and evaluated.


Classes | ATFD | WMC | TCC | G1 | G2 | G3 | Manual
--- | --- | --- | --- | --- | --- | --- | ---
awtui.AboutDialog | 2 | 3 | * | NO | NO | NO | NO
awtui.Logo | 1 | 6 | 0.00 | NO | NO | NO | NO
awtui.ProgressBar | 2 | 14 | 0.18 | NO | NO | NO | NO
awtui.TestRunner | 6 | 56 | 0.11 | YES | YES | NO | YES
extentions.ActiveTestSuite | 0 | 10 | 0.50 | NO | NO | NO | NO
extentions.ExceptionTestCase | 0 | 3 | * | NO | NO | NO | NO
extentions.RepeatedTest | 0 | 7 | 0.00 | NO | NO | NO | NO
extentions.TestDecorator | 0 | 6 | 0.60 | NO | NO | NO | NO
extentions.TestSetup | 0 | 5 | 0.00 | NO | NO | NO | NO
framework.Assert | 0 | 55 | 0.00 | NO | NO | NO | NO
framework.AssertFailedError | 0 | 2 | 0.00 | NO | NO | NO | NO
framework.ComparisonFailure | 0 | 11 | * | NO | NO | NO | NO
framework.Protectable | 0 | 1 | * | NO | NO | NO | NO
framework.Test | 0 | 2 | 0.00 | NO | NO | NO | NO
framework.TestCase | 0 | 14 | 0.05 | NO | NO | NO | NO
framework.TestFailure | 1 | 7 | 0.13 | NO | NO | NO | NO
framework.TestListener | 0 | 4 | 0.00 | NO | NO | NO | NO
framework.TestResult | 1 | 23 | 0.08 | NO | NO | NO | NO
framework.TestSuit | 4 | 36 | 0.05 | YES | NO | NO | NO
runner.BaseTestRunner | 4 | 51 | 0.01 | YES | NO | NO | YES
runner.ClassPathTestCollector | 0 | 15 | 0.00 | NO | NO | NO | NO
runner.FailureDetailView | 0 | 3 | 0.00 | NO | NO | NO | NO
runner.LoadingTestCollector | 1 | 10 | 0.00 | NO | NO | NO | NO
runner.ReloadingTestSuiteLoader | 0 | 3 | 0.00 | NO | NO | NO | NO
runner.SimpleTestCollector | 0 | 2 | * | NO | NO | NO | NO
runner.Sorter | 0 | 7 | 0.00 | NO | NO | NO | NO
runner.StandardTestSuiteLoader | 0 | 2 | 0.00 | NO | NO | NO | NO
runner.TestCaseClassLoader | 2 | 36 | 0.04 | NO | NO | NO | NO
runner.TestCollector | 0 | 1 | * | NO | NO | NO | NO
runner.TestRunListener | 0 | 6 | 0.00 | NO | NO | NO | NO
runner.TestSuiteLoader | 0 | 2 | 0.00 | NO | NO | NO | NO
runner.Version | 0 | 2 | * | NO | NO | NO | NO
samples.money.Imoney | 0 | 8 | 0.00 | NO | NO | NO | NO
samples.money.Money | 0 | 18 | 0.03 | NO | NO | NO | NO
samples.money.MoneyBag | 0 | 36 | 0.33 | NO | NO | NO | NO
samples.money.MoneyTest | 0 | 24 | 0.50 | NO | NO | NO | NO
samples.AllTests | 0 | 2 | 0.00 | NO | NO | NO | NO
samples.SimpleTest | 0 | 6 | 0.07 | NO | NO | NO | NO
samples.VectorTest | 0 | 10 | 0.58 | NO | NO | NO | NO
swingui.AboutDialog | 2 | 4 | * | NO | NO | NO | NO
swingui.CounterPanel | 5 | 9 | 0.11 | NO | YES | NO | NO
swingui.DefaultFailureDetailView | 3 | 6 | 0.07 | NO | NO | NO | NO
swingui.FailureRunlView | 3 | 10 | 0.11 | NO | NO | NO | NO
swingui.ProgressBar | 1 | 7 | 0.50 | NO | NO | NO | NO
swingui.StatusLine | 4 | 4 | 0.33 | NO | NO | NO | NO
swingui.TestHierarchyRunView | 3 | 10 | 0.48 | NO | NO | NO | NO
swingui.TestRunContext | 0 | 2 | 0.00 | NO | NO | NO | NO
swingui.TestRunner | 11 | 121 | 0.05 | YES | YES | NO | YES
swingui.TestRunView | 0 | 6 | 0.00 | NO | NO | NO | NO
swingui.TestSelector | 5 | 19 | 0.17 | NO | YES | NO | NO
swingui.TestSuitePanel | 2 | 13 | 0.42 | NO | NO | NO | NO
swingui.TestTreeModel | 2 | 32 | 0.08 | NO | NO | NO | NO
textui.ResultPrinter | 0 | 22 | 0.00 | NO | NO | NO | NO
textui.TestRunner | 1 | 25 | 0.04 | NO | NO | NO | NO

Note: * invalid value

Table 11: God Class, JUnit results


7.1 Data Class

The three rules that Marinescu [Mar01, Mar02] used to identify a Data Class all place bounds on both the NOPA and NOAM metrics (see Table 2). These measure the number of public attributes and accessor methods in a class, respectively. The bounds vary between requiring metric results for a class that are over 3 or over 5, and in the top 10 or top 10% of results for a system. A class that meets these bounds is considered to be a candidate Data Class. However, placing bounds on these metric results is inappropriate for this design problem. A class that purely contains constructor and accessor methods is a Data Class; the values of NOPA and NOAM will vary from one class to another, and placing bounds on these measurements will tend to miss Data Classes with a small public interface. The size of a class is not the main consideration in deciding whether it is a pure Data Class; what matters is a structure containing only constructor and accessor methods.

Another observation regards the inclusion of constructors in the calculation of the WOC metric. Constructors do not have any real bearing on the overall design of a class and should not be part of the WOC measurement. This refinement is named WOC2 (see Table 12). A new rule that takes this WOC2 value into consideration is D4: if the result equals zero then the corresponding class is a candidate data-class, as it contains only methods that are either constructors or accessors.

Classes that merely contain a predominant number of accessor methods and public instance variables are not considered to be Data Classes. Only pure Data Classes are taken into account, that is, classes that contain only instance variables, constructor methods or accessor methods. Consider the range of results WOC2 can produce. If a class has solely the characteristics of a data-class, so that its only methods are constructors or accessor methods (or it has no methods at all), WOC2 produces a zero value. A value greater than zero means there are other methods defined in


Name | Description / Rule
--- | ---
WOC2 (Weight of Class) | Number of non-accessor methods in a class divided by the total number of members of the interface, not including constructors.
NOMeth-NOCON (Number of methods, not constructors) | Number of methods that are not members of an interface, not including constructors.
D4 | WOC2 == 0

Table 12: Data Class Refined metrics and rules

the class that have additional functionality other than changing or returning the state of instance variables.
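The WOC2 refinement and rule D4 can be sketched in Java as follows. The names Woc2Sketch, woc2 and d4 are invented for this illustration. One choice here is an assumption rather than part of the definition: a class with no non-constructor methods is reported as undefined (mirroring the "*" entries in the result tables) and is not flagged by D4.

```java
import java.util.OptionalDouble;

/** Illustrative sketch only: WOC2 (constructors excluded) and the D4 rule. */
final class Woc2Sketch {

    /** WOC2 = non-accessor, non-constructor methods / all non-constructor methods. */
    static OptionalDouble woc2(int accessors, int otherMethods) {
        int nonConstructorMethods = accessors + otherMethods;
        if (nonConstructorMethods == 0) {
            // Assumed handling: the ratio divides by zero, so no value is reported.
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((double) otherMethods / nonConstructorMethods);
    }

    /** D4: the class is a candidate data-class when WOC2 is defined and equal to zero. */
    static boolean d4(OptionalDouble woc2) {
        return woc2.isPresent() && woc2.getAsDouble() == 0.0;
    }

    public static void main(String[] args) {
        // Two accessors and nothing else (constructors are ignored): a pure data-class.
        System.out.println(d4(woc2(2, 0))); // prints true
        // Six accessors plus one other method: WOC2 is 1/7, so D4 does not fire.
        System.out.println(d4(woc2(6, 1))); // prints false
    }
}
```

Unlike D1 to D3, this rule places no bound on the number of accessors, so a very small class with a single accessor is still detected.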

These refinements proposed for a Data Class have been applied to each of the three systems. Table 13 shows the results of the refinements applied to the Hotel system, with the addition of the other rules as a comparison. The only positive results identified by D4 match those of the manual inspection. The other two systems also showed only true positives for D4.

7.2 God-Class

A god class rule typically consists of three components: the complexity of the class, the measurement of coupling, and the measurement of cohesion. The filters proposed by Marinescu [Mar01, Mar02] only consider two out of the three metric values at any one time, and hence result in the identification of a number of false positives.

One observation is that using the WMC metric to measure a class’s complexity only identifies the worst-case scenario, since it counts the number of conditions. A high WMC for a class means that it contains a number of control condition statements, and the class is said to be rather complex.

The current coupling metric ATFD (see Table 3) defines the coupling of a class as the number of other classes whose instance variables are accessed either directly or via accessor methods. However, the metric does not consider the other



DATA CLASS | WOC | WOC2 | NOPA | NOAM | D1 | D2 | D3 | D4 | Manual
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
myrfctr.AbstractAccess | 1.00 | * | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.AbstractAccessGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.Abstraction | 0.33 | 0.00 | 0 | 2 | NO | NO | NO | YES | YES
myrfctr.AbstractionGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.AbstractionHelpGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.AddImplementsLinkGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.Bridge | 0.50 | 0.00 | 0 | 1 | NO | NO | NO | YES | YES
myrfctr.BridgeGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.Constructor | 0.71 | 0.67 | 0 | 4 | NO | NO | NO | NO | NO
myrfctr.EncapsulateConstruction | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.EncapsulateConstructionGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.FactoryMethod | 1.00 | * | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.FactoryMethodGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.JavaFilter | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.JavaProgram | 0.73 | 0.71 | 0 | 8 | NO | NO | YES | NO | NO
myrfctr.Method | 0.73 | 0.69 | 0 | 4 | NO | NO | NO | NO | NO
myrfctr.Node | 0.45 | 0.14 | 1 | 6 | NO | NO | YES | NO | NO
myrfctr.PartialAbstraction | 0.33 | 0.00 | 0 | 2 | NO | NO | NO | YES | YES
myrfctr.PartialAbstractionGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.Refactory | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.RefactoryGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.StringCharacterIterator | 0.77 | 0.70 | 1 | 3 | NO | NO | NO | NO | NO
myrfctr.Test1 | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.Test2 | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO
myrfctr.Wrapper | 0.40 | 0.25 | 0 | 3 | NO | NO | NO | NO | NO
myrfctr.WrapperGUI | 1.00 | 1.00 | 0 | 0 | NO | NO | NO | NO | NO

Table 13: Data Class, hotelSystem Refinement

31


Name Description/ Rule

CBO (Coupling Between Objects) Count the number of other classes to whichit its coupled. [CK94]

G3 CBO >= 14 and WMC > 20 and TCC <0.33

Table 14: God Class Refinements of metrics and rules

(non-accessor) methods a class uses from another class, andso does not give a full coupling measure. An alternativemetric that does take this into consideration is the CouplingBetween Objects (CBO) metric as described by Chidamberand Kemerer [CK94]. However, using the CBO metric re-quires the placing of bounds on the results so that the rulecould identify what is considered as being high coupling.

These two refinements are shown in Table 14. The G3 rule uses all three metric results and keeps the same bounds as the G2 rule for WMC and TCC, but uses 14 as the bound for CBO (this was the lowest value that complied with all three of the systems to identify true positive results). Clearly, the value of 14 will not necessarily be applicable to all systems, and so further evaluation is needed to identify appropriate bounds. A further issue is the nature of the classes to which the class being considered is coupled. For the class to be considered a potential god class, these other classes should have little in the way of functionality and primarily be Data Classes. Future refinements will explore the inclusion of this criterion in the coupling measure to improve the accuracy of this rule.
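As an illustration, the G3 rule from Table 14 can be written as a simple predicate. This is a minimal sketch rather than the tool's implementation: the class and method names (GodClassFilter, isG3Candidate) are hypothetical, and the three metric values are assumed to have been computed elsewhere.

```java
// Minimal sketch of the G3 filter from Table 14.
// GodClassFilter and isG3Candidate are hypothetical names; only the
// bounds (CBO >= 14, WMC > 20, TCC < 0.33) come from the text.
final class GodClassFilter {
    static final int CBO_BOUND = 14;      // coupling bound chosen for the three systems studied
    static final int WMC_BOUND = 20;      // complexity bound carried over from the G2 rule
    static final double TCC_BOUND = 0.33; // cohesion bound carried over from the G2 rule

    /** Returns true only when all three G3 conditions hold at once. */
    static boolean isG3Candidate(int cbo, int wmc, double tcc) {
        return cbo >= CBO_BOUND && wmc > WMC_BOUND && tcc < TCC_BOUND;
    }

    public static void main(String[] args) {
        System.out.println(isG3Candidate(14, 21, 0.10)); // all three conditions met
        System.out.println(isG3Candidate(13, 50, 0.10)); // coupling below the bound
    }
}
```

Because all three conditions are combined with a logical and, a class that is complex and non-cohesive but only lightly coupled is not flagged, which is the intended difference from the two-metric filters.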

These refinements were applied to each of the systems and showed some interesting results. Table 15 shows the results of the refinements for the god class design flaw applied to the refactory system. All the rules are shown for the purposes of comparison, and it can be clearly seen that the G3 positive results are true and match the manual inspection column. A similar pattern emerges for the results of both the Hotel System and JUnit.


DATA CLASS                          WOC   NOPA  NOAM  D1   D2   D3   D4   Manual
myrfctr.AbstractAccess              1.00  0     0     NO   NO   NO   NO   NO
myrfctr.AbstractAccessGUI           1.00  0     0     NO   NO   NO   NO   NO
myrfctr.Abstraction                 0.33  0     2     NO   NO   NO   YES  YES
myrfctr.AbstractionGUI              1.00  0     0     NO   NO   NO   NO   NO
myrfctr.AbstractionHelpGUI          1.00  0     0     NO   NO   NO   NO   NO
myrfctr.AddImplementsLinkGUI        1.00  0     0     NO   NO   NO   NO   NO
myrfctr.Bridge                      0.50  0     1     NO   NO   NO   YES  YES
myrfctr.BridgeGUI                   1.00  0     0     NO   NO   NO   NO   NO
myrfctr.Constructor                 0.71  0     4     NO   NO   NO   NO   NO
myrfctr.EncapsulateConstruction     1.00  0     0     NO   NO   NO   NO   NO
myrfctr.EncapsulateConstructionGUI  1.00  0     0     NO   NO   NO   NO   NO
myrfctr.FactoryMethod               1.00  0     0     NO   NO   NO   NO   NO
myrfctr.FactoryMethodGUI            1.00  0     0     NO   NO   NO   NO   NO
myrfctr.JavaFilter                  1.00  0     0     NO   NO   NO   NO   NO
myrfctr.JavaProgram                 0.73  0     8     NO   NO   YES  NO   NO
myrfctr.Method                      0.73  0     4     NO   NO   NO   NO   NO
myrfctr.Node                        0.45  1     6     NO   NO   YES  NO   NO
myrfctr.PartialAbstraction          0.33  0     2     NO   NO   NO   YES  YES
myrfctr.PartialAbstractionGUI       1.00  0     0     NO   NO   NO   NO   NO
myrfctr.Refactory                   1.00  0     0     NO   NO   NO   NO   NO
myrfctr.RefactoryGUI                1.00  0     0     NO   NO   NO   NO   NO
myrfctr.StringCharacterIterator     0.77  1     3     NO   NO   NO   NO   NO
myrfctr.Test1                       1.00  0     0     NO   NO   NO   NO   NO
myrfctr.Test2                       1.00  0     0     NO   NO   NO   NO   NO
myrfctr.Wrapper                     0.40  0     3     NO   NO   NO   NO   NO
myrfctr.WrapperGUI                  1.00  0     0     NO   NO   NO   NO   NO

Table 15: God Class, myrfctr refinements applied


7.3 Further Issues with Automatic Detection

During this study a number of problems have been identified that relate to either the application of the metrics or the interpretation of the results. Another interesting issue arises in relation to the automatic identification of the program elements upon which the metrics are defined, one particular instance being accessor methods. Marinescu uses the following pattern to locate accessor methods: “accessor-methods are small-methods, with a unitary cyclomatic complexity, and we rely on the name convention, stating that the names of accessor methods are prefixed with the get (or Get) and set (or Set) prefix” [Mar01]. In carrying out this work it became clear that accessor methods often do not have get and set as prefixes, and even identifying them manually raised some ambiguities. For example, should a method named isBird that returns the boolean value of an instance variable be considered an accessor method? Under Marinescu’s definition above it would not be. This is another area that needs further exploration.
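The naming part of this pattern can be sketched as follows. This is only an illustration of the convention quoted above, not Marinescu's implementation: the class and method names are hypothetical, the cyclomatic complexity is assumed to be supplied by another analysis, and the "small method" condition is not modelled.

```java
// Sketch of the name-based accessor check quoted from [Mar01].
// AccessorNameHeuristic and looksLikeAccessor are hypothetical names.
final class AccessorNameHeuristic {
    /**
     * Returns true when the method name carries a get/Get/set/Set prefix
     * and the method has unitary cyclomatic complexity.
     */
    static boolean looksLikeAccessor(String methodName, int cyclomaticComplexity) {
        boolean prefixed = methodName.startsWith("get") || methodName.startsWith("Get")
                || methodName.startsWith("set") || methodName.startsWith("Set");
        return prefixed && cyclomaticComplexity == 1;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeAccessor("getName", 1)); // matches the convention
        System.out.println(looksLikeAccessor("isBird", 1));  // boolean getter is missed
    }
}
```

The isBird case shows the ambiguity directly: the method has the shape of an accessor but fails the prefix test, so any metric built on this check (such as NOAM) undercounts it.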

8 Tool Application

Metrics, which are implemented into the Eclipse metric framework repository, are tested to check the validity of their calculations. The implemented metrics are applied to the three small systems and their results are compared against the manual calculations. Mismatches between the two sets of results may identify possible metric implementation problems, or human error in calculation. This process of validating the implemented metric calculations is limited to three small systems; scaling the analysis to larger systems is required. However, as already mentioned, manually applying metrics to larger systems becomes unmanageable. A solution therefore is to choose a random sample of classes in a system, calculate their metrics manually, and match the results against the tool’s calculation.

Using the Eclipse metric framework allows experimentation with the set of metrics that correspond to a design problem. The additional metrics to be applied to each design problem are described and justified below.

8.1 Data-Class

Inner-classes implemented in another class will usually mean that the class has more functionality than just holding data. A class that has inner-classes cannot be considered to be a data-class. Hence, for the Number Of Classes (NOC) metric, only classes with a value equal to one should be considered, meaning there are no inner-classes defined in the body of the class. The value produced will always be greater than or equal to one, depending on the number of classes defined. An interface definition only contains method definitions, with no instance variables or method bodies, and is not a class definition that could give an object a state. Hence, interfaces are not included as part of NOC.

A class constructor initialises the state of an object, and there can be many constructors to instantiate different states of an object. However, constructors do not add functionality to a class and should not be counted as part of the number of methods within a class. Hence, subtracting the Number Of Class Constructors (NOCC) from the total Number Of Class Methods (NOCM) identifies the number of methods with functionality in a class.
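The following sketch shows one way these counts could be obtained, here using Java reflection on compiled classes. It is an assumption for illustration only: the metric tool described in this report works within the Eclipse metric framework, and the class ClassShapeMetrics, its methods and the sample Point class are all hypothetical. NOC is taken as one plus the number of member classes declared directly in the class body.

```java
import java.lang.reflect.Method;

// Hypothetical reflection-based sketch of NOC, NOCC and NOCM - NOCC.
final class ClassShapeMetrics {
    /** NOC: the class itself plus any member classes declared in its body. */
    static int noc(Class<?> c) {
        return 1 + c.getDeclaredClasses().length;
    }

    /** NOCC: number of constructors declared by the class (including a default one). */
    static int nocc(Class<?> c) {
        return c.getDeclaredConstructors().length;
    }

    /** NOCM: declared methods (ignoring compiler-generated ones) plus constructors. */
    static int nocm(Class<?> c) {
        int methods = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic()) {
                methods++;
            }
        }
        return methods + nocc(c);
    }

    /** NOCM - NOCC: the methods that remain once constructors are removed. */
    static int functionalMethods(Class<?> c) {
        return nocm(c) - nocc(c);
    }

    // Sample class for the demonstration: two constructors and one method.
    static class Point {
        int x;
        int y;
        Point() { }
        Point(int x, int y) { this.x = x; this.y = y; }
        int getX() { return x; }
    }

    public static void main(String[] args) {
        System.out.println("NOC  = " + noc(Point.class));
        System.out.println("NOCC = " + nocc(Point.class));
        System.out.println("NOCM - NOCC = " + functionalMethods(Point.class));
    }
}
```

Note that the single remaining method of Point is itself an accessor, so NOCM - NOCC on its own does not separate accessors from other methods; that distinction is left to NOAM and WOC.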

8.2 God-Class

A god class in the behavioural form contains a large proportion of a system’s functionality, and this may be shown through a class being large in size. A possible measurement to represent this is the number of Lines Of Code (LOC). This measure does not consider a class’s functionality, but it serves as a quick overview of a class’s size.

Another characteristic of a god class is not sharing its functionality with other classes. A class which is at the top of a hierarchy with no siblings and only “Object” as its superclass could be a potential god class. A Depth of Inheritance Tree (DIT) value equal to one would identify this characteristic.
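A minimal sketch of the DIT check is given below. It assumes reflection on compiled classes, which is not necessarily how the metric tool computes DIT, and the class and method names are hypothetical. The depth is counted as the number of superclass links up to Object, so a class extending only Object has a depth of one; the "no siblings" condition is not checked here.

```java
// Hypothetical sketch: DIT as the number of superclass links up to Object.
final class InheritanceDepth {
    /** Counts superclass links; Object itself and interfaces yield 0. */
    static int dit(Class<?> c) {
        int depth = 0;
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    /** True when the class has only Object as its superclass (DIT equal to one). */
    static boolean hasOnlyObjectAsSuperclass(Class<?> c) {
        return dit(c) == 1;
    }

    public static void main(String[] args) {
        System.out.println(dit(String.class));  // String extends Object directly
        System.out.println(dit(Integer.class)); // Integer extends Number extends Object
    }
}
```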

As previously mentioned, Marinescu identifies the ATFD metric for the god class design problem, but there are a few problems with trying to calculate it on source-code, as noted in a previous section. Because the first data analysis was applied manually on three small Java systems, this was not a real problem. However, trying to automate this calculation by implementing it in the metric tool is inappropriate.

A proposed alternative to Marinescu’s ATFD coupling metric is Chidamber and Kemerer’s Coupling Between Objects (CBO). CBO counts the number of other classes a class is coupled to [CK94]. This is the number of different reference types used as instance variable declarations, formal parameters, return types, throws declarations, local variables, and types from which attribute and method selections are made. References to a type are counted only once. For example, a type that appears both as an instance variable and in a method parameter list produces only a single count.

However, limiting the types the CBO metric counts to data-classes only would give a coupling measurement that is easier to implement than ATFD. The aim of this newly defined metric is to measure the number of data-classes a class is coupled to. This is called Coupling Between Data-Classes (CBDC). Two interpretations of this metric are implemented: one counts the number of different connected data-classes (CBDC1), and the other counts the frequency of the coupled data-classes (CBDC2).
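The difference between the two interpretations can be sketched as follows. This is a simplified illustration, not the tool's code: it assumes that the references a class makes to other types have already been extracted as a list of type names and that the set of data-classes is known. The names DataClassCoupling, cbdc1 and cbdc2 and the sample type names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the two CBDC interpretations.
final class DataClassCoupling {
    /** CBDC1: number of different data-classes referenced. */
    static int cbdc1(List<String> referencedTypes, Set<String> dataClasses) {
        Set<String> distinct = new HashSet<>();
        for (String type : referencedTypes) {
            if (dataClasses.contains(type)) {
                distinct.add(type);
            }
        }
        return distinct.size();
    }

    /** CBDC2: total number of references made to data-classes. */
    static int cbdc2(List<String> referencedTypes, Set<String> dataClasses) {
        int count = 0;
        for (String type : referencedTypes) {
            if (dataClasses.contains(type)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Illustrative input only: one class referring to four types.
        List<String> refs = Arrays.asList("Abstraction", "Abstraction", "Bridge", "Refactory");
        Set<String> dataClasses = new HashSet<>(Arrays.asList("Abstraction", "Bridge"));
        System.out.println("CBDC1 = " + cbdc1(refs, dataClasses));
        System.out.println("CBDC2 = " + cbdc2(refs, dataClasses));
    }
}
```

With this input the repeated reference to the same data-class is counted once by CBDC1 and twice by CBDC2, while the reference to a class that is not a data-class is ignored by both.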

8.3 TypEx

The larger system “TypEx” is a Java Data Binding System for Streams of XML [Rus04], and will be used to test and apply the implemented metrics. The results will form the basis for justifying the redefinition of some of the metrics proposed by Marinescu.


Manual inspection of “TypEx” was carried out with the developer, to utilise their knowledge of the system in locating possible design problems. The developer was briefed on the characteristics of the two design problems under investigation, and then asked whether any classes matched or closely matched these characteristics. These classes are highlighted in the result tables.

The data-classes identified by the developer of the system are compared against the redefined set of metric filters applied to “TypEx”. The god classes are interpreted slightly differently this time from Marinescu’s study. The filters are not used, as they can interpret the metric results inappropriately. Instead, the structure of a class and the metric results are considered together to help justify whether a god class is present.

For each class identified by the developer as a true-positive for a data-class or a god class, the obvious structure is described first, followed by a justification of the metric results and, where possible, the design problem characteristics that are recognised. In addition, the analysis gives a reason why the class should or should not be a true-positive.

Also, any significant outliers in the metric results are queried to help understand why they occur. A metric outlier could indicate that a class has the corresponding design problem, or could be a false-positive and possibly highlight problems with the metric calculation or definition.

8.3.1 Data-Class

A brief overview of all “TypEx” metric results shows a number of “-1” values for the Number Of Accessor Methods (NOAM) and Weight Of Class (WOC) metrics.

NOAM produces a “-1” value when a class’s definition only contains instance variables with no methods. In this situation NOAM should produce a zero value, but “-1” was implemented to distinguish this from the case where a class only contains constructor methods. However, this feature is not required and should be changed to produce a zero value.

To understand why WOC produces “-1” values, we first consider the definition by Marinescu: “The number of non-accessor methods in a class divided by the total number of members of the interface. Inherited members are not counted” [Mar02]. When a class is defined with no method implementations, the above definition of WOC produces an undefined value, as it divides by zero. A class implemented with only instance variables, known as a “struct” data structure in the C programming language, can be considered bad programming design in an O-O system. However, the metric should be able to handle such instances in order to be a robust metric definition.

With the redefinition in the WOC2 metric, constructor methods are not considered as part of the calculation. However, a “-1” value is produced when a class is defined with constructor methods only, as this new definition would mean dividing by zero. This small change is made, and the metric reapplied to the systems with no negative values produced.
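One way to make the division explicit is sketched below. The report does not state which value replaces “-1” after the change, so returning an empty result for the undefined case is an assumption made for this illustration, and the names WeightOfClass and woc are hypothetical.

```java
import java.util.OptionalDouble;

// Hypothetical sketch of a WOC calculation that guards the division by zero.
final class WeightOfClass {
    /**
     * Non-accessor methods divided by the number of interface members.
     * Returns an empty value when there are no members to divide by,
     * instead of a sentinel such as -1 (an assumption for this sketch).
     */
    static OptionalDouble woc(int nonAccessorMethods, int interfaceMembers) {
        if (interfaceMembers <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of((double) nonAccessorMethods / interfaceMembers);
    }

    public static void main(String[] args) {
        System.out.println(woc(1, 2)); // one non-accessor method out of two members
        System.out.println(woc(0, 0)); // class with no members in its interface
    }
}
```

Keeping the undefined case separate from the numeric range avoids a filter such as "WOC below a bound" silently matching classes for which the metric could not be computed.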

Analysis with Developer The full analysis of TypEx with the developer identifying data-classes is shown in Appendix B; the conclusions drawn from these results are presented here. The results highlight two main differences that keep recurring: some classes use methods that are automatically inherited from the Object superclass, whereas others deviate from being a data-class by using classes within the JUnit testing framework.

The false positives identify classes that are interpreted correctly by the metric results and where the developer has failed to do so. This is a good result, as the automation identifies more cases than the developer, who is very familiar with the system. The developer was shown the classes they did not identify as a Data Class, and made the following response: “even being familiar with the system; knowledge of some classes content may have lapsed from being inactive”.

Together, both metric interpretations, NOC and WOC2, have a better chance of locating a class with the characteristics of a data-class.

Table 16: Data-Class, TypEx.

Table 17: God-Class, TypEx.

However, if we map these metric interpretations against the classes that the developer thought were data-classes, a one-to-one correlation does not exist. The key difference between the two results is that the developer identifies classes that are mainly data-classes but deviate from the design flaw’s characteristics through additional methods, such as “main” or methods that test the class or system.

8.3.2 God-Class

Analysis with Developer The results of examining TypEx for god classes show two true positives and one false positive, indicating that there are only a few cases throughout the system. This result is good for identifying god classes within a system, as the definition of the design flaw suggests there should be a limited number of classes of this nature doing a large proportion of a system’s functionality.

9 Conclusions

Some of the metrics implemented may not recognise the unique aspects of O-O design but assume it to be simply an extension of structured programming techniques [MH99]. Take for instance part of the WMC metric calculation, counting the number of methods a class currently implements. Does a method belong only to the class which defines it, or does it also belong to every class which inherits it directly or indirectly? [CS95]. Such questions are not clear-cut in some metric definitions, which makes it difficult to replicate their implementation. In some cases metric results may not give a true representation of what they are trying to measure.

If we consider WMC, it does not really identify the true complexity of methods in a class. In an example given by Mayer and Hall [MH99], a class with 10 methods, each of simple complexity, would be given the same value as a class with a single method having a McCabe complexity of 10. Taking this into consideration, a high result produced by applying this metric to a class does not necessarily give an accurate representation of its complexity.
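The Mayer and Hall example can be reproduced with a few lines of code. The sketch below assumes WMC is the sum of the McCabe complexities of a class's methods, as in that example; the class name WmcExample and the helper maxComplexity are hypothetical additions used to show what the sum hides.

```java
// Sketch of the WMC ambiguity described by Mayer and Hall [MH99].
final class WmcExample {
    /** WMC taken as the sum of the per-method McCabe complexities. */
    static int wmc(int[] methodComplexities) {
        int sum = 0;
        for (int complexity : methodComplexities) {
            sum += complexity;
        }
        return sum;
    }

    /** Largest single-method complexity, shown only for comparison. */
    static int maxComplexity(int[] methodComplexities) {
        int max = 0;
        for (int complexity : methodComplexities) {
            max = Math.max(max, complexity);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] tenSimpleMethods = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        int[] oneComplexMethod = {10};
        System.out.println(wmc(tenSimpleMethods) == wmc(oneComplexMethod));
        System.out.println(maxComplexity(tenSimpleMethods) + " vs " + maxComplexity(oneComplexMethod));
    }
}
```

Both classes receive the same WMC even though their most complex methods differ by a factor of ten, which is why a high WMC alone is weak evidence of a complex class.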

9.1 In Addition

During this study a number of problems have been identified that relate to either the application of the metrics or the interpretation of the results.

An issue has arisen in relation to the automatic identification of the program elements upon which the metrics are defined, one particular instance being accessor methods. Marinescu uses the following pattern to locate accessor methods: “accessor-methods are small-methods, with a unitary cyclomatic complexity, and we rely on the name convention, stating that the names of accessor methods are prefixed with the get (or Get) and set (or Set) prefix” [Mar01]. In carrying out this work it became clear that accessor methods often do not have get and set as prefixes, and even identifying them manually raised some ambiguities. For example, getBarcodeItem could be a method that takes a barcode number and returns the name of the item in words, where all the items are stored in a data structure held as an instance variable. The method would return a small part of the instance variable and so should not be considered an accessor method; however, under Marinescu’s description above it would be.

If we consider the attributes of an abstract class, it is a class that cannot be instantiated and consequently could not be used to hold data unless extended by another class, and it would normally contain more functionality than a data-class. In addition, an abstract class could not have a central role in a system, and hence could not be a god class either.

Also, if we consider the attributes of an interface, it should not be considered as part of any metric calculation, as it contains no implementation detail upon which to calculate any measurements.

Some observations made in Chidamber and Kemerer’s O-O metric suite paper [CK94], regarding the analysis of the results of the metrics they define, could relate to possible characteristics of a design problem. An outlier class, which is the root of a hierarchy, had a large number of methods defined (106). A class that is the root of a hierarchy with a perceived large amount of functionality could be an ideal candidate for a god class, as it could be doing large amounts of computation that should be spread equally across other classes or through its descendants.

An interesting observation made by the developer of TypEx suggests a potential attribute to help automate locating a possible god class. The entry point into a system is a class which implements a “main” method; it could contain a number of functionalities and be coupled to many classes in order to boot up the system. However, not all classes that implement a “main” method have this characteristic, as a well-designed class has a few lines that call a few classes which encapsulate all the start-up processes. More recently, testing has become recognised as an important part of every stage of developing software. Each class, or new method within a class, can be tested in numerous ways, the more common ones being to use a testing framework such as JUnit or to implement a “main” method. The latter case means the class can be tested stand-alone without interfering with the rest of the system.

9.2 Current State of the Art

There have been a number of other attempts to automatically identify design deficiencies within software. Simon et al. [SSL01] measure the cohesion between modules and use this as a distance value, which is then applied to a Virtual Reality Modelling Language (VRML) with a fast spring embedder program to produce 3D models. Such visualisations can aid in gaining an understanding of the relationships between classes in a system and help towards identifying design problems. One of the challenges with this approach is interpreting the visualisations produced and knowing what a bad design looks like. More complex systems produce more cluttered visualisations, which increases the difficulty of identifying areas of bad design.

A slightly different approach is taken by Demeyer et al. [DDN00], who use change metrics to compare the differences between two software builds. The pattern of changes made to a system is analysed to determine the refactorings that have taken place. The aim of this is to support the software engineer in understanding the system in terms of the refactorings that have been applied.

The use of invariants to assess the internal structure of a system is another way to identify areas to refactor. Daikon [Dai01] is a tool which detects invariants within a system at specific points such as loop heads and procedure entries and exits. The work of Kataoka et al. [KEGN01] uses the Daikon tool to extract invariants from a piece of code, which are then interpreted to suggest a small number of possible refactorings that could be applied.

The work carried out by Mantyla et al. [MVL03] identifies a taxonomy for bad smells through an empirical study. The taxonomy identifies seven categories into which the 22 bad smells fit, as a means of making them more understandable through recognising the relationships between the bad smells.

van Emden and Moonen [VEM02] describe jCOSMO, a prototype generic “code smell browser” that is also based on a two-stage process: parsing, which extracts the primitive code structure and primitive smell aspects (easily observable design flaws), and visualisation. The extracted source model may then be analysed for what the authors term “derived smells” (non-primitive design flaws that may be detected by inference). The approach is illustrated with two primitive flaws on a significant piece of software, but there is no suggestion as to how well the flaws defined by Fowler (almost all of which would be considered non-primitive flaws) are identified.

Miceli et al. [MSG99] employ a metrics-based approach to identify areas where some of the transformations described by Opdyke may be applied. The metrics and the associated transformations are mainly associated with the structure of inheritance hierarchies, and although the evaluation is very limited the results are promising. In their investigation of the impact of god classes on maintenance activities, Deligiannis et al. [DSRS03] note that “...there is a considerable relationship between that heuristic (the god class) and metrics so that it could be feasible to conduct an assessment by using appropriate metrics”. Tahvildari and Kontogiannis [TK03] also adopt a metrics-based approach to check for some of the heuristics identified by Riel [Rie96]: “key classes” and “one class, one concept”. They evaluate their approach on a Java expert system shell, again with encouraging results.

A different approach is taken by Tourwe and Mens [TM03]. They use logic meta-programming that takes the form of a Prolog-based description of the attributes of a design flaw, and demonstrate the application of the technique to two of the flaws defined by Fowler.


9.3 Future work

Future work will expand both the range of systems used as a data set and the consideration of further design problems. In addition, the program attributes that contribute to the metrics require investigation and a more formal definition.

A future study is to build upon the existing ideas of Marinescu [Mar02] and Mantyla et al. [MVL03], focusing specifically on bad smell design heuristics, as each has a corresponding set of refactorings that can be applied to improve the overall design of a system.

The paper has described the application of a number of metrics and filters for the identification of Data Classes and god classes. The original definitions as proposed by Marinescu demonstrated some degree of success but tended to raise a number of false positives, primarily due to the inclusion of absolute values within the rules. The filters were refined and evaluated further and seemed to produce more accurate results. A future study is to consider the possibility of not using filters to interpret the metric results, but instead to order the results. Ordering would not restrict the results in the way that filters seem to.
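The ordering idea could look like the sketch below, which ranks classes by a single metric value instead of cutting the list at a fixed bound. It is an illustration only: the class MetricRanking and its method are hypothetical, ties are broken alphabetically as an arbitrary choice, and the sample values are the NOAM figures reported for four myrfctr classes in the data class results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: order classes by a metric value instead of filtering them.
final class MetricRanking {
    /** Returns class names sorted by metric value, highest first; ties by name. */
    static List<String> rankDescending(Map<String, Integer> metricByClass) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(metricByClass.entrySet());
        entries.sort((a, b) -> {
            int byValue = Integer.compare(b.getValue(), a.getValue());
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        });
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            names.add(entry.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Integer> noam = new HashMap<>();
        noam.put("myrfctr.JavaProgram", 8);
        noam.put("myrfctr.Node", 6);
        noam.put("myrfctr.Constructor", 4);
        noam.put("myrfctr.Method", 4);
        System.out.println(rankDescending(noam));
    }
}
```

An inspector can then work down the list from the top, which avoids choosing an absolute bound that may not transfer between systems.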


References

[BB03] L. Badri and M. Badri. A New Class Cohesion Criterion: An Empirical Study on Several Systems. In 7th European Conference on Object-Oriented Programming Workshop on Quantitative Approaches in Object-Oriented Software Engineering, Darmstadt, July 2003.

[BC98] H. Bar and O. Ciupke. Exploiting design heuristics for automatic problem detection. In S. Ducasse and J. Weisbrod, editors, Proceedings of the ECOOP Workshop on Experiences in Object-Oriented Re-Engineering, number 1543 in Lecture Notes in Computer Science, Brussels, July 1998. Springer-Verlag.

[BEEGR00] S. Benlarbi, K. El Emam, N. Goel, and S. Rai. Thresholds for Object-Oriented Measures. In 11th International Symposium on Software Reliability Engineering, pages 24–39, San Jose, October 2000.

[BK95] J.M. Bieman and B-K. Kang. Cohesion and Reuse in an Object-Oriented System. In ACM Symposium on Software Reusability, pages 259–262, Seattle, April 1995.

[BL71] L.A. Belady and M.M. Lehman. Programming System Dynamics or the meta-dynamics of Systems in Maintenance and Growth. Technical Report RC3546, IBM, 1971. Reprinted in M.M. Lehman, L.A. Belady, editors, Program Evolution: Process of Software Change, Ch 5, APIC Studies in Data Processing No.27. Academic Press, London, 1985.

[BMD+00] M. Balazinska, E. Merlo, M. Dagenais, B. Lague, and K. Kontogiannis. Advanced clone-analysis to support object-oriented system refactoring. In 7th Working Conference on Reverse Engineering (WCRE), Brisbane, 2000. IEEE Computer Society Press.

[BMMM98] W.H. Brown, R.C. Malveau, H.W. McCormick, and T.J. Mowbray. Anti Patterns: Refactoring Software, Architectures, and Projects in Crisis. Wiley, 1998.

[CK94] S.R. Chidamber and C.F. Kemerer. A Metrics Suite for Object-Oriented Design. IEEE Transactions on Software Engineering, 20(6):476–493, June 1994.

[CS95] N. Churcher and M. J. Shepperd. Towards a Conceptual Framework for Object Oriented Software Metrics. ACM SIGSOFT, Software Engineering Notes, 20(2):69–76, April 1995.

[Dai01] Daikon. Dynamic Invariant Detector, 2001. visited June 2002, http://pag.lcs.mit.edu/daikon.

[DDN00] S. Demeyer, S. Ducasse, and O. Nierstrasz. Finding Refactoring via Change Metrics. In Object-Oriented Programming Systems, Languages and Applications (OOPSLA), volume 35(10), pages 166–177, Minneapolis, October 2000. SIGPLAN.

[Dij68] E. W. Dijkstra. Go To Statement Considered Harmful. Communications of the ACM, 11(3), March 1968.

[Dra03] I. Dragos. Automating Design Flaw Correction in Object-Oriented Systems. Master’s thesis, Politehnica University of Timisoara, 2003.

[DSRS03] I. Deligiannis, M. Shepperd, M. Roumeliotis, and I. Stamelos. An Empirical Investigation of an Object-Oriented Design Heuristic for Maintainability. The Journal of Systems and Software, 65(2):127–139, February 2003.

[Ecl01] Eclipse, IBM, 2001. visited June 2003, http://www.eclipse.org.

[FN00] N. E. Fenton and M. Neil. Software Metrics: Road-map. In 22nd International Conference on Software Engineering Future of Software Engineering Track, Limerick, June 2000.

[Fow99] M. Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.

[FP96] N.E. Fenton and S.L. Pfleeger. Software Metrics, A Rigorous and Practical Approach. International Thomson Computer Press, 2nd edition, 1996.

[GHJV94] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Object-Oriented Software. Addison-Wesley, 1994.

[JDe01] JDepend, 2001. visited June 2003, http://www.clarkware.com/software/JDepend.htm.

[JMe00] JMetric, 2000. visited January 2003, http://www.it.swin.edu.au/projects/jmetric/products/jmetric.

[JUn03] JUnit. A simple framework to write repeatable tests. Beck, K. and Gamma, E., May, version: 3.8.1. 2003. visited January 2003, http://www.junit.org.

[KEGN01] Y. Kataoka, M.D. Ernst, W.G. Griswold, and D. Notkin. Automated Support for Program Refactoring using Invariants. In Proceeding of the International Conference on Software Maintenance, pages 736–743, Florence, November 2001.


[Lan03] M. Lanza. Object-Oriented Reverse Engineering, Coarse-grained, Fine-grained, and Evolutionary Software Visualisation. PhD thesis, University of Bern, May 2003.

[LH89] K.J. Lieberherr and I.M. Holland. Assuring Good Style for Object-Oriented Programs. IEEE Software, 6(5):38–48, September 1989.

[Mar01] R. Marinescu. Detecting Design Flaws via Metrics in Object-Oriented Systems. In 39th International Conference and Exhibition on Technology of Object-Oriented Languages and Systems, page 173, Santa Barbara, July/August 2001. IEEE Computer.

[Mar02] R. Marinescu. Measurement and Quality in Object-Oriented Design. PhD thesis, “Politehnica” University of Timisoara, October 2002.

[McC76] T.J. McCabe. A Complexity Measure. IEEE Transactions on Software Engineering, 2(4):308–320, December 1976.

[Met02] Metrics, 2002. visited June 2003, http://metrics.sourceforge.net.

[MH99] T. Mayer and T. Hall. A Critical Analysis of Current Object Oriented Design Metrics. Software Quality Journal, 8, 1999.

[MSG99] T. Miceli, H.A. Sahraoui, and R. Godin. A Metric Based Technique for Design Flaws Detection and Correction. In 14th International Conference on Automated Software Engineering, pages 307–310. IEEE Computer, October 1999.

[Mun04] M.J. Munro. A Measurement-Based Approach for Detecting Design Problems in Object-Oriented Systems. Technical Report EFoCS-57-2005, University of Strathclyde, July 2004.

[MVL03] M. Mantyla, J. Vanhanen, and C. Lassenius. A Taxonomy and an Initial Empirical Study of Bad Smells in Code. In International Conference on Software Maintenance, Portland, May 2003.

[Rie96] A. Riel. Object-Oriented Design Heuristics. Addison-Wesley, 1996.

[Rus04] G. R. Russell. Typex, 2004. visited January 2004, http://typex.dev.java.net.

[SDM03] SDMetrics, 2003. visited June 2003, http://www.sdmetrics.com.

[SP00] P. Stevens and R. Pooley. Using UML Software Engineering with Objects and Components. Object Technology Series. Addison-Wesley, 2000.

[SSL01] F. Simon, F. Steinbruckner, and C. Lewerentz. Metrics Based Refactoring. In Proceedings of the IEEE 5th European Conference on Software Maintenance and Reengineering, pages 30–38, Lisbon, 2001.

[TK03] L. Tahvildari and K. Kontogiannis. A Metric-Based Approach to Enhance Design Quality Through Meta-Pattern Transformation. In 7th European Conference for Software Maintenance and Re-engineering, pages 183–192, Benevento, March 2003.

[TM03] T. Tourwe and T. Mens. Identifying Refactoring Opportunities Using Logic Meta Programming. In 7th European Conference for Software Maintenance and Re-engineering, pages 91–100, Benevento, March 2003.

[Tog02] Together, 2002. visited June 2002, http://www.togethersoft.com.


[VEM02] E. Van-Emden and L. Moonen. Java Quality Assurance by Detecting Code Smells. In 9th Working Conference on Reverse Engineering, page 97, Richmond, October/November 2002. IEEE Computer.

A Definitions of Metrics Implemented into Eclipse Metric Framework

Chidamber and Kemerer made one of the first definitions of O-O metrics. They defined six metrics that cover the most fundamental design parts of a system: cohesion, complexity, coupling, depth of inheritance, number of class siblings and the response for a class [CK94]. These categories are specific to a particular characteristic of a system design and do not necessarily encapsulate all bad smell characteristics that can be measured.

The O-O metrics identified to measure the characteristics of bad smells are described at the abstraction level at which they are calculated. There are four levels of abstraction in a Java system: Project, Package, Class and Method. Most of the calculations are at the Class level and there are none at the Project and Package levels. Only the metrics at the Class or Method level that are used in the rest of the thesis are defined here.

Each O-O metric is described in detail: the unique attribute it measures, where it originated, and why it has been chosen. Each metric is also critiqued on how successful the measure is and whether any ambiguities exist that may pose a threat to the validity of the results produced.

A.1 Class Level

Metrics at the Class level measure the interactions betweenclasses or the attributes of a class.

A.1.1 Coupling Between Objects (CBO)

For each class, the count of the number of other classes to which it is coupled [CK94]. Two classes are coupled when methods declared in one class use methods or instance variables defined by the other class.

A.1.2 Coupling Between Data Classes 1 (CBDC1)

The number of unique classes a class is coupled to that are considered to be Data Classes [Mun04]. CBDC1 is a refinement of the CBO metric, as it takes only the subset of coupled classes that are Data Classes.

A.1.3 Coupling Between Data Classes 2 (CBDC2)

The frequency with which a class is coupled to classes that are considered to be Data Classes [Mun04]. In this instance two classes are coupled when methods declared in one class use accessor/constructor methods or instance variables defined by the other class, which is defined to be a Data Class. The difference between CBDC1 and CBDC2 is that the latter counts every coupling from a class to a Data Class, whereas the former considers only the number of unique Data Classes.
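The distinction between the two counts can be illustrated with a minimal sketch. It assumes that the couplings of a class have already been extracted as a flat list of referenced class names, one entry per use, and that the set of Data Classes is known; the function name `cbdc` and the class names are hypothetical and not part of the metric framework.

```python
from collections import Counter

def cbdc(references, data_classes):
    """Return (CBDC1, CBDC2) for one class.

    references:   class names used by the class's methods, one entry per use
                  (assumed to be extracted beforehand)
    data_classes: names of the classes already judged to be Data Classes
    """
    uses = Counter(name for name in references if name in data_classes)
    cbdc1 = len(uses)           # number of unique Data Classes coupled to
    cbdc2 = sum(uses.values())  # every use of a Data Class is counted
    return cbdc1, cbdc2

# Hypothetical class that uses Programme twice, Channel once and Parser once.
refs = ["Programme", "Programme", "Parser", "Channel"]
print(cbdc(refs, {"Programme", "Channel"}))  # (2, 3)
```

A class coupled to a single Data Class that it uses twice therefore has CBDC1 = 1 and CBDC2 = 2.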

A.1.4 Commented Lines Of Code (CLOC)

The number of source-code lines within a class that are commented.

A.1.5 Non-Commented Lines Of Code (NCLOC)

The number of source-code lines in a class that are not blank, not commented and do not consist purely of a curly bracket.

A.1.6 Comment Density (CD)

Fenton [FP96] defines the Comment Density (CD) metric to calculate the density of Commented Lines Of Code (CLOC) at any level of a system's design; the focus here is at the Class level.

$$\mathrm{CD} = \frac{\mathrm{CLOC}}{\mathrm{NCLOC}}$$
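As an illustration of how CLOC, NCLOC and CD relate, the following sketch classifies the lines of Java-like source text. It is not the framework's implementation. The rules it applies are assumptions: a line counts as commented only if it starts with `//` or lies inside a `/* ... */` block, a code line with a trailing comment counts as code, and CD is reported as 0.0 when NCLOC is zero. The name `comment_metrics` is hypothetical.

```python
def comment_metrics(source):
    """Return (CLOC, NCLOC, CD) for Java-like source text (simplified rules)."""
    cloc = ncloc = 0
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:                      # inside a /* ... */ comment
            cloc += 1
            if "*/" in line:
                in_block = False
        elif line.startswith("//"):
            cloc += 1
        elif line.startswith("/*"):
            cloc += 1
            in_block = "*/" not in line
        elif line in ("", "{", "}"):
            continue                      # blank or brace-only: not counted
        else:
            ncloc += 1
    cd = cloc / ncloc if ncloc else 0.0   # assumption: CD = 0 when NCLOC = 0
    return cloc, ncloc, cd

sample = """\
// Holds a channel name.
class Channel {
    /* the name
       of the channel */
    String name;

    String getName() {
        return name;  // trailing comment: counted as code
    }
}
"""
print(comment_metrics(sample))  # (3, 4, 0.75)
```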

A.1.7 Depth of Inheritance Tree (DIT)

The depth of a node in a tree refers to the length of the maximal path from the node to the root of the tree [CK94].

DIT = depth of the class in the inheritance tree.
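A minimal sketch of the depth calculation follows. It assumes single inheritance, as in Java classes, so the path to the root is unique, and it assumes the hierarchy is available as a child-to-parent mapping. The names `dit`, `Base` and `Derived` are hypothetical.

```python
def dit(cls, parent_of):
    """Depth of cls in a single-inheritance tree; the root has depth 0."""
    depth = 0
    while cls in parent_of:   # walk up until a class with no recorded parent
        cls = parent_of[cls]
        depth += 1
    return depth

# Hypothetical hierarchy: Derived extends Base, Base extends Object.
parent_of = {"Base": "Object", "Derived": "Base"}
print(dit("Object", parent_of))   # 0
print(dit("Base", parent_of))     # 1
print(dit("Derived", parent_of))  # 2
```

Under this convention a class that extends only Object has a DIT of one.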

A.1.8 Instance Variable Method Count (IVMC)

The number of methods within which an instance variable of a class is instantiated, modified or used.

A.1.9 Lines Of Code (LOC)

The total lines of source-code in a class can be divided into commented (CLOC) and non-commented (NCLOC) lines. LOC is defined below using the CLOC and NCLOC measures, which are also calculated at the Class level.

LOC = NCLOC + CLOC.

A.1.10 Number Of Accessor Methods (NOAM)

The number of non-inherited accessor methods (ACCM) declared in the interface of a class [Mar02].

A.1.11 Number Of Classes (NOC)

The number of classes defined within a class; a value greater than one signifies that inner classes are present.

A.1.12 Number Of Class Constructors (NOCC)

The number of constructors defined in a class. The number of constructors signifies the different states to which the class can be initialised.

A.1.13 Number Of External Methods With Parameter List the Same as Instance Variable Types (NOEMWPLSIVT)

A count of the number of methods defined in a system that are not in the current class (known as external methods) and that have parameter list types that are the same as the instance variable types within the current class. Inherited instance variables and methods are not considered as part of the calculation.

A.1.14 Number Of Internal Methods With Parameter List the Same as Instance Variable Types (NOIMWPLSIVT)

A count of the number of methods defined in the current class (known as internal methods) that have parameter list types that are the same as the instance variable types within the current class. Inherited instance variables and methods are not considered as part of the calculation.

A.1.15 Number Of Instance Variables (NOIV)

The number of instance variables defined within a class. These can be public, private, protected or declared with no access modifier. Inherited instance variables are not included as part of the calculation.

A.1.16 Number Of Methods (NOM)

Counts all methods defined within a class, where constructors are counted as methods. These are methods defined in the class as public, private, protected or with no access modifier. Inherited methods are not considered.

A.1.17 Number of Methods Added (NMA)

The number of methods defined in a subclass and not in the superclass [Lan03].

A.1.18 Number of Methods Extending (NME)

The number of refined methods in a subclass that invoke the same named method from the superclass [Lan03].

A.1.19 Number of Methods Overriding (NMO)

The number of methods defined in a superclass that are overridden within a class [Lan03].

A.1.20 Tight Class Cohesion (TCC)

Chidamber and Kemerer’s O-O Lack of Cohesion in Methods (LCOM) [CK94] metric is effective at identifying the most non-cohesive classes, but it is not effective at distinguishing between partially cohesive classes [BK95]. Bieman and Kang proposed a refined definition of a class’s cohesion that considers the relative number of methods connected by instance variables, where methods are considered to be connected when they use at least one common instance variable [BK95].

Tight Class Cohesion (TCC) is the relative number of directly connected methods. Bieman and Kang represent a method as the set of instance variables that are directly or indirectly used in the method; this representation is referred to as an abstract method [BK95].

$$\mathrm{TCC}(C) = \frac{\mathrm{NDC}(C)}{\mathrm{NP}(C)}$$

where NP(C) is the Number of Pairs of abstracted methods in class C:

$$\mathrm{NP}(C) = \frac{N(N-1)}{2}$$

where N is the number of methods.

NDC(C) is the Number of Direct Connections between methods. If there exists one or more common instance variables between two method abstractions then the two corresponding methods are directly connected.
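The TCC calculation can be sketched as follows, assuming each method has already been abstracted to the set of instance variables it uses. The function name `tcc`, the example method names and the choice to return 0.0 for a class with fewer than two methods (where NP(C) is zero) are assumptions of this sketch.

```python
from itertools import combinations

def tcc(methods):
    """TCC of a class; methods maps a method name to the set of
    instance variables the method uses (its abstract method)."""
    n = len(methods)
    np_pairs = n * (n - 1) // 2            # NP(C) = N(N - 1) / 2
    if np_pairs == 0:
        return 0.0                         # assumption: undefined case maps to 0
    # NDC(C): pairs of methods sharing at least one instance variable
    ndc = sum(1 for a, b in combinations(methods.values(), 2) if a & b)
    return ndc / np_pairs

# Hypothetical class with four methods; only getName and setName share a variable.
example = {
    "getName": {"name"},
    "setName": {"name"},
    "count": {"items"},
    "helper": set(),
}
print(tcc(example))  # 1 connected pair out of 6
```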

A.1.21 Weight Of Class (WOC)

Defined by Marinescu [Mar02] as the number of non-accessor methods in a class divided by the total number of members of the interface, where inherited members are not counted. Marinescu's definition leaves it unclear whether class constructors are included as part of the calculation. Here a Java constructor is treated as a method, and hence is counted in the number of methods in a class.

The closer a class’s WOC metric value is to one, the fewer accessor-methods are present. The closer to zero the value becomes, the more accessor-methods are present. A value of zero represents a class having accessor-methods and nothing else.

$$\mathrm{WOC} = \frac{\mathrm{NOM} - \mathrm{NOAM}}{\mathrm{NOM}}$$

A.1.22 Weight Of Class - 2 (WOC2)

The purpose of the WOC metric is to measure the ratio of non-accessor methods to the total number of methods defined within a class. It is assumed that Marinescu’s definition of WOC includes constructor methods as part of the total number of methods in a class. However, the purpose of a class’s constructor methods is to set the instance variables and give the class a state. Therefore constructor methods can be considered not to add any functionality (or weight) to a class and should be excluded from this measure.

Munro [Mun04] redefines Marinescu’s WOC metric as WOC2, which does not include constructor methods as part of the calculation. The new definition produces the following formula:

$$\mathrm{WOC2} = \frac{\mathrm{NOM} - \mathrm{NOAM} - \mathrm{NOCC}}{\mathrm{NOM} - \mathrm{NOCC}}$$

A class with a WOC2 result of zero can be interpreted to mean that it has no methods, only constructor-methods, only accessor-methods, or only accessor and constructor methods. Munro [Mun04] identifies that in all cases a class with a zero WOC2 value has the characteristics of a potential Data Class bad smell.
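The difference between WOC and WOC2 can be seen in a short worked sketch. The function names are hypothetical, and mapping a zero denominator to 0.0 is an assumption made here so that a class with no methods, or with only constructors, yields the zero WOC2 value described above; the framework's own handling of these cases may differ.

```python
def woc(nom, noam):
    """WOC = (NOM - NOAM) / NOM."""
    return (nom - noam) / nom if nom else 0.0          # assumption for NOM = 0

def woc2(nom, noam, nocc):
    """WOC2 = (NOM - NOAM - NOCC) / (NOM - NOCC)."""
    denominator = nom - nocc
    return (nom - noam - nocc) / denominator if denominator else 0.0

# A class with one constructor, one accessor method and two other methods:
# NOM = 4, NOAM = 1, NOCC = 1.
print(woc(4, 1))                 # 0.75
print(round(woc2(4, 1, 1), 2))   # 0.67

# A class with one constructor and two accessor methods and nothing else.
print(woc2(3, 2, 1))             # 0.0
```

The first example has the same structure as the TypeNode classes described in Appendix B, for which 0.75 and 0.67 are reported.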

A.1.23 Weighted Method Count (WMC)

The complexity of a class can be measured using Chidamber and Kemerer’s adaptation of McCabe’s Cyclomatic Complexity for an O-O system. WMC sums the static complexity of all methods in a class [CK94].

Consider a class C1, with methods M1, ..., Mn that are defined in the class. Let c1, ..., cn be the complexity of the methods. Then:

$$\mathrm{WMC} = \sum_{i=1}^{n} c_i$$

If all method complexities are considered to be unity, then WMC = n, the number of methods [CK94].
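As a small worked sketch, WMC is simply the sum of the per-method complexities. The method names and complexity values below are invented for illustration, and `wmc` is a hypothetical helper.

```python
def wmc(method_complexities):
    """WMC: sum of the complexities c_1 .. c_n of the methods of a class."""
    return sum(method_complexities.values())

# Hypothetical class with three methods and their cyclomatic complexities.
complexities = {"parse": 3, "getName": 1, "toString": 1}
print(wmc(complexities))  # 5

# With every complexity taken as unity, WMC equals the number of methods.
unit = {name: 1 for name in complexities}
print(wmc(unit) == len(unit))  # True
```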

A.2 Method Level

Metrics measured at the method level consider the method’s signature and the source-code structure of the method body.

A.2.1 Accessor Method (ACCM)

An object’s state is held by the current values of its instance variables. Declaring the instance variables of a class as private restricts how they are manipulated externally: either through accessor methods or indirectly within another method body.

There is currently no formal structural definition of an accessor method. However, Fowler [Fow99] gives an informal description of an accessor method as simply changing or returning the whole state of a class’s instance variable.

Marinescu uses the following pattern to identify accessor methods: they are small, with unitary cyclomatic complexity, and follow the naming convention that the names of access methods are prefixed with get (or Get) and those of mutator methods with set (or Set) [Mar02].

Java coding standards also give guidance on a naming convention: start an access method with “get” and a mutator method with “set”, and end with the name of the corresponding instance variable. However, these are only guidelines; they are not enforced by a Java compiler and so are open to deviation.

However, part of Marinescu’s definition of an accessor method relies upon a naming convention that can be considered unreliable for locating true-positive instances, as developers are free to define method names as they please. The method body is considered more important as it gives an indication of what the method strives to achieve.

IF ((A public method has a single LOC) && (Not a constructor)) THEN
    IF (Method body contains a return statement) THEN
        IF (the whole state of an instance variable is returned) THEN
            ‘method is an accessor method’ (access)
    ELSE IF (Method body is an expression statement) THEN
        IF ((one parameter) && (whole state of an instance variable is set to parameter)) THEN
            ‘method is an accessor method’ (mutator)
        ELSE IF ((no parameters) && (statement sets whole state of an instance variable to null)) THEN
            ‘method is an accessor method’ (mutator)

Figure 5: Accessor Method Pseudo-code.

The first part of Marinescu’s accessor method definition states that a method body is small with a single cyclomatic complexity value. This could mean that a method body containing no conditional statements and, say, three LOC setting class instance variables would be considered a positive candidate. This example deviates from Fowler’s definition, as the method’s body is doing more than setting a single instance variable. Using Fowler’s definition would require an accessor method’s body to contain only a single line of source-code.

Constructor methods in a class initialise or change the state of the class’s instance variables, and so could be considered mutator methods when there is a single LOC in the method body. Constructor methods are a required part of a class design for creating and changing the state of an object and hence will not be considered accessor methods.

In order to extend Fowler’s accessor method definition to a general Java system, a manual inspection of a number of Java classes for accessor methods was carried out to collate possible differences in their structure. Analysing these results identified a general structural pattern that an accessor method can take. Figure 5 identifies these possible characteristics of an accessor method using pseudo-code.

The definition of an accessor method identified in Figure 5 requires the whole state of an instance variable to be either changed or returned. In some cases only part of an instance variable may be changed or returned; for example, a mutator method may insert an element into an instance variable array.

It was found that true candidate mutator methods either had one parameter that was used for setting the new state of an instance variable, or had no parameters in the method argument list because the new state of the instance variable was set to null. This shows that a mutator method can take two forms and still be considered a true candidate.

This metric only calculates the value for a given class, and does not consider inner or super classes. It returns an integer value of one or zero, corresponding to a given method being an accessor method or not, respectively. The results are this clear-cut because an accessor method is considered to be well defined: locating one in source-code gives an answer that is either true or false, with nothing in between.
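The decision structure of Figure 5 can be sketched in executable form. This is an illustration only: it assumes the relevant facts about a method (visibility, single-line body, which instance variable is returned or assigned) have already been extracted by a parser, and the `MethodInfo` record and its field names are hypothetical rather than part of the metric framework.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MethodInfo:
    """Pre-extracted facts about one method (hypothetical representation)."""
    is_public: bool
    is_constructor: bool
    loc: int                              # lines of code in the method body
    params: List[str]
    returns_field: Optional[str] = None   # variable whose whole state is returned
    assigns_field: Optional[str] = None   # variable whose whole state is set
    assigned_value: Optional[str] = None  # a parameter name, or "null"

def accm(m: MethodInfo) -> int:
    """Return 1 if the method matches the Figure 5 pattern, otherwise 0."""
    if not m.is_public or m.is_constructor or m.loc != 1:
        return 0
    if m.returns_field is not None:
        return 1                                   # access
    if m.assigns_field is not None:
        if len(m.params) == 1 and m.assigned_value == m.params[0]:
            return 1                               # mutator, one parameter
        if not m.params and m.assigned_value == "null":
            return 1                               # mutator, resets to null
    return 0

getter = MethodInfo(True, False, 1, [], returns_field="name")
setter = MethodInfo(True, False, 1, ["n"], assigns_field="name", assigned_value="n")
reset = MethodInfo(True, False, 1, [], assigns_field="name", assigned_value="null")
ctor = MethodInfo(True, True, 1, ["n"], assigns_field="name", assigned_value="n")
print([accm(m) for m in (getter, setter, reset, ctor)])  # [1, 1, 1, 0]
```

Note that the method name plays no part in the decision, which reflects the choice to rely on the method body rather than on a naming convention.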

A.2.2 Commented Lines Of Code in a Method (CLOCM)

The number of commented lines of source-code within a method. Comment lines that lie between methods are not counted.

A.2.3 Non-Commented Lines Of Code in a Method (NCLOCM)

The number of lines of source-code in a method that are not blank, not commented and do not solely contain a curly bracket.

A.2.4 Comment Density - Method level (CDm)

This metric is similar to the CD metric defined above except that it is at the method level rather than the class level.

$$\mathrm{CDm} = \frac{\mathrm{CLOCM}}{\mathrm{NCLOCM}}$$

A.2.5 Lines Of Code in a Method (LOCM)

The total number of lines of source-code within a method.

LOCM = CLOCM + NCLOCM.

A.2.6 McCabe Cyclomatic Complexity (V(G))

The cyclomatic number V(G) of a graph G with n vertices and e edges [McC76]:

V(G) = e − n + 2.

In a strongly connected graph G, the cyclomatic number is equal to the maximum number of linearly independent circuits.
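The formula can be checked on small control-flow graphs with the following sketch. It assumes the graph of a single method is given as a list of directed edges, that the graph is connected, and that every vertex appears in at least one edge; the function name and node labels are hypothetical.

```python
def cyclomatic_number(edges):
    """V(G) = e - n + 2 for a connected control-flow graph given as edges."""
    nodes = {vertex for edge in edges for vertex in edge}
    return len(edges) - len(nodes) + 2

# Straight-line code: a single path from entry to exit.
straight = [("entry", "exit")]

# One if/else: two alternative paths that join again.
if_else = [("entry", "then"), ("entry", "else"),
           ("then", "exit"), ("else", "exit")]

# One loop: the body returns to the condition.
loop = [("entry", "cond"), ("cond", "body"),
        ("body", "cond"), ("cond", "exit")]

print(cyclomatic_number(straight))  # 1
print(cyclomatic_number(if_else))   # 2
print(cyclomatic_number(loop))      # 2
```

A method body with no branching therefore has unitary complexity, which is the property used in Marinescu's accessor method pattern.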

A.2.7 Number Of Cases within Switch Statements (NOCSS)

The number of Case conditions for all Switch Statements defined within a method.

A.2.8 Number Of Parameters (NOP)

The number of parameters a method has in its parameter list.

A.2.9 Number Of Parameters Not Referred (NOPNR)

The number of parameters from a method’s parameter list that are not referenced in the method’s body. The calculation comprises the number of parameters in a method's argument list (NOP) minus the number of parameters that are used in the method body (NOPU).

NOPNR = NOP − NOPU.

A.2.10 Number Of Parameters Used (NOPU)

The number of parameters in a method’s parameter list that are referenced in the body of the method.
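The relationship between NOP, NOPU and NOPNR can be sketched as below. The sketch assumes the parameter names and the body text of a method are already available, and it detects use by a purely textual whole-word match. That is a simplification: it can be misled by a parameter name that also occurs in a comment, a string literal or as a field name, and a real implementation would work on the syntax tree. The function name is hypothetical.

```python
import re

def parameter_metrics(params, body):
    """Return (NOP, NOPU, NOPNR) using a whole-word textual match."""
    used = [p for p in params
            if re.search(rf"\b{re.escape(p)}\b", body)]
    nop = len(params)
    nopu = len(used)
    return nop, nopu, nop - nopu      # NOPNR = NOP - NOPU

# Hypothetical method: void rename(String newName, int limit) { ... }
body = "this.name = newName;"
print(parameter_metrics(["newName", "limit"], body))  # (2, 1, 1)
```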

A.2.11 Number Of Switch Statements (NOSS)

The number of Switch statements in a method.

B Analysis with Developer of TypEx for a Data-Class

The data-classes identified by the developer of the system are compared against the metric threshold results, and the inconsistencies are reasoned about below.

The format of this information starts with the full class name including its package location, followed by the results of each metric applied to the class. The metrics are given in the following order: NOC, NOCC, NOAM, WOC, WOC2, LOC, data-class.
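For readers who wish to process the records that follow, the layout just described can be read with a sketch such as the one below. It assumes each record is a single comma-separated line ending in yes or no; the `DataClassRecord` type and the lower-case field names are hypothetical.

```python
from dataclasses import dataclass

# Metric order as stated in the text: NOC, NOCC, NOAM, WOC, WOC2, LOC.
FIELDS = ("noc", "nocc", "noam", "woc", "woc2", "loc")

@dataclass
class DataClassRecord:
    class_name: str
    metrics: dict
    data_class: bool      # the final yes/no field of the record

def parse_record(line):
    """Split one result line into class name, metric values and verdict."""
    parts = [part.strip() for part in line.split(",")]
    name, values, verdict = parts[0], parts[1:-1], parts[-1]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} metric values, got {len(values)}")
    metrics = dict(zip(FIELDS, map(float, values)))
    return DataClassRecord(name, metrics, verdict == "yes")

record = parse_record(
    "FilterTypes.Bleb.TV.channel, 1.00, 0.00, 0.00, 1.00, 1.00, 4.00, no")
print(record.class_name, record.metrics["loc"], record.data_class)
```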

FilterTypes.Bleb.TV.channel, 1.00, 0.00, 0.00, 1.00, 1.00, 4.00, no

Contains a ‘toHTML()’ method that prints out the public instance ‘programme’ array variable. The method body is a simple for loop that traverses the whole array, printing the contents as it goes. The purpose of the method seems to be to print out the current state of an object of this type, a role more commonly played by the ‘toString’ method inherited from the Object class.

FilterTypes.Bleb.TV.programme, 1.00, 0.00, 0.00, 1.00, 1.00, 12.00, no

This class definition has three instance variables and a single method. The method is named ‘toHTML’ and has some functionality regarding different printing procedures depending upon the contents of the instance variables.

FilterTypes.RSS.item, 1.00, 0.00, 0.00, 1.00, 1.00, 32.00, no

The class extends one class and implements an interface, has no instance variables, and has two methods with corresponding implementation bodies.

The two methods have the same names as the defaults inherited from the Object superclass, ‘equals(Object arg)’ and ‘hashCode()’.

FilterTypes.RSS.less.item, 1.00, 0.00, 0.00, 1.00, 1.00, 11.00, no

Extends one class and implements an interface, has no instance variables, and has two methods with full implementation bodies.

The method names are the same as those inherited from the Object superclass, ‘equals(Object arg)’ and ‘hashCode()’.

FilterTypes.RSS.TestItem, 1.00, 0.00, 0.00, 1.00, 1.00, 11.00, no

Only has one method, named ‘main’, which tests one of the classes contained within the same package.

FilterTypes.XBEL.folder, 1.00, 0.00, 0.00, 1.00, 1.00, 8.00, no

Three instance variables, no constructors, and two methods. Each of the two methods independently references one of two instance variables, using a for loop to traverse the whole array structure. The method names are the same as those of the instance variables, with an additional ‘count’ at the beginning.

uk.ac.strath.cis.snaqueInternal.BooleanTypeNode, 1.00, 1.00, 1.00, 0.75, 0.67, 0.00, no

uk.ac.strath.cis.snaqueInternal.DOMTypeNode, 1.00, 1.00, 1.00, 0.75, 0.67, 4.00, no

uk.ac.strath.cis.snaqueInternal.DoubleTypeNode, 1.00, 1.00, 1.00, 0.75, 0.67, 4.00, no

uk.ac.strath.cis.snaqueInternal.IntTypeNode, 1.00, 1.00, 1.00, 0.75, 0.67, 4.00, no

uk.ac.strath.cis.snaqueInternal.StringTypeNode, 1.00, 1.00, 1.00, 0.75, 0.67, 4.00, no

uk.ac.strath.cis.snaqueInternal.URLTypeNode, 1.00, 1.00, 1.00, 0.75, 0.67, 4.00, no

Implements an interface, one instance variable, one constructor, one accessor-method, and two other implemented methods.

The accessor-method is a getter that returns the whole state of the instance variable. However, the main observation is that the method does not follow the usual naming convention for such a method. It is detected nonetheless, because the metric framework considers only the method body and its computation.

One method returns a boolean value that never changes. The other implemented method is ‘toString’, inherited from Object, which returns a constant string rather than the usual format of printing some or all of the current state of the instance variables.

uk.ac.strath.cis.snaqueInternal.IteratorSkelImpl, 2.00, 0.00, 1.00, 0.75, 0.00, 15.00, no

Defined as an abstract class, with one inner class, three instance variables and four methods that manipulate instance variables in various simple ways. The methods have more functionality than a basic accessor-method.

The inner class defines one instance variable, two constructors and five methods that manipulate or return the state of the instance variable in various ways.

uk.ac.strath.cis.xstream.demo.app.config, 1.00, 0.00, 0.00, 1.00, 1.00, 2.00, no

Five instance variables and one fully implemented method. The method is ‘toString’, which prints out the states of all the instance variables apart from one that is an array, for which the length is printed instead.

uk.ac.strath.cis.xstream.Recursive.tests.AllTests, 1.00, 0.00, 0.00, 1.00, 1.00, 4.00, no

Implements only one method, which is part of the testing framework JUnit.

uk.ac.strath.cis.xstream.Recursive.tests.AllTestsFromStrings, 1.00, 0.00, 0.00, 1.00, 1.00, 256, no

Extends a JUnit interface; there are many static final instance variables and thirteen implemented methods.

uk.ac.strath.cis.xstream.Recursive.tests.BookWithAuthors, 1.00, 0.00, 0.00, 1.00, 1.00, 1.00, no

uk.ac.strath.cis.xstream.Recursive.tests.BookWithEditors, 1.00, 0.00, 0.00, 1.00, 1.00, 1.00, no

Implements an interface, with two instance variables and a single method. The method is ‘toString’, which prints out one instance variable and the length of the other, since it is an array type.

B.1 False Positives

The results also identify cases where the developer has not identified a class as a possible data-class but the interpretation of the metric results has; these are known as false positives. However, the system under investigation was not fully implemented by a single developer: the current developer, who was consulted about the system, has re-used existing code and added new functionality to it. Considering this, there may be a few classes that are obsolete in the running of the new system, and hence the new developer may not necessarily be familiar with the content of these classes.

The developer was consulted regarding the false positives in order to establish why these cases may have been missed. Only two classes are identified as false positives from the results; their structure is described below.

uk.ac.strath.cis.snaqueInternal.ConfigurationParameters, 1.00, 0.00, -1.00, -1.00, 0.00, 0.00, yes

The class contains three public static variables, each declared and initialised without a constructor.

Should be considered to be a data-class.

uk.ac.strath.cis.xstream.Recursive.ProjectionException, 1.00, 3.00, 0.00, 1.00, 0.00, 3.00, yes

The class is public final and extends another class. There are three constructor methods, each taking different argument parameters, with each method body containing a single line of code that calls super and passes on the arguments brought into the method.

Java does not allow inherited constructors. The class is used to hold a new type of exception, which is needed so that the catch clause of an exception handler can handle a specific type of data.

C Analysis with Developer of TypEx for a God-Class

The metric results are presented in the following order: LOC, WMC, TCC, CBDC1, CBDC2, DIT, DC, god-class.

uk.ac.strath.cis.snaqueInternal.TypeGraph, 186.00, 58.00, 0.13, 0.00, 0.00, 1, 0, yes

One public final static instance variable, three private instance variables, one public instance variable, one constructor, five private methods, six public methods.

The class has no interaction with data-classes, it is at the top of its hierarchy, a few methods are full of conditional statements, and some methods are recursive.

The metric results can be interpreted as follows: the class has the fourth highest LOC within the system and the third highest WMC, together with a relatively low TCC, meaning it is not tightly cohesive and could benefit from being split up. Even though the class has no references to data-classes, it could definitely benefit from being refactored.

Should this class be considered a god-class? It has most of the required characteristics; what it mainly lacks is interaction with data-classes. It does have two instance variables, of type ‘Set’ and ‘List’, that hold information and are manipulated within some of the methods.

uk.ac.strath.cis.snaqueInternal.TypeGraphBuilder, 167.00, 0.00, 0.00, 0.00, 0.00, 0, 0, no

This class implements an inner class, three private static variables, eight private static methods and one public static method; there are no constructors. The inner class implements an interface and has a simple one-line method.

The problem is that if an inner class exists, the metric results are those of the inner class and not of the main implemented class. This is just a slight hitch with parsing the XML result file, which will be fixed soon.

uk.ac.strath.cis.xstream.demo.app.RSSWatcher, 76.00, 15.00, 0.00, 0.00, 0.00, 1, 0, no

Three default static instance variables, one default static method and two public static methods; there are no constructor methods.

The characteristics that could make this class a god-class are that it implements a main method, it is at the top of its hierarchy, it has the eighth highest LOC in the system and its class cohesion is non-existent. The cohesion level relates to two methods being interconnected by all three static instance variables. However, the class does not interact with any data-classes, nor does it have any instance variables to hold data to be manipulated. The complexity is not really high enough to worry about.

I am more inclined to reject this class as a possible god-class.

uk.ac.strath.cis.xstream.demo.app2.TVWatcher, 76.00, 13.00, 0.00, 1.00, 2.00, 1, 0, no

The class has a similar structure to uk.ac.strath.cis.xstream.demo.app.RSSWatcher. The main difference between the two classes is in the metric results. The complexity is lower, but the class is coupled to one data-class, which is used twice.

Again, all these metric results seem quite low for a god-class, but the class does implement a main method and is at the top of a hierarchy.

uk.ac.strath.cis.xstream.Recursive.AbstractClassTemplate, 42.00, 19.00, 0.00, 0.00, 0.00, 1, 0, no

The class is abstract, has no instance variables, one default static constructor method, five public final static methods, and two public abstract method definitions.

The case for this class being a god-class is quite weak, resting only on it being at the top of its hierarchy. None of the metric results are outliers, and the class is abstract, with two methods having no method body implementation.

uk.ac.strath.cis.xstream.Recursive.Generator, 25.00, 5.00, 1.00, 0.00, 0.00, 1, 0, no

Public final static class, two private instance variables, one constructor and public final methods.

The class is relatively small and the cohesion is high, meaning all methods make use of the instance variables. The class is not coupled to any data-classes.

Definitely not a god-class.

uk.ac.strath.cis.xstream.Recursive.StAX.ClassTemplate, 47.00, 10.00, 0.47, 0.00, 0.00, 2, 0, no

The class is public final and extends another class. It has six private static final instance variables, three private instance variables, one public constructor method, four private methods and two public methods.

The fact that this class extends another may explain the low number of LOC. The complexity level seems fine, and the cohesion level is almost exactly in the middle, meaning it is at an acceptable level. There are no couplings with data-classes, and the instance variables are not being used to store and access data.

uk.ac.strath.cis.xstream.Recursive.StAX.MethodTemplate, 335.00, 92.00, 0.00, 0.00, 0.00, 1, 0, yes

A final class, three private static final instance variables, no constructor, seventeen default static final methods, three private static final methods.

The class is at the top of its hierarchy, has the highest LOC value in the system, and its complexity is high. The cohesion is zero, meaning the methods are not interconnected by instance variables, so the class’s functionality could easily be split up.

uk.ac.strath.cis.xstream.Recursive.StAX.TemplateBase, 223.00, 65.00, 0.75, 0.00, 0.00, 3, 0, no

An abstract class that extends a class, with two private static final instance variables, two protected instance variables, one private instance variable, two constructor methods, one public static final method, five protected final methods and two public final methods.

The class has the second highest LOC and WMC values in the system. However, the cohesion (TCC) value is nearer 1, meaning the methods are well connected to one another. Interestingly, the class is at depth three and still manages to have such functionality.

I would say no to being a god-class.

uk.ac.strath.cis.xstream.Recursive.StAX.UnionMethodTemplate, 143, 28, 0, 0, 0, 1, 0, no

Final class, no instance variables, no constructor methods, eight default static methods.

First, there are no instance variables, so the class could not hold and manipulate data through data-class interaction; hence TCC is zero. The complexity is relatively low, and the LOC is high.

But I don’t think it could be a god-class.

C.1 False-positives

uk.ac.strath.cis.xstream.Recursive.tests.AllTestsFromStrings, 256, 42, 0, 14, 32, 3, 0, yes

First, this class is used for testing purposes, so the developer may not have considered it, as it is not a major part of the overall running of the system. However, its attributes will be considered, as an interpretation of the metric results is not able to distinguish between a testing class and any other class.

This class extends another and has fifteen default static final instance variables, no constructors, one default static method and seventeen public methods.

The only instance variables defined in the class are constants, which the TCC cohesion measure does not consider in the calculation between methods; this explains the zero value. It can mean that the class could be split up easily.

The class has the second highest LOC and the fourth highest WMC value. The class is coupled to a number of data-classes, but is at an inheritance depth of three, relating to the class it extends and Object.

So I think apart from being a test suite, this class has allthe characteristics of being a god-class.

Are there any other false-positives? Briefly skimming the metric results of the other classes identifies no other classes with values high enough to be of any concern, especially for LOC and WMC.

D Glossary

D.1 Design Flaw

Design Flaw is the structural characteristic of a design entity or design fragment that expresses a deviation from a given set of criteria typifying the high quality of a design [Mar02]. In other words, a design flaw is the description of a design structure that breaks some rules of good design. Sometimes the same name is used for specific instances of a design flaw in a system. When the danger of confusion arises, we will use the term design flaw instance to disambiguate between the two meanings [Dra03].

D.2 Design Heuristic

A statement or information, distilled from the experience of working with a software development methodology, whose application improves the quality of the design, is called a design heuristic [Mar02].

D.3 Design Principle

A design heuristic that expresses an abstract criterion for the evaluation of a design is called a design principle [Mar02].

D.4 Design Rule

A design heuristic that states an imperative or an interdiction concerning the characteristics of a design is called a design rule [Mar02].

D.5 Problem Detection

A specific phase in the reengineering process in which, automatically or manually, problem areas in the subject system are detected [BC98]. Using our previous definition, problem detection is an activity whose goal is the detection of design flaws in the subject system [Dra03].
