
COMS W4156: Advanced Software Engineering

Prof. Gail Kaiser

[email protected]

http://ase.cs.columbia.edu/

October 18, 2011


Topics covered in this lecture

• Software Quality

• Refactoring

• Verification and Validation


Software Quality


“Quality” is Hard to Pin Down

• Concise, clear definition is elusive

• Not easily quantifiable

• Many things to many people

• “You'll know it when you see it”

• Often defined as set of “ilities” (attributes)


High Quality Software Has…

• Understandability – The ability of a reader of the software to understand its function

– Critical for maintenance

• Modifiability – The ability of the software to be changed by that reader

– Almost defines "maintainability"

• Reliability – The ability of the software to perform as intended without failure

– If it isn't reliable, the maintainer must fix it

• Efficiency – The ability of the software to operate with minimal use of time and space resources

– If it isn't efficient, the maintainer must improve it

• Portability – The ease with which the software can be made useful in another environment

– Porting is usually done by the maintainer

• Testability – The ability of the software to be tested easily

– Finding/fixing bugs is part of maintenance

– Enhancements/additions must also be tested

• Usability – The ability of the software to be easily used (human factors)

– Not easily used implies more support calls, enhancements, corrections

Notice that all of these relate to maintenance, but these qualities need to be instilled during development


Approaches to Achieving Quality

• Continuous refactoring

• Verification and validation

• Software process and process improvement

• Buy rather than build

• Open source software

• …


Refactoring


What is Refactoring?

• The process of changing the source code of a software system such that:

– The external (observable) behavior of the system does not change - e.g., functional requirements are maintained

– But the internal structure of the system is improved


How improved?

• Maintainability!

• Easier to read and understand

• Easier to (further) modify

• Easier to integrate

• Easier to test


Simple Example: Consolidate Duplicate Conditional Fragments

• This

    if (isSpecialDeal()) {
        total = price * 0.95;
        send();
    } else {
        total = price * 0.98;
        send();
    }

• Becomes this

    if (isSpecialDeal()) {
        total = price * 0.95;
    } else {
        total = price * 0.98;
    }
    send();


Why is it called Refactoring?

• By analogy to the factorization of polynomials

• For example, $x^2 - x - 2$ can be factored as $(x + 1)(x - 2)$, revealing an internal structure that was previously not visible (two roots at $-1$ and $+2$)

• Similarly, in software refactoring, the change in visible structure can often reveal the “hidden” internal structure of the original code


Refactoring Process

• A disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior

• Series of small behavior-preserving transformations

• Each transformation does little, but a sequence of transformations can produce a significant restructuring

• Since each refactoring is small, it's less likely to go wrong

• The system is also kept fully working after each small refactoring (via regression testing), reducing the chances that a system can get seriously broken during the restructuring
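The idea of a small, behavior-preserving transformation checked by a regression test can be sketched in Java as follows. This is only an illustration: the `DealOrder` class, its recording `send()` method, the 0.98 rate for the non-special case, and the `sameBehavior` helper are all assumptions made for the example, not part of the lecture.

```java
// Hypothetical class for illustration; send() just records what it was given.
class DealOrder {
    private final double price;
    private final boolean specialDeal;
    int sendCount = 0;       // how many times send() ran
    double lastSent = 0.0;   // the total passed to the most recent send()

    DealOrder(double price, boolean specialDeal) {
        this.price = price;
        this.specialDeal = specialDeal;
    }

    private void send(double total) {
        sendCount++;
        lastSent = total;
    }

    // Original structure: send() is duplicated in both branches.
    void checkoutBefore() {
        double total;
        if (specialDeal) {
            total = price * 0.95;
            send(total);
        } else {
            total = price * 0.98; // assumed regular rate
            send(total);
        }
    }

    // Refactored structure: the common call moves after the conditional.
    void checkoutAfter() {
        double total;
        if (specialDeal) {
            total = price * 0.95;
        } else {
            total = price * 0.98; // assumed regular rate
        }
        send(total);
    }
}

class BehaviorPreservationCheck {
    // Regression check: both versions must have the same observable effect.
    static boolean sameBehavior(double price, boolean specialDeal) {
        DealOrder before = new DealOrder(price, specialDeal);
        DealOrder after = new DealOrder(price, specialDeal);
        before.checkoutBefore();
        after.checkoutAfter();
        return before.sendCount == after.sendCount
            && before.lastSent == after.lastSent;
    }

    public static void main(String[] args) {
        System.out.println(sameBehavior(100.0, true));
        System.out.println(sameBehavior(100.0, false));
    }
}
```

In real projects the two versions rarely coexist; the same check is usually expressed as a test suite that is run before and after the change.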


Example: Extract Method

Before:

    void printOwing(double amount) {
        printBanner();

        // print details
        System.out.println("name: " + _name);
        System.out.println("amount: " + amount);
    }

After:

    void printOwing(double amount) {
        printBanner();
        printDetails(amount);
    }

    void printDetails(double amount) {
        System.out.println("name: " + _name);
        System.out.println("amount: " + amount);
    }

• You have a code fragment that can be grouped together

• Turn the fragment into a method whose name explains the purpose of the fragment


Example: Replace Temp with Query

• You are using a temporary variable to hold the result of an expression

• Extract the expression into a method

• Replace all references to the temp with the method call

• The new method can then be used in other methods

Before:

    double basePrice = _quantity * _itemPrice;
    if (basePrice > 1000)
        return basePrice * 0.95;
    else
        return basePrice * 0.98;

After:

    if (basePrice() > 1000)
        return basePrice() * 0.95;
    else
        return basePrice() * 0.98;
    …
    double basePrice() {
        return _quantity * _itemPrice;
    }


Example: Introduce Null Object

• Repeated checks for a null value

• Replace the null value with a null object

Before:

    if (customer == null) {
        name = "occupant";
    } else {
        name = customer.getName();
    }

    if (customer == null) {
    …

After:

    public class NullCustomer
    {
        public String getName() {
            return "occupant";
        }
        …
    }

    customer.getName();
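A fuller version of the null object idea might look like the sketch below. Only `getName()` and the "occupant" default come from the slide; the `Customer` base class, the `isNull()` method, the `Site` wrapper, and the `NullCustomer` spelling are assumptions added so the example compiles on its own.

```java
// Hypothetical base class; the slide only shows getName().
class Customer {
    private final String name;

    Customer(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    boolean isNull() {
        return false;
    }
}

// Null object: stands in wherever a missing customer would otherwise be null.
class NullCustomer extends Customer {
    NullCustomer() {
        super("occupant");
    }

    @Override
    boolean isNull() {
        return true;
    }
}

// Hypothetical owner of a customer reference.
class Site {
    private final Customer customer;

    Site(Customer customer) {
        // The one remaining null check lives at the boundary.
        this.customer = (customer == null) ? new NullCustomer() : customer;
    }

    Customer getCustomer() {
        return customer;
    }
}

class NullObjectDemo {
    public static void main(String[] args) {
        // No null checks are needed at the call sites.
        System.out.println(new Site(null).getCustomer().getName());
        System.out.println(new Site(new Customer("Ada")).getCustomer().getName());
    }
}
```

Callers that genuinely need to distinguish a missing customer can still ask `isNull()`, but most call sites can simply use the returned object.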


Example: Exploit Polymorphism

• Generally, polymorphism is the ability to appear in many forms

• In OO, polymorphism refers to a programming language's ability to process objects differently depending on their data type or class

• More specifically, it is the ability to redefine methods for derived classes (subclasses)


Example

    double getSpeed() {
        switch (_type) {
            case EUROPEAN:
                return getBaseSpeed();
            case US:
                return getBaseSpeed() / 1.6;
            case BRITISH:
                if (getDate() < new Year(1990))
                    return getBaseSpeed() / 1.6;
                else
                    return getBaseSpeed();
        }
        throw new RuntimeException("Should be unreachable");
    }

[Class diagram: Vehicle, EU Vehicle, US Vehicle, and UK Vehicle, each with a getBaseSpeed() method]
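One way the switch statement above could be replaced by polymorphism is sketched below. It is an assumption-laden illustration rather than the lecture's own solution: the class names, the constructor parameters, the plain `int year` in place of the slide's date comparison, and the choice to override `getSpeed()` in each subclass are all made up for the example.

```java
// Hypothetical hierarchy; each subclass supplies its own speed rule.
abstract class Vehicle {
    private final double baseSpeed;

    Vehicle(double baseSpeed) {
        this.baseSpeed = baseSpeed;
    }

    double getBaseSpeed() {
        return baseSpeed;
    }

    // Replaces the switch on a type code: dispatch picks the right rule.
    abstract double getSpeed();
}

class EuropeanVehicle extends Vehicle {
    EuropeanVehicle(double baseSpeed) {
        super(baseSpeed);
    }

    @Override
    double getSpeed() {
        return getBaseSpeed();
    }
}

class UsVehicle extends Vehicle {
    UsVehicle(double baseSpeed) {
        super(baseSpeed);
    }

    @Override
    double getSpeed() {
        return getBaseSpeed() / 1.6;
    }
}

class BritishVehicle extends Vehicle {
    private final int year; // assumed stand-in for the slide's getDate()

    BritishVehicle(double baseSpeed, int year) {
        super(baseSpeed);
        this.year = year;
    }

    @Override
    double getSpeed() {
        return year < 1990 ? getBaseSpeed() / 1.6 : getBaseSpeed();
    }
}

class PolymorphismDemo {
    public static void main(String[] args) {
        Vehicle[] fleet = {
            new EuropeanVehicle(160.0),
            new UsVehicle(160.0),
            new BritishVehicle(160.0, 1985)
        };
        for (Vehicle v : fleet) {
            System.out.println(v.getClass().getSimpleName() + ": " + v.getSpeed());
        }
    }
}
```

Adding a new kind of vehicle then means adding a subclass instead of editing every switch on the type code, and the "should be unreachable" exception disappears.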


Refactoring is Incremental Redesign

• The idea behind refactoring is to acknowledge that it will be difficult to get a design right the first time

• And as a program’s requirements change, the design may need to change

– It is notoriously difficult (impossible?) to design for all possible changes a priori

– And as agile programming proponents say, “You aren’t gonna need it” – but what if later you do?

• Refactoring provides techniques for evolving the design in small incremental steps


Refactoring Benefits

• Often code size is reduced after refactoring

• Confusing structures are transformed into simpler structures - which are easier to maintain (and often easier to unit test)

• Promotes a deeper understanding of the code - which aids in finding bugs and anticipating potential bugs


Contrast with Performance Optimization

• Again functionality is not changed, only internal structure

• However, performance optimizations often involve making code harder to understand (but faster!)

– Use more efficient but more complicated algorithms and data structures

– Lose generality to address specific issues of the implemented solution

• Use profiling tools to determine the 10-20% of the code requiring 80-90% of the CPU cycles – optimize that code, refactor all the other code


When to Refactor?

• When you add new functionality

– Do it before you add the new function, to make it easier to add the function

– Or do it after you add the function, to clean up the code including that function

• When you need to fix a bug

• As you do a code review

• Whenever…


Why to Refactor?

• General goal is maintainability

• Enhance clarity, understandability, modifiability, integratability, testability

• Very often refactoring is about:

– Increasing cohesion

– Decreasing coupling


Cohesion and Coupling

• Cohesion is a property or characteristic of an individual unit

• Coupling is a property of a collection of units

• High cohesion GOOD, high coupling BAD

• Design for change

– You don't want a change in one unit (component, class, method) to cause errors to ripple throughout your system

– Make units highly cohesive, seek low coupling among units


What to Refactor?

• Duplicated Code

– Bad because if you modify one instance of duplicated code but not all the others, you (may) have introduced a bug!

• Switch Statements

– Often duplicated in code, can typically be replaced by use of polymorphism (in OO languages)


What to Refactor?

• Long Method

– More difficult to understand

– Performance concerns with respect to lots of short methods are largely obsolete

• Long Parameter List

– Hard to understand, can become inconsistent

• Large Class

– Trying to do too much, which reduces cohesion


What to Refactor?

• Divergent Change

– One type of change requires changing one subset of methods in the module, another type of change requires changing another subset

• Shotgun Surgery

– A change requires lots of little changes in a lot of different classes

• Parallel Inheritance Hierarchies

– Each time you add a subclass to one hierarchy, you need to do it for all related hierarchies


What to Refactor?

• Lazy Class

– A class that no longer “pays its way”, e.g., a class that was downsized by previous refactoring, or represented planned functionality that did not pan out

• Middle Man

– If a class is delegating more than half of its responsibilities to another class, do you really need it?


What to Refactor?

• Speculative Generality

– “Oh I think we need the ability to do this kind of thing someday”

• Alternative Classes with Different Interfaces

– Two or more methods do the same thing but have different signatures for what they do


What to Refactor?

• Primitive Obsession

– Characterized by a reluctance to use classes instead of primitive data types

• Temporary Field

– An attribute of an object is only set in certain circumstances - but an object should need all of its attributes
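As a rough illustration of curing Primitive Obsession, a raw `String` can be wrapped in a small value class that carries its own validation. The `ZipCode` class and its five-digit rule below are hypothetical and chosen only for the example.

```java
// Hypothetical value class replacing a raw String postal code.
final class ZipCode {
    private final String value;

    ZipCode(String value) {
        // Validation lives in one place instead of at every use of the String.
        if (value == null || !value.matches("\\d{5}")) {
            throw new IllegalArgumentException("expected five digits: " + value);
        }
        this.value = value;
    }

    String value() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ZipCode && ((ZipCode) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}

class ZipCodeDemo {
    public static void main(String[] args) {
        System.out.println(new ZipCode("10027"));
        try {
            new ZipCode("1002");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Methods that previously took a `String` can now take a `ZipCode`, so an unvalidated value cannot reach them by accident.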


What to Refactor?

• Feature Envy

– A method requires lots of information from some other class

• Data Clumps

– Attributes (e.g., method parameters) that clump together but are not part of the same class


What to Refactor?

• Message Chains

– A client asks an object for another object and then asks that object for another object, etc.

    getA().getB().getC().getD().getE().doSomething();

– Bad because client depends on the structure of the navigation

• Inappropriate Intimacy

– Pairs of classes that know too much about each other’s private details
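One common remedy for the Message Chains smell is to hide the delegate behind a method on the first object in the chain. The sketch below assumes hypothetical `Employee`, `Department`, and `Manager` classes; none of these names come from the lecture.

```java
// Hypothetical classes used only to show the shape of the refactoring.
class Manager {
    private final String name;

    Manager(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class Department {
    private final Manager manager;

    Department(Manager manager) {
        this.manager = manager;
    }

    Manager getManager() {
        return manager;
    }
}

class Employee {
    private final Department department;

    Employee(Department department) {
        this.department = department;
    }

    // Exposes the chain: callers write getDepartment().getManager().getName().
    Department getDepartment() {
        return department;
    }

    // Hides the delegate: callers no longer depend on how an employee
    // reaches its manager.
    String getManagerName() {
        return department.getManager().getName();
    }
}

class MessageChainDemo {
    public static void main(String[] args) {
        Employee e = new Employee(new Department(new Manager("Grace")));
        System.out.println(e.getDepartment().getManager().getName()); // chained
        System.out.println(e.getManagerName());                       // hidden
    }
}
```

If the path from employee to manager later changes, only `getManagerName()` needs editing. Taken too far, this produces the Middle Man smell, so the trade-off is a judgment call.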


What to Refactor?

• Data Class

– Classes that have fields, getting and setting methods for the fields, and nothing else

– They are data holders, but objects should be about data and behavior (with some exceptions, e.g., entity beans)

• Refused Bequest

– A subclass ignores most of the functionality provided by its superclass


What to Refactor?

• Incomplete Library Class

– An infrastructure class doesn’t do everything you need

• Comments (!)

– Comments are sometimes used to “decorate” bad code

– /* This is a gross hack */


But Refactoring can be Dangerous

• If programmers spend time “cleaning up the code”, then that’s less time spent implementing required functionality - and the schedule is slipping as it is!

• Refactoring can break code that previously worked

Refactoring needs to be systematic, incremental, and safe


How to Make Refactoring Safe?

• Use refactoring “patterns”

– Catalog at http://refactoring.com/catalog/index.html

• Use refactoring tools

– E.g., Eclipse JDT supports refactoring

• Test constantly!

– Regression testing


Regression Testing After Changes

Can be unit tests or a combination of unit and integration tests. Each change has one of four possible outcomes:

• Change is successful, and no new errors are introduced

• Change does not work as intended, and no new errors are introduced

• Change is successful, but at least one new error is introduced

• Change does not work, and at least one new error is introduced


Other Difficulties with Refactoring

• Some refactorings require that interfaces be changed

– If you own all the calling code, need to change everywhere the interface is used

– If not, the interface is “published” and can’t change (or shouldn’t)

• Business applications are often tightly coupled to underlying database schemas

– Virtually impossible to reorganize a database schema unless the underlying database automates the corresponding table/row/column transformations (or your database is empty)


Other Difficulties with Refactoring

• Dealing with hardware devices is worse than databases and other external software interfaces

– Software can change, the hardware (usually) cannot

• Real-time or other timing-dependent applications

– Refactored code will not necessarily run within previous time bounds


Summary

• Refactor often

• Refactor as you go

• Simplest version of refactoring: add comments, rename local variables and parameters more intuitively

• Regression test after every refactoring


Verification and Validation


Quality Assurance: Verification and Validation

• Validation: Are we building the right product?

– QA at requirements and design level concentrates on validation – ensures that the product will actually meet the users’ needs

• Verification: Are we building the product right?

– QA at code level concentrates on verification – ensures that the product has been built according to the requirements and design specifications (only useful if the specifications were correct in the first place)


V&V Techniques

• Standards (ISO 9001, SEI CMMI)

• Metrics (six sigma)

• Reviews (inspections, static analysis)

• Testing and model checking

Whole lifecycle process applied at each stage


Inspection Overview

• Also known as walkthrough

• An approach to testing that does not actually execute the code

• Formal process for reading through the software product as a group and identifying defects

• Potentially applied to all project documents including but not limited to source code

• Used to increase software quality and improve productivity and manageability of the development process


Static Analysis Overview

• Software tools parse the program text and try to discover potentially erroneous conditions

• Control flow analysis: Checks for loops with multiple exit or entry points, finds unreachable code, etc.

• Data use analysis: Detects uninitialized variables, variables written twice without an intervening use, variables that are declared but never used, etc.

• Interface analysis: Checks the consistency of type, method, etc. declarations and their use

• Should occur prior to inspection or testing
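The data use findings listed above can be illustrated with a small Java method that compiles and runs correctly but contains two of the anomalies. The `sumPositive` method is hypothetical, and whether a particular tool reports each item depends on the tool and its configuration. The Java compiler itself already rejects some of the conditions named above, such as reading an uninitialized local variable or writing a statement that can never be reached.

```java
class StaticAnalysisTargets {
    // Hypothetical method: correct output, but with anomalies a
    // data use analysis could flag.
    static int sumPositive(int[] values) {
        int unused = 42;   // declared but never used
        int total = 0;
        total = 0;         // written twice without an intervening use
        for (int v : values) {
            if (v > 0) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumPositive(new int[] {1, -2, 3}));
    }
}
```

Neither anomaly changes the result, which is why testing alone would not find them; they are hints that the code may not say what its author intended.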


Why Test?

• No matter how well software has been designed and coded, it will inevitably still contain defects

• Testing is the process of executing a program with the intent of finding faults (bugs)

• A “successful” test is one that finds errors, not one that doesn’t find errors

• Testing can “prove” the presence of faults, but cannot “prove” their absence (unless the program is so trivial that it can be exhaustively tested)

• But can increase confidence that a program “works”


What to Test?

• Unit test – test of small code unit: start with individual methods, build up to class (and class hierarchy if applicable), then component

• Integration test – test of several units combined to form a (sub)system, preferably adding one unit at a time

• System (alpha) test – test of a system release by “independent” system testers

• Acceptance (beta) test – test of a release by end-users or their representatives


When to Test?

Early

• “Agile programming” developers write unit test cases before coding each unit (test-driven development)

• Many software processes involve writing system/acceptance tests in parallel with development

Often

• Regression testing: rerun unit, integration and system/acceptance tests

– After refactoring

– Throughout integration

– Before each release


Who should Test?

• Argument: Software authors should not test their own code because

– Testers who don’t believe they will find faults generally don’t find many faults (cognitive dissonance)

– Testers who have to fix any faults they find don’t tend to find very many (avoidance behavior)

– Coders want code to be fault free, but effective testers must want to find faults (conflict of interest)

• However, in practice code authors usually do unit tests and often integration tests

• Separate “independent” team usually does system tests and/or acceptance tests


Defining a Test

• Goal – the aspect of the system being tested

• Input – specify the actions and conditions that lead up to the test as well as the input (state of the world, not just parameters) that actually constitutes the test

• Outcome – specify how the system should respond or what it should compute, according to its requirements
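The three parts of a test definition can be captured in a small data structure, sketched below. The `TestDefinition` class and its field names are assumptions made for illustration; real projects would normally use a test framework instead.

```java
// Hypothetical structure; the fields mirror goal, input, and outcome.
final class TestDefinition<T> {
    final String goal;                           // aspect of the system being tested
    final java.util.function.Supplier<T> input;  // sets up state and performs the action
    final T expectedOutcome;                     // response the requirements call for

    TestDefinition(String goal, java.util.function.Supplier<T> input, T expectedOutcome) {
        this.goal = goal;
        this.input = input;
        this.expectedOutcome = expectedOutcome;
    }

    // Runs the input and compares the actual result with the expected outcome.
    boolean passes() {
        return expectedOutcome.equals(input.get());
    }
}

class TestDefinitionDemo {
    public static void main(String[] args) {
        TestDefinition<Integer> test = new TestDefinition<>(
            "absolute value of a negative number is positive",
            () -> Math.abs(-7),
            7);
        System.out.println(test.goal + ": " + (test.passes() ? "pass" : "fail"));
    }
}
```

Writing the expected outcome down before running the code keeps the test tied to the requirements and not to whatever the implementation happens to return.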


Test Harness (Scaffolding)

• test driver - supporting code and data used to provide an environment for invoking part of a system in isolation

• stub - dummy procedure, module or unit that stands in for another portion of a system, intended to be invoked by that isolated part of the system

– May consist of nothing more than a function header with no body

– If a stub needs to return values, it may read and return test data from a file, return hard-coded values, or obtain data from a user (the tester) and return it

– The stub should cover every possible error code and exception that can arise
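A minimal sketch of a driver and a stub, assuming a hypothetical `PriceConverter` unit that depends on an `ExchangeRateService` that may not exist yet. All names here are invented for the example, and the stub only returns hard-coded values.

```java
// Hypothetical collaborator that may be unfinished or unavailable during unit testing.
interface ExchangeRateService {
    double rate(String from, String to);
}

// Unit under test: depends on the service only through its interface.
class PriceConverter {
    private final ExchangeRateService rates;

    PriceConverter(ExchangeRateService rates) {
        this.rates = rates;
    }

    double convert(double amount, String from, String to) {
        double rate = rates.rate(from, to);
        if (rate <= 0) {
            throw new IllegalStateException("invalid rate: " + rate);
        }
        return amount * rate;
    }
}

// Stub: stands in for the real service and returns a hard-coded value.
class FixedRateStub implements ExchangeRateService {
    private final double fixedRate;

    FixedRateStub(double fixedRate) {
        this.fixedRate = fixedRate;
    }

    @Override
    public double rate(String from, String to) {
        return fixedRate;
    }
}

// Test driver: invokes the unit in isolation and reports the results.
class PriceConverterDriver {
    public static void main(String[] args) {
        PriceConverter converter = new PriceConverter(new FixedRateStub(2.0));
        System.out.println(converter.convert(10.0, "USD", "EUR")); // prints 20.0

        // A second stub exercises the error path.
        try {
            new PriceConverter(new FixedRateStub(-1.0)).convert(10.0, "USD", "EUR");
            System.out.println("error path not triggered");
        } catch (IllegalStateException e) {
            System.out.println("error path ok");
        }
    }
}
```

Configuring the stub with different values is one simple way to make it cover the error cases as well as the normal one.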


Unit Testing Overview

• Unit testing is testing some program unit in isolation from the rest of the system (which may not exist yet)

• Usually the programmer is responsible for testing a unit during its implementation (even though this violates the rule about programmers not testing their own software)

• Easier to debug when a test finds a bug (compared to full-system testing)


Integration Testing Overview

• Motivation: Units that worked in isolation may not work in combination

• Performed after all units to be integrated have passed all unit tests

• Reuse unit test cases that cross unit boundaries (that previously required stub(s) and/or driver standing in for another unit)


System/Acceptance Testing Overview

• Full system, from end-user (or other external role) input/output perspective

• Lab testing vs. field testing

• Consider interoperability with customer software and hardware configurations

• Additional factors: security, performance, usability


How do you know when you are “done” testing?

• Adequacy criteria (coverage metrics): all statements, all branches, all control flow paths, all data flow paths

• All programmed error messages and exceptions have been produced

• Have reached “tail” of defect density curve

• Confidence established that the software is fit for its purpose, “good enough”
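The difference between the statement and branch adequacy criteria can be seen on a very small method. In the hypothetical `clampToZero` below, the single call `clampToZero(-5)` executes every statement, yet the case where the `if` condition is false is never exercised; branch coverage needs a second call such as `clampToZero(5)`.

```java
class CoverageExample {
    // Hypothetical method with one decision and no else branch.
    static int clampToZero(int x) {
        int result = x;
        if (x < 0) {        // true branch taken for -5, false branch for 5
            result = 0;
        }
        return result;
    }

    public static void main(String[] args) {
        // Statement coverage: this one call runs every statement.
        System.out.println(clampToZero(-5));
        // Branch coverage additionally needs the condition to be false once.
        System.out.println(clampToZero(5));
    }
}
```

A bug that only shows up when the `if` body is skipped would slip past a test suite that stops at statement coverage.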


Defect Density Curve

[Figure: defect density curve]


Upcoming Assignments

• First Iteration First Progress Report due Thursday 20 October, 10am

• First Iteration Second Progress Report due Thursday 27 October, 10am

• Demo Week November 1-10

– Seeking in-class demos for Tue Nov 1 and Thu Nov 3

– Sliding scale extra credit for early demos

• First Iteration Final Report due Friday 11 November, 5pm

