COMS W4156: Advanced Software Engineering
Prof. Gail Kaiser
[email protected]
http://ase.cs.columbia.edu/
October 18, 2011 COMS W4156 1
Topics covered in this lecture
• Software Quality
• Refactoring
• Verification and Validation
Software Quality
“Quality” is Hard to Pin Down
• Concise, clear definition is elusive
• Not easily quantifiable
• Many things to many people
• “You'll know it when you see it”
• Often defined as set of “ilities” (attributes)
High Quality Software Has…
• Understandability – The ability of a reader of the software to understand its function
– Critical for maintenance
• Modifiability – The ability of the software to be changed by that reader
– Almost defines "maintainability"
High Quality Software Has…
• Reliability – The ability of the software to perform as intended without failure
– If it isn't reliable, the maintainer must fix it
• Efficiency – The ability of the software to operate with minimal use of time and space resources
– If it isn't efficient, the maintainer must improve it
High Quality Software Has…
• Portability – The ease with which the software can be made useful in another environment
– Porting is usually done by the maintainer
• Testability – The ability of the software to be tested easily
– Finding/fixing bugs is part of maintenance
– Enhancements/additions must also be tested
High Quality Software Has…
• Usability – The ability of the software to be easily used (human factors)
– Software that is not easily used leads to more support calls, enhancement requests, and corrections
Notice that all of these qualities relate to maintenance, but they need to be instilled during development
Approaches to Achieving Quality
• Continuous refactoring
• Verification and validation
• Software process and process improvement
• Buy rather than build
• Open source software
• …
Refactoring
What is Refactoring?
• The process of changing the source code of a software system such that:
– The external (observable) behavior of the system does not change, e.g., functional requirements are maintained
– But the internal structure of the system is improved
How improved?
• Maintainability!
• Easier to read and understand
• Easier to (further) modify
• Easier to integrate
• Easier to test
Simple Example: Consolidate Duplicate Conditional Fragments
• This:
if (isSpecialDeal()) {
    total = price * 0.95;
    send();
} else {
    total = price * 0.98;
    send();
}
• Becomes this:
if (isSpecialDeal()) {
    total = price * 0.95;
} else {
    total = price * 0.98;
}
send();
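To check that this refactoring leaves external behavior unchanged, the fragment can be wrapped in a small runnable Java class and executed in both forms. This is a sketch under stated assumptions: the class name ConsolidateDemo, its constructor, and the bodies of isSpecialDeal() and send() are hypothetical stand-ins for code the slide does not show. Here send() simply records the total it was called with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: price, isSpecialDeal() and send() mirror the fragment above;
// send() only records the total it was called with.
class ConsolidateDemo {
    private final double price;
    private final boolean specialDeal;
    private double total;
    private final List<Double> sentTotals = new ArrayList<>();

    ConsolidateDemo(double price, boolean specialDeal) {
        this.price = price;
        this.specialDeal = specialDeal;
    }

    boolean isSpecialDeal() { return specialDeal; }

    void send() { sentTotals.add(total); }

    List<Double> getSentTotals() { return sentTotals; }

    // Before: send() is duplicated in both branches
    void checkoutBefore() {
        if (isSpecialDeal()) {
            total = price * 0.95;
            send();
        } else {
            total = price * 0.98;
            send();
        }
    }

    // After: the common fragment is moved out of the conditional
    void checkoutAfter() {
        if (isSpecialDeal()) {
            total = price * 0.95;
        } else {
            total = price * 0.98;
        }
        send();
    }

    public static void main(String[] args) {
        for (boolean deal : new boolean[] {true, false}) {
            ConsolidateDemo before = new ConsolidateDemo(100.0, deal);
            ConsolidateDemo after = new ConsolidateDemo(100.0, deal);
            before.checkoutBefore();
            after.checkoutAfter();
            // Same observable behavior: send() called once, with the same total
            System.out.println("specialDeal=" + deal + " same behavior: "
                    + before.getSentTotals().equals(after.getSentTotals()));
        }
    }
}
```

Comparing the recorded calls is a small-scale version of the regression check that should follow every refactoring.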
Why is it called Refactoring?
• By analogy to the factorization of polynomials
• For example, $x^2 - x - 2$ can be factored as $(x + 1)(x - 2)$, revealing an internal structure that was previously not visible (two roots at −1 and +2)
• Similarly, in software refactoring, the change in visible structure can often reveal the "hidden" internal structure of the original code
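Expanding the factored form term by term confirms that it is the same polynomial, while each factor exposes one root directly:

```latex
(x + 1)(x - 2) = x^2 - 2x + x - 2 = x^2 - x - 2
x + 1 = 0 \;\Rightarrow\; x = -1, \qquad x - 2 = 0 \;\Rightarrow\; x = 2
```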
Refactoring Process
• A disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior
• Series of small behavior-preserving transformations
• Each transformation does little, but a sequence of transformations can produce a significant restructuring
• Since each refactoring is small, it's less likely to go wrong
• The system is also kept fully working after each small refactoring (via regression testing), reducing the chances that a system can get seriously broken during the restructuring
Example: Extract Method
Before:
void printOwing(double amount) {
    printBanner();
    // print details
    System.out.println("name: " + _name);
    System.out.println("amount: " + amount);
}
After:
void printOwing(double amount) {
    printBanner();
    printDetails(amount);
}
void printDetails(double amount) {
    System.out.println("name: " + _name);
    System.out.println("amount: " + amount);
}
• You have a code fragment that can be grouped together
• Turn the fragment into a method whose name explains the purpose of the fragment
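A minimal runnable sketch of Extract Method, assuming a hypothetical Invoice class. To make the before and after versions directly comparable, this sketch collects output in a StringBuilder and returns it as a String instead of calling System.out.println; the banner text and customer name are made up for illustration.

```java
// Hypothetical Invoice class: _name and printBanner() follow the slide; output is
// collected in a StringBuilder (instead of System.out) so both versions can be compared.
class Invoice {
    private final String _name;
    private final StringBuilder out = new StringBuilder();

    Invoice(String name) {
        _name = name;
    }

    private void printBanner() {
        out.append("*** Customer Owes ***\n");
    }

    // Before: the details are printed inline
    String printOwingBefore(double amount) {
        out.setLength(0);
        printBanner();
        // print details
        out.append("name: ").append(_name).append('\n');
        out.append("amount: ").append(amount).append('\n');
        return out.toString();
    }

    // After: the fragment is extracted into a method named for its purpose
    String printOwing(double amount) {
        out.setLength(0);
        printBanner();
        printDetails(amount);
        return out.toString();
    }

    private void printDetails(double amount) {
        out.append("name: ").append(_name).append('\n');
        out.append("amount: ").append(amount).append('\n');
    }

    public static void main(String[] args) {
        Invoice invoice = new Invoice("Alice");
        System.out.print(invoice.printOwing(42.5));
        System.out.println("same output: "
                + invoice.printOwingBefore(42.5).equals(invoice.printOwing(42.5)));
    }
}
```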
Example: Replace Temp with Query
• You are using a temporary variable to hold the result of an expression
• Extract the expression into a method
• Replace all references to the temp with the method call
• The new method can then be used in other methods
Before:
double basePrice = _quantity * _itemPrice;
if (basePrice > 1000)
    return basePrice * 0.95;
else
    return basePrice * 0.98;
After:
if (basePrice() > 1000)
    return basePrice() * 0.95;
else
    return basePrice() * 0.98;
…
double basePrice() {
    return _quantity * _itemPrice;
}
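The fragments above can be assembled into a small runnable class. This is a sketch under assumptions: the class name Order, its constructor, the method names getPriceBefore() and getPrice(), and the extra client qualifiesForDiscount() are hypothetical. The extra client shows the last bullet: once basePrice() exists, other methods can reuse it.

```java
// Hypothetical Order class; _quantity and _itemPrice follow the fragment above.
class Order {
    private final int _quantity;
    private final double _itemPrice;

    Order(int quantity, double itemPrice) {
        _quantity = quantity;
        _itemPrice = itemPrice;
    }

    // Before: a temp holds the expression's result
    double getPriceBefore() {
        double basePrice = _quantity * _itemPrice;
        if (basePrice > 1000)
            return basePrice * 0.95;
        else
            return basePrice * 0.98;
    }

    // After: every reference to the temp is replaced by the query
    double getPrice() {
        if (basePrice() > 1000)
            return basePrice() * 0.95;
        else
            return basePrice() * 0.98;
    }

    // The query method
    double basePrice() {
        return _quantity * _itemPrice;
    }

    // Hypothetical second client: the query is now reusable elsewhere
    boolean qualifiesForDiscount() {
        return basePrice() > 1000;
    }

    public static void main(String[] args) {
        Order small = new Order(5, 100.0);   // base price 500
        Order large = new Order(20, 100.0);  // base price 2000
        for (Order o : new Order[] {small, large}) {
            System.out.println(o.getPriceBefore() + " == " + o.getPrice()
                    + ", discount: " + o.qualifiesForDiscount());
        }
    }
}
```

The query is evaluated more than once in getPrice(). For a cheap expression like this one, the cost is normally negligible next to the gain in clarity.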
Example: Introduce Null Object
• Repeated checks for a null value
• Replace the null value with a null object
Before:
if (customer == null) {
    name = "occupant";
} else {
    name = customer.getName();
}
if (customer == null) {
    …
After:
public class NullCustomer extends Customer {
    public String getName() {
        return "occupant";
    }
    …
}
customer.getName();
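A minimal runnable sketch of the pattern, assuming a simple Customer class and a hypothetical Site class that hands out a NullCustomer instead of null. These names and constructors are illustrative and do not come from the slide's source code. Because NullCustomer overrides getName(), the call site needs no null check.

```java
class NullObjectDemo {
    static class Customer {
        private final String name;

        Customer(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Null object: a Customer that stands in for "no customer"
    static class NullCustomer extends Customer {
        NullCustomer() {
            super(null);
        }

        @Override
        String getName() {
            return "occupant";
        }
    }

    // Hypothetical owner of the reference: hands out a NullCustomer instead of null
    static class Site {
        private final Customer customer;

        Site(Customer customer) {
            this.customer = (customer == null) ? new NullCustomer() : customer;
        }

        Customer getCustomer() {
            return customer;
        }
    }

    public static void main(String[] args) {
        Site[] sites = {new Site(new Customer("Alice")), new Site(null)};
        for (Site site : sites) {
            // No null check at the call site
            System.out.println(site.getCustomer().getName());
        }
    }
}
```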
Example: Exploit Polymorphism
• Generally, polymorphism is the ability to appear in many forms
• In OO, polymorphism refers to a programming language's ability to process objects differently depending on their data type or class
• More specifically, it is the ability to redefine methods for derived classes (subclasses)
Example
double getSpeed() {
    switch (_type) {
        case EUROPEAN:
            return getBaseSpeed();
        case US:
            return getBaseSpeed() / 1.6;
        case BRITISH:
            if (getDate().isBefore(Year.of(1990)))
                return getBaseSpeed() / 1.6;
            else
                return getBaseSpeed();
    }
    throw new RuntimeException("Should be unreachable");
}
[Class diagram: Vehicle with getBaseSpeed(); subclasses EU Vehicle, US Vehicle, and UK Vehicle, each with its own getBaseSpeed()]
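The switch can be removed with Replace Conditional with Polymorphism. In this sketch, each case moves into an overriding getSpeed() method of a subclass, so dynamic dispatch takes the place of the _type code. The class names (EuVehicle, UsVehicle, UkVehicle), the constructors, and the use of java.time.Year for getDate() are assumptions made to keep the example self-contained.

```java
import java.time.Year;

class VehicleDemo {
    // Base class: common data and the shared getBaseSpeed()
    abstract static class Vehicle {
        private final double baseSpeed;

        Vehicle(double baseSpeed) {
            this.baseSpeed = baseSpeed;
        }

        double getBaseSpeed() {
            return baseSpeed;
        }

        // Each subclass supplies the logic that used to be one case of the switch
        abstract double getSpeed();
    }

    static class EuVehicle extends Vehicle {
        EuVehicle(double baseSpeed) { super(baseSpeed); }

        @Override
        double getSpeed() { return getBaseSpeed(); }
    }

    static class UsVehicle extends Vehicle {
        UsVehicle(double baseSpeed) { super(baseSpeed); }

        @Override
        double getSpeed() { return getBaseSpeed() / 1.6; }
    }

    static class UkVehicle extends Vehicle {
        private final Year date;

        UkVehicle(double baseSpeed, Year date) {
            super(baseSpeed);
            this.date = date;
        }

        Year getDate() { return date; }

        @Override
        double getSpeed() {
            return getDate().isBefore(Year.of(1990)) ? getBaseSpeed() / 1.6 : getBaseSpeed();
        }
    }

    public static void main(String[] args) {
        Vehicle[] fleet = {
            new EuVehicle(160), new UsVehicle(160),
            new UkVehicle(160, Year.of(1985)), new UkVehicle(160, Year.of(2005))
        };
        // No type code and no switch: dynamic dispatch selects each getSpeed()
        for (Vehicle v : fleet) {
            System.out.println(v.getClass().getSimpleName() + ": " + v.getSpeed());
        }
    }
}
```

To add a new kind of vehicle, you add a subclass. You no longer edit every switch on _type.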
Refactoring is Incremental Redesign
• The idea behind refactoring is to acknowledge that it will be difficult to get a design right the first time
• And as a program’s requirements change, the design may need to change
– It is notoriously difficult (impossible?) to design for all possible changes a priori
– And as agile programming proponents say, “You aren’t gonna need it” – but what if later you do?
• Refactoring provides techniques for evolving the design in small incremental steps
Refactoring Benefits
• Often code size is reduced after refactoring
• Confusing structures are transformed into simpler structures - which are easier to maintain (and often easier to unit test)
• Promotes a deeper understanding of the code - which aids in finding bugs and anticipating potential bugs
Contrast with Performance Optimization
• Again functionality is not changed, only internal structure
• However, performance optimizations often involve making code harder to understand (but faster!)
– Use more efficient but more complicated algorithms and data structures
– Lose generality to address specific issues of the implemented solution
• Use profiling tools to determine the 10-20% of the code requiring 80-90% of the CPU cycles – optimize that code, refactor all the other code
When to Refactor?
• When you add new functionality
– Do it before you add the new function, to make it easier to add the function
– Or do it after you add the function, to clean up the code including that function
• When you need to fix a bug
• As you do a code review
• Whenever…
Why to Refactor?
• General goal is maintainability
• Enhance clarity, understandability, modifiability, integratability, testability
• Very often refactoring is about:
– Increasing cohesion
– Decreasing coupling
Cohesion and Coupling
• Cohesion is a property or characteristic of an individual unit
• Coupling is a property of a collection of units
• High cohesion GOOD, high coupling BAD
• Design for change
– You don't want a change in one unit (component, class, method) to cause errors to ripple throughout your system
– Make units highly cohesive, seek low coupling among units
What to Refactor?
• Duplicated Code
– Bad because if you modify one instance of duplicated code but not all the others, you (may) have introduced a bug!
• Switch Statements
– Often duplicated in code, can typically be replaced by use of polymorphism (in OO languages)
What to Refactor?
• Long Method
– More difficult to understand
– Performance concerns with respect to lots of short methods are largely obsolete
• Long Parameter List
– Hard to understand, can become inconsistent
• Large Class
– Trying to do too much, which reduces cohesion
What to Refactor?
• Divergent Change
– One type of change requires changing one subset of methods in the module, another type of change requires changing another subset
• Shotgun Surgery
– A change requires lots of little changes in a lot of different classes
• Parallel Inheritance Hierarchies
– Each time you add a subclass to one hierarchy, you need to do it for all related hierarchies
What to Refactor?
• Lazy Class
– A class that no longer “pays its way”, e.g., a class that was downsized by previous refactoring, or represented planned functionality that did not pan out
• Middle Man
– If a class is delegating more than half of its responsibilities to another class, do you really need it?
What to Refactor?
• Speculative Generality
– “Oh I think we need the ability to do this kind of thing someday”
• Alternative Classes with Different Interfaces
– Two or more methods do the same thing but have different signatures for what they do
What to Refactor?
• Primitive Obsession
– Characterized by a reluctance to use classes instead of primitive data types
• Temporary Field
– An attribute of an object is only set in certain circumstances, but an object should need all of its attributes
What to Refactor?
• Feature Envy
– A method requires lots of information from some other class
• Data Clumps
– Attributes (e.g., method parameters) that clump together but are not part of the same class
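Data clumps, and the long parameter lists they often cause, are commonly fixed with Introduce Parameter Object. The values that always travel together become one class, which can then hold behavior and validation. This is a minimal sketch: the DateRange and Charge classes and the dates are hypothetical, and the "before" signature appears only as a comment.

```java
import java.time.LocalDate;
import java.util.List;

class ParameterObjectDemo {
    // Parameter object: the (start, end) pair that kept travelling together
    static final class DateRange {
        private final LocalDate start;
        private final LocalDate end;

        DateRange(LocalDate start, LocalDate end) {
            // Validation now lives in one place instead of in every caller
            if (end.isBefore(start)) {
                throw new IllegalArgumentException("end is before start");
            }
            this.start = start;
            this.end = end;
        }

        // Behavior moves onto the new class (inclusive on both ends)
        boolean includes(LocalDate date) {
            return !date.isBefore(start) && !date.isAfter(end);
        }
    }

    static final class Charge {
        final LocalDate date;
        final double amount;

        Charge(LocalDate date, double amount) {
            this.date = date;
            this.amount = amount;
        }
    }

    // Before: static double totalBetween(List<Charge> charges, LocalDate start, LocalDate end)
    static double totalIn(List<Charge> charges, DateRange range) {
        double total = 0;
        for (Charge charge : charges) {
            if (range.includes(charge.date)) {
                total += charge.amount;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Charge> charges = List.of(
                new Charge(LocalDate.of(2011, 10, 1), 20.0),
                new Charge(LocalDate.of(2011, 10, 18), 5.0),
                new Charge(LocalDate.of(2011, 11, 2), 7.5));
        DateRange october = new DateRange(LocalDate.of(2011, 10, 1), LocalDate.of(2011, 10, 31));
        System.out.println("October total: " + totalIn(charges, october));
    }
}
```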
What to Refactor?
• Message Chains
– A client asks an object for another object and then asks that object for another object, etc.
getA().getB().getC().getD().getE().doSomething();
– Bad because the client depends on the structure of the navigation
• Inappropriate Intimacy
– Pairs of classes that know too much about each other’s private details
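One common remedy for a message chain is Hide Delegate. The first object in the chain offers a method that does the navigation itself, so clients no longer depend on the intermediate objects. This is a minimal sketch: the Person/Department example is a hypothetical two-link chain, kept short for illustration.

```java
// Hypothetical two-link chain: client -> Person -> Department -> manager
class HideDelegateDemo {
    static class Department {
        private final Person manager;

        Department(Person manager) {
            this.manager = manager;
        }

        Person getManager() {
            return manager;
        }
    }

    static class Person {
        private final String name;
        private Department department;

        Person(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        void setDepartment(Department department) {
            this.department = department;
        }

        Department getDepartment() {
            return department;
        }

        // Hide Delegate: Person does the navigation on the client's behalf
        Person getManager() {
            return department.getManager();
        }
    }

    public static void main(String[] args) {
        Person grace = new Person("Grace");
        Person john = new Person("John");
        john.setDepartment(new Department(grace));

        // Before: the client walks the chain and depends on Department
        System.out.println(john.getDepartment().getManager().getName());
        // After: the client depends only on Person
        System.out.println(john.getManager().getName());
    }
}
```

Hiding every delegate can turn the server class into a Middle Man, so the two refactorings have to be balanced against each other.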
What to Refactor?
• Data Class
– Classes that have fields, getting and setting methods for the fields, and nothing else
– They are data holders, but objects should be about data and behavior (with some exceptions, e.g., entity beans)
• Refused Bequest
– A subclass ignores most of the functionality provided by its superclass
What to Refactor?
• Incomplete Library Class
– An infrastructure class doesn’t do everything you need
• Comments (!)
– Comments are sometimes used to “decorate” bad code
– /* This is a gross hack */
But Refactoring can be Dangerous
• If programmers spend time “cleaning up the code”, then that’s less time spent implementing required functionality - and the schedule is slipping as it is!
• Refactoring can break code that previously worked
Refactoring needs to be systematic, incremental, and safe
How to Make Refactoring Safe?
• Use refactoring “patterns”
– Catalog at http://refactoring.com/catalog/index.html
• Use refactoring tools
– E.g., Eclipse JDT supports refactoring
• Test constantly!
– Regression testing
Regression Testing After Changes
• Can be unit tests or a combination of unit and integration tests
• Possible outcomes of a change:
– Change is successful, and no new errors are introduced
– Change does not work as intended, and no new errors are introduced
– Change is successful, but at least one new error is introduced
– Change does not work, and at least one new error is introduced
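The four outcomes can be told apart by running two groups of tests after a change: tests for the change itself, and the existing regression suite that passed beforehand. This is a sketch built on hypothetical choices. The function under test squares an int, the requested change caps results at 100, and the test data is made up.

```java
import java.util.function.IntUnaryOperator;

class RegressionDemo {
    // Existing regression suite {input, expected}: these passed before the change
    static final int[][] REGRESSION_SUITE = {{0, 0}, {3, 9}, {-4, 16}};
    // Tests for the requested change (hypothetical: squares are capped at 100)
    static final int[][] CHANGE_TESTS = {{20, 100}};

    static boolean allPass(IntUnaryOperator f, int[][] cases) {
        for (int[] c : cases) {
            if (f.applyAsInt(c[0]) != c[1]) {
                return false;
            }
        }
        return true;
    }

    // Maps the two groups of test results onto the four possible outcomes
    static String classify(IntUnaryOperator changed) {
        boolean changeWorks = allPass(changed, CHANGE_TESTS);
        boolean noNewErrors = allPass(changed, REGRESSION_SUITE);
        if (changeWorks && noNewErrors) return "successful, no new errors";
        if (noNewErrors) return "does not work, no new errors";
        if (changeWorks) return "successful, but new error(s)";
        return "does not work, and new error(s)";
    }

    public static void main(String[] args) {
        System.out.println(classify(x -> Math.min(x * x, 100)));           // correct change
        System.out.println(classify(x -> x * x));                          // change not made
        System.out.println(classify(x -> Math.min(x * Math.abs(x), 100))); // breaks negatives
        System.out.println(classify(x -> x * Math.abs(x)));                // both problems
    }
}
```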
Other Difficulties with Refactoring
• Some refactorings require that interfaces be changed
– If you own all the calling code, you need to change everywhere the interface is used
– If not, the interface is “published” and can’t (or shouldn’t) change
• Business applications are often tightly coupled to underlying database schemas
– Virtually impossible to reorganize a database schema unless the underlying database automates the corresponding table/row/column transformations (or your database is empty)
Other Difficulties with Refactoring
• Dealing with hardware devices is worse than databases and other external software interfaces
– Software can change, the hardware (usually) cannot
• Real-time or other timing-dependent applications
– Refactored code will not necessarily run within previous time bounds
Summary
• Refactor often
• Refactor as you go
• Simplest version of refactoring: add comments, rename local variables and parameters more intuitively
• Regression test after every refactoring
Verification and Validation
Quality Assurance: Verification and Validation
• Validation: Are we building the right product?
– QA at requirements and design level concentrates on validation, ensuring that the product will actually meet the users’ needs
• Verification: Are we building the product right?
– QA at code level concentrates on verification, ensuring that the product has been built according to the requirements and design specifications (only useful if the specifications were correct in the first place)
V&V Techniques
• Standards (ISO 9001, SEI CMMI)
• Metrics (six sigma)
• Reviews (inspections, static analysis)
• Testing and model checking
Whole lifecycle process applied at each stage
Inspection Overview
• Also known as walkthrough
• An approach to testing that does not actually execute the code
• Formal process for reading through the software product as a group and identifying defects
• Potentially applied to all project documents including but not limited to source code
• Used to increase software quality and improve productivity and manageability of the development process
Static Analysis Overview
• Software tools parse the program text and try to discover potentially erroneous conditions
• Control flow analysis: Checks for loops with multiple exit or entry points, finds unreachable code, etc.
• Data use analysis: Detects uninitialized variables, variables written twice without an intervening use, variables that are declared but never used, etc.
• Interface analysis: Checks the consistency of type, method, etc. declarations and their use
• Should occur prior to inspection or testing
Why Test?
• No matter how well software has been designed and coded, it will inevitably still contain defects
• Testing is the process of executing a program with the intent of finding faults (bugs)
• A “successful” test is one that finds errors, not one that doesn’t find errors
• Testing can “prove” the presence of faults, but cannot “prove” their absence (unless the program is so trivial that it can be exhaustively tested)
• But can increase confidence that a program “works”
What to Test?
• Unit test – test of small code unit: start with individual methods, build up to class (and class hierarchy if applicable), then component
• Integration test – test of several units combined to form a (sub)system, preferably adding one unit at a time
• System (alpha) test – test of a system release by “independent” system testers
• Acceptance (beta) test – test of a release by end-users or their representatives
When to Test?
Early
• “Agile programming” developers write unit test cases before coding each unit (test-driven development)
• Many software processes involve writing system/acceptance tests in parallel with development
Often
• Regression testing: rerun unit, integration and system/acceptance tests
– After refactoring
– Throughout integration
– Before each release
Who should Test?
• Argument: Software authors should not test their own code because
– Testers who don’t believe they will find faults generally don’t find many faults (cognitive dissonance)
– Testers who have to fix any faults they find don’t tend to find very many (avoidance behavior)
– Coders want code to be fault free, but effective testers must want to find faults (conflict of interest)
• However, in practice code authors usually do unit tests and often integration tests
• A separate “independent” team usually does system tests and/or acceptance tests
Defining a Test
• Goal – the aspect of the system being tested
• Input – specify the actions and conditions that lead up to the test as well as the input (state of the world, not just parameters) that actually constitutes the test
• Outcome – specify how the system should respond or what it should compute, according to its requirements
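As a sketch, the three parts of a test definition map directly onto a plain Java test method (no test framework assumed). The Account class, its overdraft rule, and the test name are hypothetical. The comments mark the goal, the input (including the state of the world set up beforehand), and the expected outcome.

```java
class DefiningATestDemo {
    // Hypothetical unit under test
    static class Account {
        private int balance;

        Account(int openingBalance) {
            balance = openingBalance;
        }

        int getBalance() {
            return balance;
        }

        void withdraw(int amount) {
            if (amount > balance) {
                throw new IllegalStateException("insufficient funds");
            }
            balance -= amount;
        }
    }

    // Goal: a withdrawal larger than the balance is rejected and leaves the balance unchanged
    static boolean overdraftIsRejected() {
        // Input, part 1: the state of the world leading up to the test
        Account account = new Account(50);
        // Input, part 2: the action that actually constitutes the test
        boolean threw = false;
        try {
            account.withdraw(80);
        } catch (IllegalStateException e) {
            threw = true;
        }
        // Outcome: the response the (hypothetical) requirement calls for
        return threw && account.getBalance() == 50;
    }

    public static void main(String[] args) {
        System.out.println("overdraftIsRejected: " + (overdraftIsRejected() ? "PASS" : "FAIL"));
    }
}
```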
Test Harness (Scaffolding)
• test driver – supporting code and data used to provide an environment for invoking part of a system in isolation
• stub – dummy procedure, module or unit that stands in for another portion of a system, intended to be invoked by that isolated part of the system
– May consist of nothing more than a function header with no body
– If a stub needs to return values, it may read and return test data from a file, return hard-coded values, or obtain data from a user (the tester) and return it
– The stub should cover every possible error code and exception that can arise
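A minimal harness sketch, assuming a hypothetical PriceConverter unit that depends on an ExchangeRates component that may not exist yet. The stub returns hard-coded test values and also raises the one error the (assumed) interface can produce. main() plays the role of the test driver.

```java
import java.util.Locale;

class HarnessDemo {
    // Interface of a component that may not exist yet (hypothetical);
    // the real one is assumed to throw IllegalArgumentException for unknown codes
    interface ExchangeRates {
        double rate(String currency);
    }

    // Unit under test: depends on ExchangeRates
    static class PriceConverter {
        private final ExchangeRates rates;

        PriceConverter(ExchangeRates rates) {
            this.rates = rates;
        }

        String convert(double usd, String currency) {
            try {
                return String.format(Locale.ROOT, "%.2f %s", usd * rates.rate(currency), currency);
            } catch (IllegalArgumentException e) {
                return "unsupported currency: " + currency;
            }
        }
    }

    // Stub: hard-coded test data plus the error case the real component can raise
    static class ExchangeRatesStub implements ExchangeRates {
        @Override
        public double rate(String currency) {
            switch (currency) {
                case "EUR": return 0.5;
                case "GBP": return 0.25;
                default: throw new IllegalArgumentException("unknown currency: " + currency);
            }
        }
    }

    // Test driver: invokes the unit in isolation, with the stub standing in for ExchangeRates
    public static void main(String[] args) {
        PriceConverter converter = new PriceConverter(new ExchangeRatesStub());
        System.out.println(converter.convert(10.0, "EUR"));
        System.out.println(converter.convert(10.0, "GBP"));
        System.out.println(converter.convert(10.0, "XYZ"));
    }
}
```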
Unit Testing Overview
• Unit testing is testing some program unit in isolation from the rest of the system (which may not exist yet)
• Usually the programmer is responsible for testing a unit during its implementation (even though this violates the rule about a programmer not testing own software)
• Easier to debug when a test finds a bug (compared to full-system testing)
Integration Testing Overview
• Motivation: Units that worked in isolation may not work in combination
• Performed after all units to be integrated have passed all unit tests
• Reuse unit test cases that cross unit boundaries (that previously required stub(s) and/or driver standing in for another unit)
System/Acceptance Testing Overview
• Full system, from end-user (or other external role) input/output perspective
• Lab testing vs. field testing
• Consider interoperability with customer software and hardware configurations
• Additional factors: security, performance, usability
How do you know when you are “done” testing?
• Adequacy criteria (coverage metrics): all statements, all branches, all control flow paths, all data flow paths
• All programmed error messages and exceptions have been produced
• Have reached “tail” of defect density curve
• Confidence established that the software is fit for its purpose, “good enough”
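The first two adequacy criteria can differ even for a method with a single branch. In this sketch, the method clampToZero and the hand-written branch flags are hypothetical instrumentation, not a real coverage tool. The single input -5 executes every statement of the method, yet the false outcome of its decision is never exercised. A non-negative input must be added to reach full branch coverage.

```java
class CoverageDemo {
    // Hand-written instrumentation standing in for a coverage tool
    static boolean trueOutcomeSeen;
    static boolean falseOutcomeSeen;

    // Hypothetical unit under test: one decision, two branch outcomes
    static int clampToZero(int x) {
        int result = x;
        boolean negative = x < 0;
        if (negative) trueOutcomeSeen = true; else falseOutcomeSeen = true; // instrumentation
        if (negative) {
            result = 0;
        }
        return result;
    }

    // Runs a test suite and reports how many of the two branch outcomes it exercised
    static String branchCoverage(int... inputs) {
        trueOutcomeSeen = false;
        falseOutcomeSeen = false;
        for (int x : inputs) {
            clampToZero(x);
        }
        int covered = (trueOutcomeSeen ? 1 : 0) + (falseOutcomeSeen ? 1 : 0);
        return covered + "/2 branches";
    }

    public static void main(String[] args) {
        // {-5} executes every statement of clampToZero but only one branch outcome
        System.out.println("{-5}: " + branchCoverage(-5));
        // Adding a non-negative input covers the other outcome
        System.out.println("{-5, 7}: " + branchCoverage(-5, 7));
    }
}
```

In practice, a coverage tool gathers these numbers automatically instead of relying on hand-written flags.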
Defect Density Curve
Upcoming Assignments
• First Iteration First Progress Report due Thursday 20 October, 10am
• First Iteration Second Progress Report due Thursday 27 October, 10am
• Demo Week November 1-10
– Seeking in-class demos for Tue Nov 1 and Thu Nov 3
– Sliding scale extra credit for early demos
• First Iteration Final Report due Friday 11 November, 5pm