1 Software Engineering II Software Reliability

Upload: sara-barber

Post on 29-Dec-2015

Page 1: 1 Software Engineering II Software Reliability. 2 Dependable and Reliable Systems: The Royal Majesty From the report of the National Transportation Safety

1

Software Engineering II

Software Reliability

2

Dependable and Reliable Systems: The Royal Majesty

From the report of the National Transportation Safety Board:

"On June 10, 1995, the Panamanian passenger ship Royal Majesty grounded on Rose and Crown Shoal about 10 miles east of Nantucket Island, Massachusetts, and about 17 miles from where the watch officers thought the vessel was. The vessel, with 1,509 persons on board, was en route from St. George’s, Bermuda, to Boston, Massachusetts."

"The Raytheon GPS unit installed on the Royal Majesty had been designed as a standalone navigation device in the mid- to late 1980s, ... The Royal Majesty’s GPS was configured by Majesty Cruise Line to automatically default to the Dead Reckoning mode when satellite data were not available."

3

The Royal Majesty: Analysis

• The ship was steered by an autopilot that relied on position information from the Global Positioning System (GPS).

• If the GPS could not obtain a position from satellites, it provided an estimated position based on Dead Reckoning (distance and direction traveled from a known point).

• The GPS failed one hour after leaving Bermuda.

• The crew failed to see the warning message on the display (or to check the instruments).

• 34 hours and 600 miles later, the Dead Reckoning error was 17 miles.
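How a small uncorrected drift grows into a large position error can be sketched in a few lines of Python. This is only an illustration under stated assumptions: flat-plane geometry, and a single constant drift angle standing in for whatever wind and current actually acted on the ship. The function names are hypothetical.

```python
import math

def dead_reckon(x, y, heading_deg, distance):
    """Advance a flat-plane position (miles) by `distance` along a compass
    heading (0 = north, 90 = east). Flat-plane geometry is an assumption."""
    h = math.radians(heading_deg)
    return x + distance * math.sin(h), y + distance * math.cos(h)

def drift_angle_for_error(distance, error):
    """Constant drift angle (degrees) that produces `error` miles of sideways
    displacement after `distance` miles travelled."""
    return math.degrees(math.asin(error / distance))

# Figures from the slide: 600 miles travelled, 17 miles of error.
drift = drift_angle_for_error(600, 17)

# Estimated position versus drifted position for a nominal northerly
# course (the heading value is illustrative only).
believed = dead_reckon(0.0, 0.0, 0.0, 600)
actual = dead_reckon(0.0, 0.0, drift, 600)
offset = math.dist(believed, actual)

print(f"drift angle: {drift:.2f} degrees, offset: {offset:.1f} miles")
```

Under these assumptions a drift of less than two degrees accounts for the whole error, which is why an unnoticed switch to Dead Reckoning matters on a long passage.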

4

The Royal Majesty: Software Lessons

All the software worked as specified (no bugs), but ...

• After the GPS software was specified, the requirements changed (from a standalone system to part of an integrated system).

• The manufacturers of the autopilot and GPS adopted different design philosophies about the communication of mode changes.

• The autopilot was not programmed to recognize the valid/invalid status bits in messages from the GPS (NMEA 0183).

• The warnings provided by the user interface were not sufficiently conspicuous to alert the crew.

• The officers had not been properly trained on this equipment.
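The status-bit lesson can be made concrete with a minimal sketch of a receiver that refuses to act on a position unless the sentence is well formed and marked valid. The assumptions are loud ones: the slides do not say which NMEA 0183 sentence the ship's equipment exchanged, so the sketch uses the RMC sentence, whose status field is "A" for valid and "V" for a warning; the function names and the sample coordinates are hypothetical.

```python
from functools import reduce

def nmea_checksum(body: str) -> str:
    """XOR of every character between '$' and '*', as two hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), body, 0):02X}"

def parse_fields(sentence: str):
    """Return the comma-separated fields, or None if framing or checksum is bad."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    body, _, given = sentence[1:].partition("*")
    if given.upper() != nmea_checksum(body):
        return None
    return body.split(",")

def position_is_trustworthy(sentence: str) -> bool:
    """True only for an RMC sentence whose status field is 'A' (valid)."""
    fields = parse_fields(sentence)
    if fields is None or len(fields) < 3 or not fields[0].endswith("RMC"):
        return False
    return fields[2] == "A"

def make_sentence(body: str) -> str:
    """Helper for this example only: frame a body with its checksum."""
    return f"${body}*{nmea_checksum(body)}"

# Two made-up sentences that differ only in the status field.
good = make_sentence("GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W")
void = make_sentence("GPRMC,123519,V,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W")

print(position_is_trustworthy(good), position_is_trustworthy(void))
```

The design choice is to fail closed: anything that is malformed, fails its checksum, or is not explicitly marked valid is treated as untrustworthy.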

5

Reliability

Reliability: Probability of a failure occurring in operational use.

Perceived reliability: Depends upon:

• User behavior

• Set of inputs

• Pain of failure

6

User Perception of Reliability

1. A personal computer that crashes frequently v. a machine that is out of service for two days.

2. A database system that crashes frequently but comes back quickly with no loss of data v. a system that fails once in three years but data has to be restored from backup.

3. A system that does not fail but has unpredictable periods when it runs very slowly.

7

Reliability Metrics

Traditional Measures

• Mean time between failures

• Availability (up time)

• Mean time to repair

Market Measures

• Complaints

• Customer retention

User Perception is Influenced by

• Distribution of failures

Hypothetical example: Cars are less safe than airplanes in accidents per hour, but safer in accidents per mile.
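The traditional measures are linked by the standard steady-state relation availability = MTBF / (MTBF + MTTR). A minimal sketch, with a hypothetical function name and made-up figures, shows how two systems with very different failure patterns can report almost the same availability, which is one reason the headline metric and user perception diverge:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Made-up figures for illustration only.
# System A: fails about once a day, recovers in about 1.5 minutes.
a = availability(mtbf_hours=24.0, mttr_hours=0.025)
# System B: fails about once in three years, takes about a day to restore.
b = availability(mtbf_hours=3 * 365 * 24.0, mttr_hours=27.0)

print(f"A: {a:.4%}  B: {b:.4%}")
```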

8

Reliability Metrics for Distributed Systems

Traditional metrics are hard to apply in multi-component systems:

• In a big network, at any given moment something will be giving trouble, but very few users will see it.

• A system that has excellent average reliability may give terrible service to certain users.

• There are so many components that system administrators rely on automatic reporting systems to identify problem areas.

9

Requirements Specification of System Reliability

Example: ATM card reader

Failure class              Example                                           Metric

Permanent non-corrupting   System fails to operate with any card -- reboot   1 per 1,000 days

Transient non-corrupting   System can not read an undamaged card             1 in 1,000 transactions

Corrupting                 A pattern of transactions corrupts database       Never
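Stating reliability requirements as numbers makes them checkable against operational data. A minimal sketch, assuming hypothetical field names and made-up observations, of testing a summary of field data against the three targets above:

```python
from dataclasses import dataclass

@dataclass
class Observed:
    days_in_service: int
    permanent_failures: int     # reader failed with any card, reboot needed
    transactions: int
    transient_failures: int     # an undamaged card could not be read
    corrupting_failures: int    # the database was corrupted

def violations(o: Observed) -> list:
    """Return the names of the failure classes whose target is missed."""
    missed = []
    if o.permanent_failures * 1000 > o.days_in_service:      # 1 per 1,000 days
        missed.append("permanent non-corrupting")
    if o.transient_failures * 1000 > o.transactions:         # 1 in 1,000 transactions
        missed.append("transient non-corrupting")
    if o.corrupting_failures > 0:                             # never
        missed.append("corrupting")
    return missed

# Made-up observations for illustration only.
sample = Observed(days_in_service=2000, permanent_failures=1,
                  transactions=500_000, transient_failures=650,
                  corrupting_failures=0)
print(violations(sample))
```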

10

Cost of Improved Reliability

[Figure: cost ($) plotted against up time, with the up-time axis marked at 99% and 100%]

Will you spend your money on new functionality or improved reliability?
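To make the up-time axis concrete, a short worked example (the function name is hypothetical) converts an up-time target into the downtime it permits in a 365-day year:

```python
HOURS_PER_YEAR = 365 * 24

def allowed_downtime_hours(uptime_fraction: float) -> float:
    """Hours per year the system may be down and still meet the target."""
    return (1.0 - uptime_fraction) * HOURS_PER_YEAR

for target in (0.99, 0.999, 0.9999):
    hours = allowed_downtime_hours(target)
    print(f"{target:.2%} up time -> {hours:.2f} hours down per year")
```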

11

Example: Central Computing System

A central computer serves the entire organization. Any failure is serious.

Step 1: Gather data on every failure

• 10 years of data in a simple data base

• Every failure analyzed:

hardware

software (default)

environment (e.g., power, air conditioning)

human (e.g., operator error)

12

Example: Central Computing System

Step 2: Analyze the data

• Weekly, monthly, and annual statistics

Number of failures and interruptions

Mean time to repair

• Graphs of trends by component, e.g.,

Failure rates of disk drives

Hardware failures after power failures

Crashes caused by software bugs in each module
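The analysis step can be sketched with nothing more than the standard library. The records, category names, and repair times below are made up for illustration; a real failure database would hold years of entries.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (category, component, hours_to_repair).
failures = [
    ("hardware", "disk drive", 4.0),
    ("software", "scheduler", 0.5),
    ("environment", "power", 2.0),
    ("hardware", "disk drive", 3.0),
    ("human", "operator", 1.0),
    ("software", "file system", 1.5),
]

def summarize(records):
    """Count failures and compute mean time to repair for each category."""
    counts = Counter(category for category, _, _ in records)
    repair_times = defaultdict(list)
    for category, _, hours in records:
        repair_times[category].append(hours)
    mttr = {category: mean(hours) for category, hours in repair_times.items()}
    return counts, mttr

counts, mttr = summarize(failures)
for category, n in counts.most_common():
    print(f"{category:12s} failures={n}  mean time to repair={mttr[category]:.2f} h")
```

Ranking categories by count and repair time is one simple way to see where an investment in reliability would pay off most.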

13

Example: Central Computing System

Step 3: Invest resources where benefit will be maximum, e.g.,

• Orderly shut down after power failure

• Priority order for software improvements

• Changed procedures for operators

• Replacement hardware

14

Building Dependable Systems: Three Principles

For a software system to be dependable:

• Each stage of development must be done well.

• Changes should be incorporated into the structure as carefully as the original system development.

• Testing and correction do not ensure quality, but dependable systems are not possible without systematic testing.

15

Reliability: Modified Waterfall Model

[Figure: modified waterfall model with boxes labeled Feasibility study, Requirements, System design, Program design, Coding, Testing, Acceptance, and Operation & maintenance, plus a box labeled Changes]

16

Key Factors for Reliable Software

• Organization culture that expects quality

• Approach to software design and implementation that hides complexity (e.g., structured design, object-oriented programming)

• Precise, unambiguous specification

• Use of software tools that restrict or detect errors (e.g., strongly typed languages, source control systems, debuggers)

• Programming style that emphasizes simplicity, readability, and avoidance of dangerous constructs

• Incremental validation

17

Building Dependable Systems: Organizational Culture

Good organizations create good systems:

• Acceptance of the group's style of work (e.g., meetings, preparation, support for juniors)

• Visibility

• Completion of a task before moving to the next (e.g., documentation, comments in code)

18

Building Dependable Systems: Complexity

The human mind can encompass only limited complexity:

• Comprehensibility

• Simplicity

• Partitioning of complexity

A simple system or subsystem is easier to get right than a complex one.

19

Building Dependable Systems: Specifications for the Client

Specifications are of no value if they do not meet the client's needs

• The client must understand and review the requirements specification in detail

• Appropriate members of the client's staff must review relevant areas of the design (e.g., operations, training materials, system administration)

• The acceptance tests must belong to the client

20

Building Dependable Systems: Quality Management Processes

Assumption:

Good processes lead to good software

The importance of routine:

Standard terminology (requirements, specification, design, etc.)

Software standards (naming conventions, etc.)

Internal and external documentation

Reporting procedures

21

Building Dependable Systems: Change

Change management:

Source code management and version control

Tracking of change requests and bug reports

Procedures for changing requirements specifications, designs and other documentation

Regression testing

Release control

22

Reviews: Process (Plan)

Objectives:

• To review progress against plan (formal or informal).

• To adjust plan (schedule, team assignments, functionality, etc.).

Impact on quality:

Good quality systems usually result from plans that are demanding but realistic.

Good people like to be stretched and to work hard, but must not be pressed beyond their capabilities.

23

Reviews: Design and Code

DESIGN AND CODE REVIEWS ARE A FUNDAMENTAL PART OF GOOD SOFTWARE DEVELOPMENT

Concept

Colleagues review each other's work:

can be applied to any stage of software development

can be formal or informal

24

Benefits of Design and Code Reviews

Benefits:

• Extra eyes spot mistakes, suggest improvements

• Colleagues share expertise; helps with training

• An occasion to tidy loose ends

• Incompatibilities between components can be identified

• Helps scheduling and management control

Fundamental requirements:

• Senior team members must show leadership

• Good reviews require good preparation

• Everybody must be helpful, not threatening

25

Review Team (Full Version)

A review is a structured meeting, with the following people:

Moderator -- ensures that the meeting moves ahead steadily

Scribe -- records discussion in a constructive manner

Developer -- person(s) whose work is being reviewed

Interested parties -- people above and below in the software process

Outside experts -- knowledgeable people who are not working on this project

Client -- representatives of the client who are knowledgeable about this part of the process

26

Example: Program Design

Moderator

Scribe

Developer -- the design team

Interested parties -- people who created the system design and/or requirements specification, and the programmers who will implement the system

Outside experts -- knowledgeable people who are not working on this project

Client -- only if the client has a strong technical representative

27

Review Process

Preparation

The developer provides colleagues with documentation (e.g., specification or design), or code listing

Participants study the documentation in advance

Meeting

The developer leads the reviewers through the documentation, describing what each section does and encouraging questions

Must allow plenty of time and be prepared to continue on another day.

28

Static and Dynamic Verification

Static verification: Techniques of verification that do not include execution of the software.

• May be manual or use computer tools.

Dynamic verification:

• Testing the software with trial data.

• Debugging to remove errors.

29

Static Validation & Verification

Carried out throughout the software development process.

[Figure: reviews provide validation & verification of the requirements specification, the design, and the program]

30

Static Verification: Program Inspections

Formal program reviews whose objective is to detect faults

• Code may be read or reviewed line by line.

• 150 to 250 lines of code in a 2-hour meeting.

• Use checklist of common errors.

• Requires team commitment, e.g., trained leaders

So effective that it is claimed that it can replace unit testing

31

Inspection Checklist: Common Errors

Data faults: Initialization, constants, array bounds, character strings

Control faults: Conditions, loop termination, compound statements, case statements

Input/output faults: All inputs used; all outputs assigned a value

Interface faults: Parameter numbers, types, and order; structures and shared memory

Storage management faults: Modification of links, allocation and de-allocation of memory

Exceptions: Possible errors, error handlers

32

Static Analysis Tools

Program analyzers scan the source of a program for possible faults and anomalies (e.g., Lint for C programs).

• Control flow: loops with multiple exit or entry points

• Data use: Undeclared or uninitialized variables, unused variables, multiple assignments, array bounds

• Interface faults: Parameter mismatches, non-use of function results, uncalled procedures

• Storage management: Unassigned pointers, pointer arithmetic
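One of the data-use checks, finding unused variables, can be sketched with Python's standard `ast` module. This is a toy analyzer, not a substitute for Lint or any real tool: it ignores scopes, attributes, and function parameters, and the function name is hypothetical.

```python
import ast

def unused_variables(source: str) -> set:
    """Names that are assigned somewhere in `source` but never read.
    Scope is ignored: a read anywhere counts as a use."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return stored - loaded

sample = "a = 1\nb = 2\nprint(a)\n"
print(unused_variables(sample))
```

The program under analysis is parsed but never executed, which is what makes this static rather than dynamic verification.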

33

Static Analysis Tools (continued)

Static analysis tools

• Cross-reference table: Shows every use of a variable, procedure, object, etc.

• Information flow analysis: Identifies input variables on which an output depends.

• Path analysis: Identifies all possible paths through the program.

34

Failures and Faults

Failure: Software does not deliver the service expected by the user (e.g., mistake in requirements, confusing user interface)

Fault (BUG): Programming or design error whereby the delivered system does not conform to specification (e.g., coding error, interface error)

35

Faults and Failures?

Actual examples

(a) A mathematical function loops forever because of rounding error.

(b) A distributed system hangs because of a concurrency problem.

(c) After a network is hit by lightning, it crashes on restart.

(d) A program dies because the programmer typed: x = 1 instead of x == 1.

(e) The head of an organization is paid $5 a month instead of $10,005 because the maximum salary allowed by the program is $10,000.

(f) An operating system fails because of a page-boundary error in the firmware.

36

Terminology

Fault avoidance

Build systems with the objective of creating fault-free (bug-free) software

Fault tolerance

Build systems that continue to operate when faults (bugs) occur

Fault detection (testing and validation)

Detect faults (bugs) before the system is put into operation.

37

Fault Avoidance

Software development process that aims to develop zero-defect software.

• Formal specification

• Incremental development with customer input

• Constrained programming options

• Static verification

• Statistical testing

It is always better to prevent defects than to remove them later.

Example: The four color problem.

38

Defensive Programming

Murphy's Law:

If anything can go wrong, it will.

Defensive Programming:

• Redundant code is incorporated to check system state after modifications.

• Implicit assumptions are tested explicitly.

• Risky programming constructs are avoided.

39

Defensive Programming: Error Avoidance

Risky programming constructs

• Pointers

• Dynamic memory allocation

• Floating-point numbers

• Parallelism

• Recursion

• Interrupts

All are valuable in certain circumstances, but should be used with discretion

40

Defensive Programming Examples

• Use boolean variable not integer

• Test i <= n not i == n

• Assertion checking (e.g., validate parameters)

• Build debugging code into program with a switch to display values at interfaces

• Error checking codes in data (e.g., checksum or hash)
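Two of these habits, testing with an inequality instead of exact equality and checking parameters with an assertion, can be illustrated together. The sketch below is an assumption-laden toy with hypothetical function names; the exact-equality version carries a step limit only so that the example itself terminates.

```python
def count_steps_exact(target, step, max_steps=1000):
    """Risky: stops only when x equals target exactly. Returns None when
    the step limit is reached, i.e. when the real loop would never end."""
    x, n = 0.0, 0
    while x != target:
        x += step
        n += 1
        if n >= max_steps:
            return None
    return n

def count_steps_defensive(target, step):
    """Defensive: validates its parameter and stops as soon as x reaches
    or passes target, so rounding error cannot prevent termination."""
    assert step > 0, "step must be positive"
    x, n = 0.0, 0
    while x < target:
        x += step
        n += 1
    return n

print(count_steps_exact(1.0, 0.25))      # 0.25 is exact in binary
print(count_steps_exact(1.0, 0.1))       # 0.1 is not, so equality is missed
print(count_steps_defensive(1.0, 0.1))   # terminates regardless
```

Note that Python skips `assert` statements when run with `-O`, so checks that must always hold in production are better written as explicit `if` tests that raise an exception.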

41

Maintenance

Most production programs are maintained by people other than the programmers who originally wrote them.

(a) What factors make a program easy for somebody else to maintain?

(b) What factors make a program hard for somebody else to maintain?

42

Fault Tolerance

General Approach:

• Failure detection

• Damage assessment

• Fault recovery

• Fault repair

N-version programming -- Execute independent implementations in parallel, compare results, accept the most probable.
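A minimal sketch of the voting step in N-version programming follows. Several simplifications are assumed: the versions run one after another, not in parallel, "most probable" is taken to mean a strict majority, and results must be hashable. The three square-root functions are hypothetical stand-ins for independently written implementations, one of them deliberately faulty.

```python
import math
from collections import Counter

def n_version(implementations, *args):
    """Run every implementation and return the majority result.
    Raises RuntimeError when no result has a strict majority."""
    results = []
    for impl in implementations:
        try:
            results.append(impl(*args))
        except Exception:
            continue  # a version that crashes simply loses its vote
    if not results:
        raise RuntimeError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(implementations):
        raise RuntimeError("no majority among versions")
    return value

def isqrt_a(n):
    return math.isqrt(n)

def isqrt_b(n):
    return int(n ** 0.5)

def isqrt_c(n):
    return n // 2  # deliberately faulty version

print(n_version([isqrt_a, isqrt_b, isqrt_c], 16))
```

The technique helps only to the extent that the versions fail independently; versions built from the same flawed specification will agree on the wrong answer.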

43

Fault Tolerance

Basic Techniques:

• After error continue with next transaction (e.g., drop packet)

• Timers and timeout in networked systems

• Error correcting codes in data

• Bad block tables on disk drives

• Forward and backward pointers in databases

Report all errors for quality control
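The first technique, combined with the advice to report every error, can be sketched as a processing loop that isolates each transaction. The handler, the logger name, and the sample data are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("transactions")

def process_all(transactions, handler):
    """Apply handler to each transaction; on error, report it and move on."""
    done, failed = [], []
    for tx in transactions:
        try:
            done.append(handler(tx))
        except Exception as exc:
            # Report every error so that quality control can see it.
            log.error("transaction %r dropped: %s", tx, exc)
            failed.append(tx)
    return done, failed

def parse_amount(text):
    """Hypothetical handler: fails on anything that is not an integer."""
    return int(text)

done, failed = process_all(["10", "x", "25"], parse_amount)
print(done, failed)
```

One bad transaction is dropped and logged, and the ones after it are still processed.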

44

Fault Tolerance

Backward Recovery:

• Record system state at specific events (checkpoints). After failure, recreate state at last checkpoint.

• Combine checkpoints with system log that allows transactions from last checkpoint to be repeated automatically.

• Test the restore software!
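Checkpoints combined with a replay log can be sketched with an in-memory store. This is a toy under stated assumptions: a real system would keep both the checkpoint and the log on stable storage, and the class and method names are hypothetical.

```python
import copy

class CheckpointedStore:
    """Key/value store with a checkpoint and a log of later operations."""

    def __init__(self):
        self.state = {}
        self._checkpoint = {}
        self._log = []  # operations applied since the last checkpoint

    def apply(self, key, value):
        self._log.append((key, value))  # log first, then change the state
        self.state[key] = value

    def checkpoint(self):
        """Record the current state and start a fresh log."""
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def recover(self):
        """Recreate the state at the last checkpoint, then replay the log."""
        self.state = copy.deepcopy(self._checkpoint)
        for key, value in self._log:
            self.state[key] = value

store = CheckpointedStore()
store.apply("balance", 100)
store.checkpoint()
store.apply("balance", 80)
store.state = {"balance": -1}  # simulate corruption after a failure
store.recover()
print(store.state)
```

Exercising `recover` in a test like this one is the small-scale version of the slide's last point: restore code that has never been run cannot be trusted.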

45

Software Engineering for Real Time

The special characteristics of real time computing require extra attention to good software engineering principles:

• Requirements analysis and specification

• Development of tools

• Modular design

• Exhaustive testing

Heroic programming will fail!

46

Software Engineering for Real Time

Testing and debugging need special tools and environments

• Debuggers, etc., can not be used to test real time performance

• Simulation of environment may be needed to test interfaces -- e.g., adjustable clock speed

• General purpose tools may not be available

47

Some Notable Bugs

• Built-in function in Fortran compiler (e^0 = 0)

• Japanese microcode for Honeywell DPS virtual memory

• The microfilm plotter with the missing byte (1:1023)

• The Sun 3 page fault that IBM paid to fix

• Left handed rotation in the graphics package

Good people work around problems.

The best people track them down and fix them!

48

End of Lecture 6