Reliability Tony Massihi Etan Halberg Jacob Hakak

Upload: barbara-strickland

Post on 30-Jan-2016

TRANSCRIPT

Page 1: Reliability Tony Massihi Etan Halberg Jacob Hakak

Reliability

Tony Massihi, Etan Halberg, Jacob Hakak

Page 2: Reliability Tony Massihi Etan Halberg Jacob Hakak

Software Engineering

A discipline that focuses on producing software using certain tools and methodologies.

Software engineering follows a four-step process:
● Specification: defining the functions needed.
● Development: producing the software.
● Validation: testing the software.
● Evolution: modifying the software to meet the changing needs of the customer.

Page 3: Reliability Tony Massihi Etan Halberg Jacob Hakak

Software Engineering

Most organizations use CASE (Computer-assisted software engineering) tools to support the process of developing and documenting a more detailed design.

Another good approach to software engineering is using object-oriented design.

Page 4: Reliability Tony Massihi Etan Halberg Jacob Hakak

Software Engineering

These standards have led to better software quality over the years, but in order to stay competitive companies must release products quickly.

Many companies feel a tension between meeting tight deadlines and strictly following software engineering methodologies.

Page 5: Reliability Tony Massihi Etan Halberg Jacob Hakak

Software Warranties

Shrinkwrap warranties - Software, such as Microsoft Word, has a limited warranty that says the software will do what the manual says it will do. They provide a 90-day replacement or money-back guarantee.

Warranties for games promise that the original media is free from defects and that you will be able to install the game. They also run for 90 days.

Page 6: Reliability Tony Massihi Etan Halberg Jacob Hakak

Problems

Most stores will not give a full refund for opened items, even though the license agreement is inside the box and cannot be read until the box is opened.

Vendors are willing to give you a full refund if software will not install, but will not take liability if your business is harmed because their software crashed at the wrong time.

Page 7: Reliability Tony Massihi Etan Halberg Jacob Hakak

Court Cases

Step-Saver Data Systems v. Wyse Technology & The Software Link

Step-Saver sold timesharing computer systems with Wyse terminals and an OS by The Software Link (TSL). Step-Saver purchased and resold 142 copies of the Multilink Advanced OS provided by TSL.

Page 8: Reliability Tony Massihi Etan Halberg Jacob Hakak

Step-Saver v. Wyse & TSL

When Step-Saver called TSL to purchase the OS, the TSL sales rep. said that the OS was compatible with most DOS applications.

The software did not work properly and all three companies together were not able to solve the problems. Therefore, Step-Saver sued Wyse and TSL.

Page 9: Reliability Tony Massihi Etan Halberg Jacob Hakak

Step-Saver v. Wyse & TSL

U.S. Court of Appeals ruled in favor of Step-Saver because the president of Step-Saver never signed a document formalizing the licensing agreement.

The court justified its ruling on the grounds that the invoice and the oral statements constituted a contract.

Page 10: Reliability Tony Massihi Etan Halberg Jacob Hakak

Kantian Analysis

Every software company produces a license agreement stating the terms that the customer agrees to when buying the software. It did not matter to TSL whether the document was signed; TSL just wanted the business, which defeats the point of having a licensing agreement.

Page 11: Reliability Tony Massihi Etan Halberg Jacob Hakak

Utilitarian Analysis

Not ethical. Negatives outweigh the positives.

Negatives: TSL sold software with promises that weren’t fulfilled. Step-Saver was sued by 12 of its customers. TSL didn’t care whether the license agreement was signed before selling many copies.

Positives: Wyse and TSL tried to fix the problems.

Page 12: Reliability Tony Massihi Etan Halberg Jacob Hakak

Social Contract Analysis

Companies have the right to state the terms to which the customer must agree when using the software.

If the terms are not agreed to, the courts fall back on Article 2 of the UCC. Under Article 2, the court found that a contract was formed by the purchase order, the invoice, and the oral statements from the sales rep.

Page 13: Reliability Tony Massihi Etan Halberg Jacob Hakak

ProCD v. Zeidenberg

ProCD created a computer database containing info from more than 3000 telephone directories. They created an application called SelectPhone where you can search the database for records.

They included a license agreement prohibiting commercial use of the database and the program. The agreement was displayed every time the program ran.

Page 14: Reliability Tony Massihi Etan Halberg Jacob Hakak

ProCD v. Zeidenberg

Matthew Zeidenberg formed a company called Silken Mountain Web Services and he resold the info in the SelectPhone database.

Zeidenberg argued that the license wasn’t printed on the outside of the box so he shouldn’t be liable. The court ruled in favor of ProCD.

Page 15: Reliability Tony Massihi Etan Halberg Jacob Hakak

Ethical Analysis

Kantian: Not ethical. ProCD fulfilled its duty of informing the customer of the terms to which both parties must agree when using the product.

Utilitarian: Not ethical. Zeidenberg reproduced someone else’s work.

Social contract: Zeidenberg violated ProCD's right to intellectual property by stealing its work.

Page 16: Reliability Tony Massihi Etan Halberg Jacob Hakak

Mortenson v. Timberline

Mortenson is a national construction contractor that purchased copies of a bidding package called Precision Bid Analysis from Timberline.

Mortenson used this to prepare a bid and on the day the bid was due, the software malfunctioned. It printed the message “Abort: Cannot find alternate” 19 times. Mortenson continued to use the software and submitted the bid it produced. Mortenson discovered that its bid was $1.95 million too low.

Page 17: Reliability Tony Massihi Etan Halberg Jacob Hakak

Mortenson v. Timberline

Mortenson is a national construction contractor, and Timberline sold the bidding package to Mortenson.

It turned out that Timberline had been aware of the bug since May 1993. It had fixed the bug and sent newer versions to some of the customers who had encountered it, but not to Mortenson.

Page 18: Reliability Tony Massihi Etan Halberg Jacob Hakak

Mortenson v. Timberline

Timberline argued that the license agreement limited the consequential damages that Mortenson could recover.

The King County Superior Court ruled in favor of Timberline.

Page 19: Reliability Tony Massihi Etan Halberg Jacob Hakak

UCITA

The Uniform Computer Information Transactions Act (UCITA) is a proposed amendment to Article 2 of the UCC. It was proposed after the ruling against The Software Link, on the premise that software cannot always be bug-free.

Article 2 of UCC (Uniform Commercial Code) governs the sale of products in the U.S.

Page 20: Reliability Tony Massihi Etan Halberg Jacob Hakak

UCITA States

Manufacturers may license software to customers for a period of time.

Manufacturers may prevent the transfer of software from 1 person to another.

Manufacturers may disclaim all liability for defects; customers must accept the software “as is”.

Page 21: Reliability Tony Massihi Etan Halberg Jacob Hakak

UCITA Continued

Manufacturers may remotely disable licensed software in case of a license dispute.

Manufacturers may collect info about how licensees use their computers.

Applies to software in computers and not embedded systems, such as PDAs, cell phones.

Page 22: Reliability Tony Massihi Etan Halberg Jacob Hakak

Arguments Supporting UCITA

If we want a vital software industry, we need to understand that software is not going to have the same reliability as physical products.

It prevents fraud: if a customer purchases a license to use the software for a certain period of time, the manufacturer can include code that makes the software unusable after the license has expired.

Page 23: Reliability Tony Massihi Etan Halberg Jacob Hakak

Arguments Supporting UCITA

If the license allows the software to be run on a certain number of computers, then the software can include features that make it impossible to run on more machines than specified.
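The two enforcement mechanisms described above (an expiry date and a limit on the number of machines) can be sketched in a few lines. This is only an illustration under stated assumptions: `LicenseError`, `check_license`, and all of the parameters are hypothetical names invented for this example, not part of any real licensing product or of UCITA itself.

```python
from datetime import date


class LicenseError(Exception):
    """Raised when the (hypothetical) license terms do not permit running."""


def check_license(today, expires_on, active_machines, max_machines):
    """Return True if the software may run under the assumed license terms.

    today / expires_on are datetime.date values; active_machines counts the
    machines currently running the software, including this one.
    """
    # Time-limited license: refuse to run after the expiry date.
    if today > expires_on:
        raise LicenseError("license expired on %s" % expires_on.isoformat())
    # Seat-limited license: refuse to run on more machines than licensed.
    if active_machines > max_machines:
        raise LicenseError(
            "%d machines in use, license allows %d" % (active_machines, max_machines)
        )
    return True
```

A program would call `check_license` at startup and exit if it raises. Real products also have to decide how to count machines and how to resist tampering with the clock, which this sketch does not attempt.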

Page 24: Reliability Tony Massihi Etan Halberg Jacob Hakak

Arguments Against UCITA

If you license a piece of software and don’t need it anymore, you can’t give it away legally to someone else.

Allowing companies to sell software “as is” conflicts with the Magnuson-Moss Act, which Congress passed in 1975 to protect consumers. The act prevents manufacturers from putting unfair warranties on products costing more than $25.

Page 25: Reliability Tony Massihi Etan Halberg Jacob Hakak

Arguments Against UCITA

The Magnuson-Moss Act also made it economically feasible for consumers to bring warranty suits by allowing courts to award attorneys’ fees.

Consumers see the warranty only at installation, when they click the “I accept” button. Once the warranty is accepted and the program has been run, the software cannot be returned, even though the consumer still does not know whether it works properly.

Page 26: Reliability Tony Massihi Etan Halberg Jacob Hakak

Arguments Against UCITA

There won’t be a uniform law across every state; Maryland and Virginia have passed different versions of the law.

Page 27: Reliability Tony Massihi Etan Halberg Jacob Hakak

Moral Responsibility of Software Manufacturers

Manufacturers rely on consumers to help them identify bugs. They could find these bugs themselves if they hired more testers, but this would result in higher prices and longer development times.

This is a utilitarian way to look at the situation because the positives outweigh the negatives. There will be fewer products with higher prices but the software will be more reliable.

Page 28: Reliability Tony Massihi Etan Halberg Jacob Hakak

Computer Reliability

“The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.”

-Douglas Adams

Page 29: Reliability Tony Massihi Etan Halberg Jacob Hakak

Forethoughts on Reliability

Are humans, in general, reliable?

Are computer systems, in general, reliable?

Is the reliability of a computer system a function of the reliability of its maker?

If the maker is flawed, how can his or her creation be flawless?

Page 30: Reliability Tony Massihi Etan Halberg Jacob Hakak

Data-Entry/Data-Retrieval

A computer database is a structured collection of records or data that is stored in a computer system so that a computer program or person using a query language can consult it to answer queries. The records retrieved in answer to queries are information that can be used to make decisions.

Examples of databases and query languages:
● Databases: dBase, MySQL, Oracle, PostgreSQL
● Query languages: SQL, CQL, OQL, Datalog
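As a small illustration of "consulting a database with a query language", the sketch below uses Python's built-in `sqlite3` module and SQL. The `listings` table, its columns, and the rows are made up for this example.

```python
import sqlite3

# An in-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (name TEXT, city TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [
        ("Avery", "Denver", "555-0101"),
        ("Blake", "Chicago", "555-0102"),
        ("Casey", "Denver", "555-0103"),
    ],
)

# The query: which records match the question "who is listed in Denver?"
rows = conn.execute(
    "SELECT name, phone FROM listings WHERE city = ? ORDER BY name",
    ("Denver",),
).fetchall()
conn.close()
```

The records returned in `rows` are the "information that can be used to make decisions"; if the stored data is wrong, the answer is wrong even though the query itself is correct.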

Page 31: Reliability Tony Massihi Etan Halberg Jacob Hakak

How can data cause a system to fail?

Software related
● Programming errors
● Poor programming practices

Non-software related
● Missing, incorrect, inconsistent, or otherwise bad data

Page 32: Reliability Tony Massihi Etan Halberg Jacob Hakak

Data-Entry/Data-Retrieval Errors: Cause and Effect

Mild annoyances
● Human error: John Q Smartguy at the bank entered your address wrong. As a result, your credit card bills are sent to the wrong address.

● Computing error: A table column in a database stores your account number. The new ATM software selects the incorrect table column to check. As a result, all ATM cards now do not work, or worse, access incorrect records.
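The computing error above can be reproduced in miniature. In this sketch (again using `sqlite3`; the `accounts` table, its columns, and the numbers are invented for illustration), the buggy lookup compares the card's account number against the wrong column. Depending on the data, the card either matches nothing or matches someone else's record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (owner TEXT, account_number INTEGER, customer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("Alice", 1001, 2002), ("Bob", 2002, 3003)],
)


def lookup_correct(account_number):
    """Look up the owner using the column that actually stores account numbers."""
    row = conn.execute(
        "SELECT owner FROM accounts WHERE account_number = ?", (account_number,)
    ).fetchone()
    return row[0] if row else None


def lookup_buggy(account_number):
    """Same lookup, but the query checks the wrong column (customer_id)."""
    row = conn.execute(
        "SELECT owner FROM accounts WHERE customer_id = ?", (account_number,)
    ).fetchone()
    return row[0] if row else None
```

With this data, Alice's card (1001) stops working under the buggy lookup, while Bob's card (2002) silently opens Alice's record, which is the "or worse" case from the slide.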

Page 33: Reliability Tony Massihi Etan Halberg Jacob Hakak

Data-Entry/Data-Retrieval Errors: Cause and Effect

Moderate problems
● National Crime Information Center (NCIC) and faulty database records
● Disqualification due to database records (November 2000 Florida general election; background checks on employees)
● False arrests due to misinterpreted, incorrectly entered, or otherwise false information (Terry Dean Rogan, Roberto Hernandez)

Page 34: Reliability Tony Massihi Etan Halberg Jacob Hakak

Data-Entry/Data-Retrieval Errors: Cause and Effect

Severe misinterpretation of data
● An Iraqi Scud missile hit a base in Dhahran and killed 28 US soldiers in February 1991. It was recognized by radar but dismissed due to incorrect data.

Page 35: Reliability Tony Massihi Etan Halberg Jacob Hakak

Analysis of NCIC Records

Should the US government take responsibility for the accuracy of the information stored in the NCIC database?

● Privacy Act of 1974
● FBI not required to ensure accuracy
● Many agencies enter information
● Accuracy checks would hinder functionality of database with regard to criminal investigations.

Page 36: Reliability Tony Massihi Etan Halberg Jacob Hakak

The Question of Ethics

Is it ethical for individuals from these agencies, or the agencies themselves, to enter information into a national database without checking whether or not it is accurate and correct?

Page 37: Reliability Tony Massihi Etan Halberg Jacob Hakak

Software and Billing Errors

Even if the data entered into a computer is correct, and the manner in which it is retrieved is correct, there are still errors that occur in the manipulation of that data to consider.

We've already briefly touched on software errors and have seen a short example of a billing issue involving data entry. Let's take a look at how and why not only faulty data but faulty software as well can affect billing and other processes.

Page 38: Reliability Tony Massihi Etan Halberg Jacob Hakak

System Malfunctions

Qwest billing software malfunction
● $57,346 phone bill

USDA beef prices
● $15-$20 million loss for beef producers

US Postal Service
● 50,000 pieces of mail returned to sender

The car with a mind of its own
● BMW on-board computer crash

Page 39: Reliability Tony Massihi Etan Halberg Jacob Hakak

System Failures

LA County/USC Medical Center
● Backlogging of new lab computer system

Air Traffic Control System – Japan
● System down for 4 hours; delays and cancellations

Chicago/London Trade/Exchange
● Hour-long trade suspension, multiple times

Comair (subsidiary of Delta)
● Crew assignment system failure

Page 40: Reliability Tony Massihi Etan Halberg Jacob Hakak

Postal Service Article:
http://query.nytimes.com/gst/fullpage.html?res=9805EFDE133EF93AA3575BC0A960958260&n=Top/News/Business/Small%20Business/Innovation

Beef Price Article:
http://www.beefusa.org/NEWSUSDAReportingErrorResultsin$42-54MillionLosstoCattleIndustry4134.aspx

Car With a Mind of Its Own Article and followup:
http://aardvark.co.nz/daily/2003/n051301.shtml
http://www.microsoft.com/presspass/press/2002/mar02/03-04bmwpr.mspx

Comair Cancellations Article:
http://www.usatoday.com/travel/flights/delays/2004-12-25-comair-cancels-flights_x.htm

Page 41: Reliability Tony Massihi Etan Halberg Jacob Hakak

The Question of Ethics, revisited

A specific example: Amazon.com, UK
● iPaq handheld computers listed at £7
● Actual price, £275

Amazon refused delivery unless buyers paid the difference, citing their Pricing and Availability Policy.

Focus
● Amazon's refusal to fill orders
● Customers' bids

Page 42: Reliability Tony Massihi Etan Halberg Jacob Hakak

Kantianism v. Utilitarianism

Kantianism
● In the end it would result in higher prices and tend away from the greater good.
● Unethical: consumers could not reasonably assume it was just a 'really good sale', so they were not acting in good faith.

Utilitarianism
● Unethical because if this behavior were acceptable, prices would increase; the costs outweigh the benefits.

Page 43: Reliability Tony Massihi Etan Halberg Jacob Hakak

Increasingly Complex Systems

● fully or partially controlled by computers
● embedded systems – a computer used as a component of a larger system
● real-time systems – computers that process data from sensors as events occur

Page 44: Reliability Tony Massihi Etan Halberg Jacob Hakak

Notable System Failures

Patriot Missile
● floating point variable stored values with insufficient precision

Ariane 5
● satellite launch vehicle, 64-bit floating point value converted to 16-bit signed int

Mars Orbiter and Polar Lander
● Orbiter: English vs. metric units
● Lander: landing gear sensor passed incorrect signal value

Denver International Airport
● software project nightmare
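The first three failure types above can be shown with a few lines of arithmetic. This is a rough sketch, not a reconstruction of the flight software. For the Patriot it assumes the figures commonly cited in published analyses (a 0.1-second clock tick stored with 23 fractional bits, and about 100 hours of uptime), which are not stated on the slide. All function names are invented for illustration.

```python
# Patriot: a 0.1 s clock tick cannot be represented exactly in binary.
# Assumption: the tick is chopped to 23 fractional bits.
FRACTION_BITS = 23
stored_tick = int(0.1 * 2**FRACTION_BITS) / 2**FRACTION_BITS
error_per_tick = 0.1 - stored_tick  # tiny, but it accumulates


def clock_drift_seconds(hours_up):
    """Accumulated clock error after hours_up hours of counting 0.1 s ticks."""
    ticks = hours_up * 3600 * 10
    return ticks * error_per_tick


# Ariane 5: a 64-bit float converted to a 16-bit signed integer.
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value):
    """Convert to a 16-bit signed int, refusing values that do not fit."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError("%r does not fit in a 16-bit signed integer" % value)
    return result


# Mars Orbiter: one side reports impulse in pound-force seconds,
# the other side expects newton seconds.
LBF_S_TO_N_S = 4.4482216152605  # 1 lbf = 4.4482216152605 N


def lbf_s_to_n_s(impulse_lbf_s):
    """Convert an impulse from pound-force seconds to newton seconds."""
    return impulse_lbf_s * LBF_S_TO_N_S
```

Under these assumptions `clock_drift_seconds(100)` comes to roughly a third of a second, which matters when tracking a fast-moving missile. `to_int16_checked(40000.0)` raises instead of silently producing a wrong value, and skipping `lbf_s_to_n_s` leaves every impulse figure off by a factor of about 4.45.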

Page 45: Reliability Tony Massihi Etan Halberg Jacob Hakak

What can be done?

Unfortunately, most of these problems must be solved on a case by case basis. There is no real tried, tested, and true method for ensuring the reliability of all the software and hardware that a system is composed of.

Good programming practices and well-educated users are the way to go.

Page 46: Reliability Tony Massihi Etan Halberg Jacob Hakak

Computer Simulations

● Uses of simulations
● Validating simulations

Page 47: Reliability Tony Massihi Etan Halberg Jacob Hakak

Uses of Simulations

Simulations can never completely replace physical experiments.

Practical uses of simulations:
● To lower the monetary or time cost of laboratory experiments
  ● Pharmaceutical design
  ● Car crashes
● When the ethics of a non-simulated experiment are in question
  ● Medical devices
  ● Crashing cars with real people
● When experiments are impractical
  ● How long will it take before the world runs out of oil?
● To model past events
  ● Understand the world around us
  ● Predict the future

Page 48: Reliability Tony Massihi Etan Halberg Jacob Hakak

Crash Test Simulation

3 different water molecule simulations, progression of technology

Page 49: Reliability Tony Massihi Etan Halberg Jacob Hakak

Safety Simulation

Crash Recreation Simulation from YouTube

Space Shuttle Landing

Page 50: Reliability Tony Massihi Etan Halberg Jacob Hakak

Water Molecule Simulations

Models before computers

Simple Computer Models – water molecules in motion

Complex Computer Model – water movement through permeable membrane

Page 51: Reliability Tony Massihi Etan Halberg Jacob Hakak

Validating Simulations

Validation: Does the model accurately represent the real system?

Verification: Does software correctly implement model?

Validation methods
● Make a prediction, wait to see if it comes true
● Predict the present from old data
● Test credibility with experts and decision makers
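"Predict the present from old data" can be sketched as a simple hold-out check: fit the model on the older observations only, then compare its predictions with the recent observations that were set aside. The straight-line model, the function names, and the numbers below are all made up for illustration; a real simulation would substitute its own model and measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's predictions and reality."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic observations: the first four are "old data", the last two are
# the "present" that the model never saw while it was being fitted.
old_x, old_y = [0, 1, 2, 3], [1.0, 3.1, 4.9, 7.0]
recent_x, recent_y = [4, 5], [9.0, 11.2]

model = fit_line(old_x, old_y)
validation_error = mean_abs_error(model, recent_x, recent_y)
```

A small `validation_error` relative to the quantity being modeled is evidence that the model represents the real system. Note that this checks validation only: a verified program can still implement a model that does not match reality.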

Page 52: Reliability Tony Massihi Etan Halberg Jacob Hakak

Therac-25

● Genesis of the Therac-25
● Chronology of Accidents and AECL Responses
● Software Errors
● Post Mortem
● Moral Responsibility of the Therac-25 Team

“The Rack” – Torture device used in the Middle Ages, Tower of London 1

Page 53: Reliability Tony Massihi Etan Halberg Jacob Hakak

Genesis of the Therac-25

● Predecessors to the Therac-25, the Therac-6 and Therac-20, were built by AECL and CGR
  ● Therac-6 and Therac-20 incorporated some software
  ● Previous designs by CGR incorporated no software 1

● Therac-25 built by AECL
  ● Minicomputer used as integral part of system
  ● Processes controlled with assembly software
  ● Hardware safety features replaced with software
  ● Based on designs of the Therac-6
  ● Also reused code from Therac-20

● First models: 40 errors per day was not unusual

Page 54: Reliability Tony Massihi Etan Halberg Jacob Hakak

AECL and the FDA

● Atomic Energy of Canada, Limited
  ● “Crown Corporation”
  ● State controlled
  ● Similar to government agency

● Therac-25 was FDA approved 3
  ● Opportunity for approximately 6 years of testing
  ● Only 2700 hours testing integrated system
  ● Limited software documentation
  ● Minimal software testing

● A single programmer wrote all source code for Therac-25
  ● Left AECL in 1986, after some accidents had occurred
  ● Limited information about his background

Page 55: Reliability Tony Massihi Etan Halberg Jacob Hakak

Citation 3

Page 56: Reliability Tony Massihi Etan Halberg Jacob Hakak

Chronology of Accidents and AECL Responses

● Marietta, Georgia (June 1985): crippling injuries from radiation overdose
● Hamilton, Ontario (July 1985): patient died 2 months after radiation overdose
● First AECL investigation (July-Sept. 1985): could not reproduce overdose
● Yakima, Washington (December 1985): permanent scars and disabilities
● Tyler, Texas (March 1986): patient died from complications after five months; real-time video and audio monitor was not functioning
● Second AECL investigation (March 1986)
● Tyler, Texas (April 1986): patient died after 3 weeks due to massive overdose to brain
● Yakima, Washington (January 1987): patient died 3 months later
● FDA declares Therac-25 defective (February 1987): solutions relied on additional hardware locks

The two initial investigations failed to detect any problems because of poor design and lack of documentation.

Page 57: Reliability Tony Massihi Etan Halberg Jacob Hakak

Software Errors

RACE CONDITION:

A race condition or race hazard is a flaw in a system or process whereby the output of the process is unexpectedly and critically dependent on the sequence or timing of other events. The term originates with the idea of two signals racing each other to influence the output first. 4

Page 58: Reliability Tony Massihi Etan Halberg Jacob Hakak

Race Conditions

Two race conditions in Therac-25 software
● Command screen editing
● Movement of electron beam gun

These two activities were “racing” with each other. If command screen editing occurred while the electron beam gun was moving, the software would not recognize the changes in input.

If the mode entry was changed from an “E” to an “X” during the 8-second movement, the change would not take effect.
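The behavior described above can be modeled as a stale read: the setup routine samples the operator's entry once, the hardware takes time to move, and any edit made in the meantime is never seen. The sketch below is a deliberately simplified, deterministic model written for this transcript. It is not the Therac-25 code, and `Console`, `buggy_setup`, and `safer_setup` are hypothetical names.

```python
class Console:
    """Operator-facing state: what the screen shows vs. what the machine uses."""

    def __init__(self, mode):
        self.mode = mode          # value on the screen, editable by the operator
        self.applied_mode = None  # value the hardware was actually set up for


def buggy_setup(console, edits_during_move):
    """Read the mode once, then ignore any edits made while the hardware moves."""
    snapshot = console.mode
    for new_mode in edits_during_move:  # operator edits during the movement
        console.mode = new_mode
    console.applied_mode = snapshot     # stale: may no longer match the screen


def safer_setup(console, edits_during_move):
    """Re-check the screen after the move; redo the setup if it changed."""
    pending = list(edits_during_move)
    while True:
        snapshot = console.mode
        for new_mode in pending:        # edits arrive during the first move only
            console.mode = new_mode
        pending = []
        if console.mode == snapshot:    # nothing changed while moving
            console.applied_mode = snapshot
            return
```

With `buggy_setup`, a console that starts at "E" and is edited to "X" during the move ends up showing "X" while the machine is configured for "E". `safer_setup` detects the mismatch and repeats the setup with the new value. Re-checking is only one possible design; the slides note that the eventual fixes relied on additional hardware locks.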

Page 59: Reliability Tony Massihi Etan Halberg Jacob Hakak

Race Conditions and Parallel Programming

“…There is no efficient algorithm that can help detect race conditions in a program. As such, there are no easy-to-use pedagogical tools.” 5

“The most often encountered race conditions are data race conditions. A data race condition is caused by unordered concurrent accesses of the same memory location from multiple threads. Less frequent but harder to find are general race conditions. A subtle general race condition often occurs at a transitory state due to the undetermined program execution order of multiple concurrent events that have data conflicts.” 6
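A data race of the kind described in the second quotation can be illustrated with a shared counter. Because real races depend on timing, the sketch first replays one bad interleaving by hand so that the lost update is reproducible, then shows the usual remedy, a lock around the read-modify-write. The names `lost_update_demo`, `SafeCounter`, and `run_threads` are invented for this example.

```python
import threading


def lost_update_demo():
    """Replay, step by step, an interleaving in which one increment is lost."""
    counter = 0
    a = counter       # thread A reads 0
    b = counter       # thread B reads 0 before A has written
    counter = a + 1   # A writes 1
    counter = b + 1   # B also writes 1, overwriting A's update
    return counter    # two increments happened, but the counter shows one


class SafeCounter:
    """Counter whose read-modify-write is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread at a time may update the value
            self.value += 1


def run_threads(n_threads=4, per_thread=1000):
    """Increment one SafeCounter from several threads and return the total."""
    counter = SafeCounter()

    def worker():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

The lock removes the data race on `value`, but it does not address the "general race conditions" from the same quotation, where the order of whole events (such as an edit and a hardware movement) matters.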

Page 60: Reliability Tony Massihi Etan Halberg Jacob Hakak

Post Mortem

● AECL focused on fixing individual bugs
  ● “Most accidents are system accidents; that is, they stem from complex interactions between various components and activities.” 7

● Therac-25 was not fail-safe
  ● Fundamental design flaws
  ● No devices to report overdoses
  ● Operator could not directly monitor or observe patient
  ● Similar to Milgram Experiments at Yale in the 1960s

● Software lessons
  ● Race condition debugging
  ● Simplicity in design
  ● Documentation is crucial
  ● Reusing code can lead to potential errors

● AECL did not communicate fully with customers

Page 61: Reliability Tony Massihi Etan Halberg Jacob Hakak

Moral Responsibility of the Therac-25 Team

Conditions for moral responsibility
● Causal condition: The actions (or inactions) of the agent must have caused the harm
● Mental condition: The actions (or inactions) must have been intended or willed by the agent
  ● The designers of the Therac-25 did not intend to cause lethal overdoses
  ● The moral agent is also responsible for carelessness, recklessness, or negligence

Page 62: Reliability Tony Massihi Etan Halberg Jacob Hakak

Citations

1. http://content.answers.com/main/content/wp/en-commons/thumb/5/54/200px-A_Torture_Rack.jpg
2. http://computingcases.org/case_materials/therac/supporting_docs/therac_case_narr/cmc.html
3. http://www.cs.jhu.edu/~cis/cista/445/Lectures/Therac.pdf
4. http://en.wikipedia.org/wiki/Race_condition
5. Carr, Steven; Mayo, Jean; Shene, Ching-Kuang. “Race Conditions: A Case Study”. Journal of Computing Sciences in Colleges, 2001. Volume 17, Issue 1. pp 90-105.
6. Chen, Liang T. “The Challenges of Race Conditions in Parallel Programming”. July 21, 2006. Sun Developer Network.
7. Nancy Leveson and Clark Turner. “An Investigation of the Therac-25 Accidents.” Computer, 26(7): 18-41, 1993.