Page 1

A Use Case Model for RAS (Reliability, Availability, and Serviceability) in an MPP (Massively Parallel Processors) Environment

May 18, 2004

Sue Kelly
Sandia National Laboratories

[email protected], 505-845-9770

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Page 2

Outline of Talk

• A brief tutorial on use cases
• A use case model of RAS features for MPPs

Page 3

References

Applying Use Cases by Geri Schneider and Jason P. Winters, Addison-Wesley, 1998.

Object-Oriented Software Engineering: A Use Case Driven Approach by Ivar Jacobson et al., Addison-Wesley, 1992.

UML Distilled by Martin Fowler with Kendall Scott, Addison-Wesley, 1997.

An investigation into RAS Features for Massively Parallel Processor Systems by Suzanne M. Kelly and Jeffry B. Ogden, SAND2002-3164, 2002.

Page 4

The Unified Modeling Language

• A Standard* object modeling language
• Unifies the models of Booch, Rumbaugh (OMT), and Jacobson
• Not a method; no notion of process
• Can incorporate some or all of the UML notations and diagrams (e.g., use cases) into your software development process of choice.

Andrew S. Tanenbaum

Page 5

Use Case Concepts

• Use Case – A specific way of using the system by performing some part of its functionality.

• Actor – A representation of whatever interacts with the system. May be a person, another system, or something else (e.g., cron).

• Use cases are represented by ovals. I use a naming convention of verb followed by object. The subject is implied by the initiating actor.

• An actor is represented by a stick figure.

• An arrow indicates the direction of initiation (not necessarily data flow).

Example diagram: the actor ATM Customer initiates the use case Request Cash Withdrawal.
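As a minimal sketch of these concepts (assuming Python 3.9+; the class names `Actor`, `UseCase`, and `Initiation` are illustrative, not part of UML), the arrow from stick figure to oval can be modeled as a small record, and the verb-object naming convention lets the initiating actor supply the missing subject:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    """Whatever interacts with the system: a person, another system, or e.g. cron."""
    name: str


@dataclass(frozen=True)
class UseCase:
    """A specific way of using the system, named verb first, then object."""
    name: str


@dataclass(frozen=True)
class Initiation:
    """A diagram arrow: who initiates the use case (not necessarily data flow)."""
    actor: Actor
    use_case: UseCase

    def as_sentence(self) -> str:
        # The use case name omits the subject; the initiating actor supplies it.
        # Naive third-person form: append "s" to the verb (fine for "Request").
        verb, _, obj = self.use_case.name.partition(" ")
        return f"{self.actor.name} {verb.lower()}s {obj}"


arrow = Initiation(Actor("ATM Customer"), UseCase("Request Cash Withdrawal"))
print(arrow.as_sentence())  # ATM Customer requests Cash Withdrawal
```

The third-person form is deliberately naive; it only shows how the actor completes the verb-object name into a sentence.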

Page 6

Use Case Concepts (cont.)

• Each use case constitutes a complete course of events initiated by an actor and specifies the interaction between the actor and the system.

• Use Case Diagram – a graphical representation of the entire set of actors and use cases.

• Use Case Model – the use case diagram plus the descriptive text for each use case.

Example diagram (ATM): actors ATM Customer, Service Provider, and Timer; use cases Request Cash Withdrawal, Make Deposit, Change PIN, Replenish Supplies, Download Status, and Log Transaction, the last reached through two «uses» relationships.
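One way to see how a use case diagram and its «uses» relationships fit together is to treat them as a small directed graph. In this sketch (Python 3.9+), which actor initiates which ATM use case, and which two use cases «use» Log Transaction, are assumptions made for illustration; the slide does not spell them out, and the helper names are hypothetical.

```python
from collections import deque

# Hypothetical reading of the ATM example: these arrows and the two «uses»
# links are assumptions, not taken from the original slide.
INITIATES = {
    "ATM Customer": ["Request Cash Withdrawal", "Make Deposit", "Change PIN"],
    "Service Provider": ["Replenish Supplies"],
    "Timer": ["Download Status"],
}
USES = {
    "Request Cash Withdrawal": ["Log Transaction"],
    "Make Deposit": ["Log Transaction"],
}


def reachable_use_cases(actor: str) -> set[str]:
    """Use cases an actor triggers directly or through «uses» relationships."""
    seen: set[str] = set()
    queue = deque(INITIATES.get(actor, []))
    while queue:
        uc = queue.popleft()
        if uc in seen:
            continue
        seen.add(uc)
        queue.extend(USES.get(uc, []))
    return seen


def orphan_use_cases(all_use_cases: set[str]) -> set[str]:
    """Use cases that no actor can reach; a complete model should have none."""
    covered = set().union(*(reachable_use_cases(a) for a in INITIATES))
    return all_use_cases - covered


print(sorted(reachable_use_cases("ATM Customer")))
# ['Change PIN', 'Log Transaction', 'Make Deposit', 'Request Cash Withdrawal']
```

The orphan check mirrors the rule that every use case is a course of events initiated by some actor, directly or through a «uses» link.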

Page 7

Use Case Documentation

• My preferred template for each use case:
– Description - one or two lines
– Actors - list
– Pre & Post conditions
– Detailed Flow of Events
– Alternate Flows
– User Interface
– Data Requirements
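The template maps naturally onto a record type. This is a sketch, assuming Python 3.9+; the field names and the `missing_sections` helper are hypothetical, and the example use case text is invented for illustration rather than taken from the RAS model in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class UseCaseDescription:
    """One entry in a use case model, following the template's headings."""
    name: str
    description: str                      # one or two lines
    actors: list[str]
    preconditions: list[str] = field(default_factory=list)
    postconditions: list[str] = field(default_factory=list)
    flow_of_events: list[str] = field(default_factory=list)
    alternate_flows: dict[str, list[str]] = field(default_factory=dict)
    user_interface: str = ""
    data_requirements: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Template headings that are still empty, in template order."""
        sections = {
            "Description": self.description,
            "Actors": self.actors,
            "Pre & Post conditions": self.preconditions or self.postconditions,
            "Detailed Flow of Events": self.flow_of_events,
            "Alternate Flows": self.alternate_flows,
            "User Interface": self.user_interface,
            "Data Requirements": self.data_requirements,
        }
        return [title for title, value in sections.items() if not value]


uc = UseCaseDescription(
    name="Check if system is operational",
    description="The operator confirms that the system is up.",
    actors=["Operator"],
    flow_of_events=["Operator requests system status.", "System reports overall status."],
)
print(uc.missing_sections())
# ['Pre & Post conditions', 'Alternate Flows', 'User Interface', 'Data Requirements']
```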

Page 8

The Value of Use Cases

• A customer-friendly way of describing functional and performance requirements

• A good basis for developing test cases
• An excellent basis for developing the user guide
• Can be applied even without object-oriented analysis and design (OOAD)
• A great place to rough out the GUI
• A great place to start finding your data requirements
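To illustrate how use cases serve as a basis for test cases, here is a minimal sketch that derives one test scenario from the main flow of events and one from each alternate flow. It assumes each alternate flow is recorded as the index of the main-flow step where it branches plus its own steps; that representation, the `Scenario` type, and the ATM steps in the example are all hypothetical (Python 3.9+).

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """A test scenario derived from one path through a use case."""
    name: str
    steps: list[str]


def derive_scenarios(use_case: str, main_flow: list[str],
                     alternate_flows: dict[str, tuple[int, list[str]]]) -> list[Scenario]:
    """One scenario for the main flow plus one per alternate flow.

    Each alternate flow is (branch_index, steps): the scenario follows the main
    flow up to, but not including, main_flow[branch_index], then the alternate steps.
    """
    scenarios = [Scenario(f"{use_case}: main flow", list(main_flow))]
    for label, (branch_index, alt_steps) in alternate_flows.items():
        scenarios.append(Scenario(f"{use_case}: {label}",
                                  main_flow[:branch_index] + alt_steps))
    return scenarios


# Hypothetical flows for the ATM example.
scenarios = derive_scenarios(
    "Request Cash Withdrawal",
    ["Insert card", "Enter PIN", "Choose amount", "Dispense cash"],
    {"wrong PIN": (2, ["Display error", "Return card"])},
)
for s in scenarios:
    print(s.name, "->", s.steps)
```

Each path through the documented flows becomes exactly one scenario, so coverage of the use case text is easy to check.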

Page 9

What Use Cases Do Not Do

• They define only the customer-visible portion of the system.

• They provide minimal information for system architecture design.

Page 10

Use Case Model of a RAS System for MPPs

Page 11

Definition of RAS

• Reliability - fault avoidance
– The likelihood that a system or component will sustain full functional operation over its lifetime.
– Measured in MTBF (mean time between failures).

• Availability - fault tolerance
– The likelihood that a system is operational at any given time.
– Measured as uptime percentage.

• Serviceability - fault identification and repair
– A measure of a system’s ability to sustain repairs to faulty components.
– Measured in MTTR (mean time to repair) and cost ($$$).
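The slide lists the three measures separately, but they are linked: under the usual steady-state assumption (the system alternates between running for an average of MTBF hours and being repaired for an average of MTTR hours), availability is A = MTBF / (MTBF + MTTR). This relation is standard reliability engineering rather than something stated on the slide, and the figures in the example are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours per year spent down, given that availability."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical figures: a failure every 1000 hours on average, 2 hours to repair.
print(f"availability: {availability(1000, 2):.2%}")                # availability: 99.80%
print(f"downtime: {downtime_hours_per_year(1000, 2):.1f} h/year")  # downtime: 17.5 h/year
```

Halving MTTR from 2 hours to 1 hour in this example raises availability to about 99.90%, one way of seeing why serviceability matters alongside reliability.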

Page 12

Features of the Model

• Integrates hardware and software RAS

• Comprehensive model, i.e., includes RAS features ranging from those found on the most humble PC all the way to MPP-unique RAS features

• Generally applicable to clusters and embarrassingly parallel systems

Page 13

The Actors

• Asynchronous Event
• Manager
• Operator
• Synchronous Event
• System Hardware Administrator
• System Software Administrator
• System Software Programmer
• User

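For tooling built around the model, the eight actors can be captured as an enumeration. This sketch is hypothetical; in particular, the `is_event` property reflects a reading of the later slides, where the two event actors stand for scheduled activity (like cron) and unplanned failures rather than people.

```python
from enum import Enum


class Actor(Enum):
    """The eight actors of the RAS use case model."""
    ASYNCHRONOUS_EVENT = "Asynchronous Event"
    MANAGER = "Manager"
    OPERATOR = "Operator"
    SYNCHRONOUS_EVENT = "Synchronous Event"
    SYSTEM_HARDWARE_ADMINISTRATOR = "System Hardware Administrator"
    SYSTEM_SOFTWARE_ADMINISTRATOR = "System Software Administrator"
    SYSTEM_SOFTWARE_PROGRAMMER = "System Software Programmer"
    USER = "User"

    @property
    def is_event(self) -> bool:
        """True for the two non-human actors (scheduled and unplanned events)."""
        return self in (Actor.SYNCHRONOUS_EVENT, Actor.ASYNCHRONOUS_EVENT)


human_actors = [a.value for a in Actor if not a.is_event]
print(human_actors)
```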

Page 14

Use Case Diagram for User

Actor: User

• Determine status of system resources
• Determine status of job(s) that were or are running
• Review the logs of job(s) that were run
• Utilize application checkpoint/restart capability
• Utilize application monitoring capability
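Application checkpoint/restart generally means periodically saving enough program state to resume after a failure instead of starting over. The following is a minimal, single-process sketch of that idea, assuming a JSON file is an adequate checkpoint store; the function names, the toy computation, and the simulated crash are hypothetical, and a real MPP application would also have to coordinate checkpoints across all of its processes.

```python
import json
import os
from typing import Optional


def load_checkpoint(path: str) -> dict:
    """Resume from the last checkpoint, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temporary file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n_steps: int, path: str, every: int = 10, crash_at: Optional[int] = None) -> int:
    """Toy computation (sum of step indices) that checkpoints every `every` steps."""
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        if step == crash_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state["total"] += step
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Writing to a temporary file and renaming it with `os.replace` means a failure during the write leaves the previous checkpoint intact; after a crash, calling `run` again redoes only the steps since the last checkpoint.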

Page 15

Use Case Diagram for System Software Administrator

Actor: System Software Administrator (SSA)

• Determine the status of jobs
• Manage user jobs
• Determine the status of system software components
• Determine the status of system hardware components
• Restart failed hardware/software components
• Start up/shut down/reboot system components
• Run tests/diagnostics
• Data mine current and historical information
• Review logs
• Manage disk space

Page 16

Use Case Diagram for System Software Programmer

Actor: System Software Programmer

• Analyze a system software failure post-mortem
• Obtain verbose debugging information
• Upgrade system software

Page 17

Use Case Diagrams for System Hardware Administrator and Manager

Actor: System Hardware Administrator

• Diagnose questionable hardware
• Add/remove/replace hardware components
• Test hardware component(s)

Actor: Manager

• Retrieve performance statistics
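As an illustration of what Retrieve performance statistics might return, the sketch below computes observed uptime percentage, MTBF, and MTTR from the outages in one reporting period. The `Outage` record and the outage data are hypothetical, and MTBF is estimated here as total uptime divided by the number of failures, which is one common convention (Python 3.9+).

```python
from dataclasses import dataclass


@dataclass
class Outage:
    start_hour: float  # hours since the start of the reporting period
    end_hour: float


def ras_statistics(period_hours: float, outages: list[Outage]) -> dict[str, float]:
    """Observed uptime percentage, MTBF, and MTTR for one reporting period."""
    downtime = sum(o.end_hour - o.start_hour for o in outages)
    uptime = period_hours - downtime
    failures = len(outages)
    return {
        "uptime_percent": 100.0 * uptime / period_hours,
        # Convention used here: MTBF = operating time / number of failures.
        "mtbf_hours": uptime / failures if failures else float("inf"),
        "mttr_hours": downtime / failures if failures else 0.0,
    }


# Hypothetical 30-day period with two outages (3 hours and 1 hour).
stats = ras_statistics(720, [Outage(100, 103), Outage(400, 401)])
print(stats)
```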

Page 18

Use Case Diagrams for Operator and Synchronous Event

Actor: Operator

• Follow notification procedure
• Check if system is operational
• Receive audible/visible notification of problems

Actor: Synchronous Event

• Back up selected files
• Perform proactive system diagnostics
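The Synchronous Event actor plays the role a timer or cron plays: it starts use cases on a schedule. A minimal sketch, assuming each scheduled use case has a fixed interval; the intervals, `ScheduledUseCase`, and `fire_due` are hypothetical (Python 3.9+).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ScheduledUseCase:
    name: str
    interval: timedelta
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        # Never-run use cases are due immediately.
        return self.last_run is None or now - self.last_run >= self.interval


def fire_due(schedule: list[ScheduledUseCase], now: datetime) -> list[str]:
    """Start every due use case, record when it ran, and return the names started."""
    started = []
    for item in schedule:
        if item.is_due(now):
            item.last_run = now
            started.append(item.name)
    return started


# Hypothetical intervals: nightly backups, weekly proactive diagnostics.
t0 = datetime(2004, 5, 18)
schedule = [
    ScheduledUseCase("Back up selected files", timedelta(days=1), last_run=t0),
    ScheduledUseCase("Perform proactive system diagnostics", timedelta(weeks=1)),
]
print(fire_due(schedule, t0 + timedelta(days=1)))
# ['Back up selected files', 'Perform proactive system diagnostics']
```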

Page 19

Use Case Diagram for Asynchronous Event

Actor: Asynchronous Event

• Causes failure of system software service
• Hangs/panics operating system
• Faults hardware with hot spare
• Faults hardware that is a single point of failure
• Faults hardware that can be isolated
• Causes environmental failure
• Causes recoverable error
• Results in unknown event
• Notify SSA of problems
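Every asynchronous-event use case on this slide ends the same way, by notifying the SSA, and Results in unknown event acts as the catch-all. A minimal sketch of that dispatch (Python 3.9+), in which the raw event codes (such as `kernel_panic`) and the inbox list standing in for the notification channel are hypothetical:

```python
from enum import Enum


class AsyncEvent(Enum):
    """Asynchronous-event use cases from the diagram."""
    SOFTWARE_SERVICE_FAILURE = "Causes failure of system software service"
    OS_HANG_OR_PANIC = "Hangs/panics operating system"
    HW_FAULT_WITH_HOT_SPARE = "Faults hardware with hot spare"
    HW_FAULT_SINGLE_POINT = "Faults hardware that is a single point of failure"
    HW_FAULT_ISOLATABLE = "Faults hardware that can be isolated"
    ENVIRONMENTAL_FAILURE = "Causes environmental failure"
    RECOVERABLE_ERROR = "Causes recoverable error"
    UNKNOWN = "Results in unknown event"


# Hypothetical raw event codes; a real system would define its own.
EVENT_CODES = {
    "daemon_exit": AsyncEvent.SOFTWARE_SERVICE_FAILURE,
    "kernel_panic": AsyncEvent.OS_HANG_OR_PANIC,
    "psu_failover": AsyncEvent.HW_FAULT_WITH_HOT_SPARE,
    "node_down": AsyncEvent.HW_FAULT_ISOLATABLE,
    "over_temperature": AsyncEvent.ENVIRONMENTAL_FAILURE,
    "ecc_corrected": AsyncEvent.RECOVERABLE_ERROR,
}


def handle(code: str, ssa_inbox: list[str]) -> AsyncEvent:
    """Classify an event (unrecognized codes become UNKNOWN) and notify the SSA."""
    kind = EVENT_CODES.get(code, AsyncEvent.UNKNOWN)
    ssa_inbox.append(f"{kind.value}: {code}")
    return kind
```

Defaulting to `UNKNOWN` guarantees that even an unclassified event still reaches the SSA.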

Page 20

Example Use Case Description

Page 21

Conclusions

• Use cases are an effective communication tool.
• This model is the basis for the Red Storm system.