operational reliability1

7/30/2019 Operational Reliability1


RELIABILITY

Reliability engineering is the technology concerned with the prediction, control and continuous improvement of materials and technology, and thus with the continuous reduction of equipment failure rates. Reliability differs from quality in that it places more emphasis on the activities of design, manufacturing and operation in the field. In industry, reliability does not necessarily mean failure-free operation. Failure-free operation is, of course, important for one-shot devices (missiles, unmanned spacecraft) and for non-repairable systems such as aircraft, high-hazard equipment or life-saving components.

The concern about reliability can be felt in the comment of an astronaut:

"The most nerve-racking part of any space flight is the fact that your life depends upon thousands of critical parts, each produced probably by the lowest bidder."

In statistics or factor analysis, the term reliability means:

The amount of credence placed in a result.

The precision of a measurement, as measured by the variance of repeated measurements of the same object.

Engineering reliability is the probability that a product, device or equipment will give failure-free performance of its intended functions for the required duration of time.

DESIGN ASPECTS FOR RELIABILITY IMPROVEMENT OF INDUSTRIAL EQUIPMENT ARE:

1) Massive Over-Design when Weight, Space and Cost Limits Permit:

Using 500-tonne structural members or castings in place of 100-tonne ones, and so on.

2) Simplicity and Standardization:

A smaller number of parts increases the reliability of an equipment or system, as does using proven, standard components from a supplier instead of asking for special or tailor-made components. This may call for some amount of over-design or redesign but may prove to be cheaper overall.

3) Derating of Equipment:

A 50-ton electric arc furnace at the Alloy Steel Plant, Durgapur was derated and supplied as a 40-ton furnace; electric motors are often derated for reliability.

4) Human Engineering and Maintainability Considerations:

Making the design such that using or fitting parts incorrectly is very difficult.

Identifying critical components or parts with low reliability and taking the necessary action is also one of the main tasks of reliability improvement. The 80-20 concept can be applied here as well, i.e. 20% of the parts account for 80% of the failures or problems.

"Operational Reliability is basically not an initiative; it is just a better way of running the business. It changes the way the workforce thinks and acts, and provides the reliability tools to help them."

TECHNIQUES FOR IMPROVING OPERATIONAL RELIABILITY:

Reliability improvement is a continuous engineering process. It involves an enormous amount of data collection (from operating equipment, service equipment etc.). As failures are of three types (early failures, chance failures and wear-out failures), their analysis must be done in the right perspective.

The following processes are essential in a reliability study programme:

(i) The reliability programme starts in the conceptual phase of the product or equipment and continues throughout the design, development, production, testing, field evaluation and service stages.

(ii) Adequate management and organisational support should be there. Involvement of all departments and units that affect reliability is essential.

(iii) A proper failure reporting system covering all concerned agencies has to be built up. The necessary signal measuring devices should be installed and their feedback monitored.

(iv) Proper action plans, specifying responsibilities, procedures, schedules and budgets (if necessary), are to be issued and followed up.

(v) The execution of the programme has to be monitored, on the technical side as well, and deviations reported so that corrective actions can be taken.


Reliability improvement is a continuous engineering process. It involves an enormous amount of data collection and analysis, especially with respect to failure modes and stresses. As failures are of three types (early failures, chance failures and wear-out failures), their analysis must be done in the right perspective.

IMPROVEMENT OF COMPONENTS:

We can use superior components and parts with low failure rates. However, we would immediately realize that components of high reliability require more time and money for development. They may also be larger in size and weight. Generally the objective is not merely to produce a system with the highest reliability, but to evolve a system which reflects an optimum total cost. The major items contributing to total cost are research and development, production, spares and maintenance. The production facilities must be sufficiently sophisticated to enable the manufacture of precision components, with the result that production cost also increases with the requirement for greater reliability. On the other hand, the cost of maintenance and spares reduces as reliability increases. The objective in the majority of designs will be to attain this optimum cost. However, reliability assumes greater significance when the goal is not so much cost as meeting the requirement of a set mission for the unit or equipment.

Reliability Improvement through Redundancy:

In a system with many subsystems and elements, the reliability of each element has to be improved to near 100% to achieve good system reliability. It has already been mentioned that in a system of 400 elements in series, each of 98% reliability, the system reliability comes to only about 0.03% (0.98^400). But if the reliability of individual elements, components or subassemblies cannot be improved further, we can duplicate or triplicate those components to improve the system reliability.

Let us take a simple case of one pump unit, one valve unit and one cylinder in a hydraulic system, and assume the probability of success of each as 70%, 90% and 80% respectively. In a non-redundant system, the reliability of the system Ps(system) is:

[Figure: non-redundant system. Pump, Ps(P) = 70% -> Valve, Ps(V) = 90% -> Cylinder, Ps(C) = 80%]

Ps(system) = Ps(P) × Ps(V) × Ps(C)

= 70% × 90% × 80% = 50.4%

Now if we duplicate the pump unit, i.e. add one more pump unit in parallel with the original pump unit, system failure on account of the pump unit will occur only when both pumps fail. Again assuming the reliability of each pump Ps(P) as 70%, i.e. the probability of failure of each pump Pf(P) as 30%, the system can be shown and the system reliability Ps(system) calculated as given below:

[Figure: redundant system. Pump-1 (Ps(P1) = 70%, Pf(P1) = 30%) in parallel with Pump-2 (Ps(P2) = 70%, Pf(P2) = 30%) -> Valve, Ps(V) = 90% -> Cylinder, Ps(C) = 80%]

Ps(at least one pump) = 100% - Pf(P1) × Pf(P2)

= 100% - (30% × 30%) = 91%

Therefore Ps(system) = Ps(at least one pump) × Ps(V) × Ps(C)

= 91% × 90% × 80%

= 65.5%, i.e. about 66%. Thus, by redundancy, the system reliability can be improved.
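The pump, valve and cylinder arithmetic can be checked with a short script. This is only a sketch: the function names series and parallel are illustrative, and it assumes that the elements fail independently of one another, which the text implies but does not state.

```python
from math import prod

def series(reliabilities):
    """Elements in series: the system works only if every element works."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Active-parallel elements: the group fails only if every element fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Figures from the hydraulic example: pump 70%, valve 90%, cylinder 80%.
pump, valve, cylinder = 0.70, 0.90, 0.80

single = series([pump, valve, cylinder])
dual = series([parallel([pump, pump]), valve, cylinder])

print(f"one pump:  {single:.1%}")   # 50.4%
print(f"two pumps: {dual:.1%}")     # 65.5%
```

Under these assumptions the second pump lifts the system reliability from 50.4% to about 65.5%, which is the improvement the example describes.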

In addition to cost and space limitations, there are some additional constraints on improving reliability through redundancy, such as:

Parallel equipment is sometimes connected through a change-over switch (for automatic change-over), which may not be fail-proof and may introduce another reliability factor.

With duplication or triplication of components, failed or non-working components may adversely affect the working components (e.g. possible internal leakage through failed or non-working hydraulic valves or pumps, which may cause malfunction).

If the state of the art is such that either it is not possible to produce highly reliable components or the cost of producing such components is very high, we can improve the system reliability by introducing redundancies. This involves the deliberate creation of new parallel paths in a system.

There are many methods of introducing redundancies in a system. A few of these will be considered below.

Standby redundancy:

Another type of redundancy that can be introduced in a system is standby redundancy. In the two-element parallel system used for comparison, all the channels or paths are active from the beginning of operation of the system until it fails. In a standby system, not all the paths are active at the same time.
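The difference between the two arrangements can be illustrated numerically. The sketch below rests on assumptions that are not in the text: two identical units, a constant failure rate (exponential life), a perfectly reliable change-over switch, and a standby unit that cannot fail while idle. Under those assumptions the standard results are R = 1 - (1 - e^(-λt))^2 for the active-parallel pair and R = e^(-λt)(1 + λt) for the standby pair. The failure rate and mission time used are hypothetical.

```python
from math import exp

def active_parallel(rate, t):
    """Two identical units, both running from the start."""
    r = exp(-rate * t)
    return 1.0 - (1.0 - r) ** 2

def standby(rate, t):
    """One unit running, an identical one switched in on failure.

    Assumes a perfect switch and no failures of the idle unit.
    """
    return exp(-rate * t) * (1.0 + rate * t)

rate = 0.001   # hypothetical failure rate, failures per hour
t = 500.0      # hypothetical mission time, hours

print(f"active parallel: {active_parallel(rate, t):.3f}")
print(f"standby:         {standby(rate, t):.3f}")
```

With a real change-over switch the standby figure would be lower, which is the constraint noted earlier for change-over switches.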

OPTIMIZATION:

The reliability of a system can be improved considerably by introducing redundancy either in the subsystem or in the element. It can also be shown that element (component) redundancy is superior to subsystem (unit) redundancy.
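The comparison can be sketched with the pump, valve and cylinder figures from the hydraulic example. The code assumes independent failures and active (not standby) redundancy; the variable names are illustrative.

```python
from math import prod

def series(reliabilities):
    """All elements must work."""
    return prod(reliabilities)

def parallel(reliabilities):
    """At least one element must work."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

elements = [0.70, 0.90, 0.80]   # pump, valve, cylinder

# Unit redundancy: duplicate the whole pump-valve-cylinder chain.
unit_level = parallel([series(elements), series(elements)])

# Element redundancy: duplicate each element, then connect the pairs in series.
element_level = series([parallel([r, r]) for r in elements])

print(f"unit redundancy:    {unit_level:.1%}")     # 75.4%
print(f"element redundancy: {element_level:.1%}")  # 86.5%
```

Both arrangements use six components, but duplicating at element level gives the higher system reliability here, because a single failed element no longer disables an entire chain.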

Maintainability Criteria:

Executive Summary:

Faced with shrinking maintenance budgets and increasingly competitive markets, maintainability is an issue of growing

importance for many companies. Although maintainability is not a new concept, many companies struggle with

consistent, standardized maintenance input during the project delivery process. An important characteristic of any design, maintainability pertains to the ease, accuracy, safety, and economy in the performance of maintenance actions. This

research examines the opportunities available through the effective inclusion of maintainability concepts during the

project delivery process.

The Construction Industry Institute (CII) defines maintainability as the optimum use of facility maintenance knowledge

and experience in the design/engineering of a facility that meets project objectives (Constructability Implementation

Guide 1993). In this context, maintainability refers to a formal process to include relevant maintenance input during all

phases of the facility delivery process. The Maintainability Research Team adopted a format similar to constructability for its research methodology and developed a model process.

Research Purpose and Objectives

The primary purpose of this research is to develop a model process for incorporating maintenance knowledge and experience into the planning, design, procurement, construction, and start-up of facilities. Specific research objectives include: (1) define existing levels of maintainability implementation; (2) identify best practices that

improve maintainability of capital projects; (3) compile a model process for implementing maintainability; and (4)

conduct case studies to illustrate best practices and the model process for maintainability implementation.

Research Scope:

Given the complexity and variations of maintainability, this research focused on general practices to aid in formalization of

maintainability efforts during the project delivery process. Formal implementation of maintainability is not sufficiently

mature to obtain quantitative data, and it would be difficult to develop a basis for evaluation. The scope of this


investigation is limited to maintainability activities during six phases of the project delivery process: (1) planning; (2)

design; (3) procurement; (4) construction; (5) start-up; and (6) operations and maintenance. This research surveyed a

broad cross-section of companies engaged in many different types of construction, ranging from general building to

petrochemical. Capital and retrofit projects for equipment, systems, and facilities were included in this research. As

maintainability most directly impacts the owner of constructed projects, this research focused on owner organizations.

Research Methodology:

The research methodology included: (1) literature review; (2) a questionnaire survey; (3) 35 personal and telephone

interviews; and (4) seven in-depth case studies with industry representatives.

Levels of Maintainability Implementation:

The research data revealed attributes that were subjectively organized into five levels of

maintainability implementation: (1) design/engineering experience; (2) effective organizational standards; (3) developing

maintainability process; (4) formal maintainability process; and (5) comprehensive maintainability program. Each level

expands and refines the attributes of the preceding level, increasing the opportunity for maintainability improvement on

capital projects.

Model Process for Maintainability Implementation :

Best practices observed during the research data collection were organized into a model process

for maintainability implementation. The model process was developed to provide guidance in the planning, development,

and implementation of maintainability at both the corporate and project levels. Providing an overview of the

maintainability program, the model process has six milestones: (1) commit to implementing maintainability; (2) establish

maintainability program; (3) obtain maintainability capabilities; (4) plan maintainability implementation; (5) implement

maintainability; and (6) update maintainability program. Each milestone contains several steps and activities that further

describe the details of implementation.

Practical Applications:

Project-specific factors affecting the need for maintainability efforts are grouped into two categories: owner related issues and

project attributes. The owner related issues are: (1) owner type; (2) past maintenance experience; (3) maintenance

strategy; and (4) projected cost of maintenance. Project attributes include: (1) construction type; (2) criticality; (3)

complexity; (4) projected life of facility; and (5) location. Five factors that affect how a formal maintainability process

will be implemented are: (1) new versus retrofit; (2) project size; (3) project delivery system; (4) maintenance

organization; and (5) related industry practices.

Conclusions:

Implementation of a formal maintainability process involves a fundamental shift in the role of maintenance, from a necessary evil to a value-adding activity, in the project delivery process. Maintenance helps achieve and sustain optimum reliability and

performance for all projects. Formal maintainability programs provide benefits to both owner and contractor

organizations. Owners benefit from improved control over maintenance costs and improved facility availability. Designers

and constructors can increase client satisfaction and use success with a maintainability process as a value-adding service

for owner clients.

Recommendations:

Each company must assess the need for maintainability on future projects and then determine the appropriate level of

maintainability efforts. Development of the formal process should reflect the organizational need, with the purpose of ensuring maintainability objectives are met. A maintainability process has the potential for greatest (and most cost

effective) impact if it can be integrated with existing company work processes and related improvement initiatives, such

as Total Quality Management, etc.

Need for Future Research:

Future research needs to be conducted in measuring and quantifying costs/benefits of maintainability in order to

demonstrate the financial aspects of maintainability. Similarly, the need exists to measure and document performance of

the maintainability process implementation.


Maintainability Program Plan

Overview

The primary purpose of the 'Maintainability Program Plan' is to improve operational readiness, reduce maintenance

manpower needs, reduce system life cycle cost and provide data essential for management.

The objective shall be to ensure attainment of the maintainability requirements of the acquisition.

The maintainability aspect during the system's development is extremely important, and it is vital that suppliers are aware of their responsibilities in this respect, as the results can have serious effects for the user. The maintainability requirements

must be expressed as definitively as possible. The requirements shall apply to planned maintenance in the support

environment and shall be expressed in quantitative terms:

time (e.g., turn around time, time to repair, time between maintenance actions);

rate (e.g., maintenance hours per operating hours, frequency of preventative maintenance);

complexity (e.g., number of people and skill levels, variety of support equipment).

The expectation of carrying out repairs in the field by substitution of components (e.g., the replacement of a faulty card or

module in an electronic item) shall be defined.

Ishikawa diagram

Definition:

A graphic tool used to explore and display opinion about sources of variation in a process. (Also called a Cause-and-

Effect or Fishbone Diagram.)

Purpose:

To arrive at a few key sources that contribute most significantly to the problem being examined. These sources are then

targeted for improvement. The diagram also illustrates the relationships among the wide variety of possible contributors

to the effect.

The figure below shows a simple Ishikawa diagram. Note that this tool is referred to by several different names: Ishikawa

diagram, Cause-and-Effect diagram, Fishbone diagram, and Root Cause Analysis. The first name is after the inventor of

the tool, Kaoru Ishikawa (1969) who first used the technique in the 1960s.


The basic concept in the Cause-and-Effect diagram is that the name of a basic problem of interest is entered at the right of the diagram, at the end of the main "bone". The main possible causes of the problem (the effect) are drawn as bones off the main backbone. The "Four-M" categories are typically used as a starting point: "Materials", "Machines", "Manpower", and "Methods". Different names can be chosen to suit the problem at hand, or these general categories can be revised. The key is to have three to six main categories that encompass all possible influences. Brainstorming is typically done to add possible causes to the main "bones" and more specific causes to the "bones" on the main "bones". This subdivision into ever-increasing specificity continues as long as the problem areas can be further subdivided. The practical maximum depth of this tree is usually about four or five levels. When the fishbone is complete, one has a rather complete picture of all the possibilities about what could be the root cause of the designated problem.

The Cause-and-Effect diagram can be used by individuals or teams; probably most effectively by a group. A typical

utilization is the drawing of a diagram on a blackboard by a team leader who first presents the main problem and asks for

assistance from the group to determine the main causes, which are subsequently drawn on the board as the main bones of the diagram. The team assists by making suggestions and, eventually, the entire cause and effect diagram is filled out.

Once the entire fishbone is complete, team discussion takes place to decide what the most likely root causes of the

problem are. These causes are circled to indicate items that should be acted upon, and the use of the tool is complete.

The Ishikawa diagram, like most quality tools, is a visualization and knowledge organization tool. Simply collecting the

ideas of a group in a systematic way facilitates the understanding and ultimate diagnosis of the problem. Several computer

tools have been created for assisting in creating Ishikawa diagrams. A tool created by the Japanese Union of Scientists and


Engineers (JUSE) provides a rather rigid tool with a limited number of bones. Other similar tools can be created using

various commercial tools.

Only one tool has been created that adds computer analysis to the fishbone. Bourne et al. (1991) reported using Dempster-

Shafer theory (Shafer and Logan, 1987) to systematically organize the beliefs about the various causes that contribute to

the main problem. Based on the idea that the main problem has a total belief of one, each remaining bone has a belief

assigned to it based on several factors; these include the history of problems of a given bone, events and their causal

relationship to the bone, and the belief of the user of the tool about the likelihood that any particular bone is the cause of

the problem.

How to Construct:

1. Place the main problem under investigation in a box on the right.

2. Have the team generate and clarify all the potential sources of variation.

3. Use an affinity diagram to sort the process variables into naturally related groups. The labels of these groups are the

names for the major bones on the Ishikawa diagram.

4. Place the process variables on the appropriate bones of the Ishikawa diagram.

5. Examine each bone in turn, ensuring that the process variables are specific, measurable, and controllable. If they are

not, branch or "explode" the process variables until the ends of the branches are specific, measurable, and controllable.

Tips:

Take care to identify causes rather than symptoms.

Post diagrams to stimulate thinking and get input from other staff.

Self-adhesive notes can be used to construct Ishikawa diagrams. Sources of variation can be rearranged to reflect appropriate categories with minimal rework.

Ensure that the ideas placed on the Ishikawa diagram are process variables, not special causes, other problems, tampering, etc.

Review the quick fixes and rephrase them, if possible, so that they are process variables.


References:

Cause & Effect Diagram:

The cause & effect diagram is the brainchild of Kaoru Ishikawa, who pioneered quality management processes in the Kawasaki shipyards, and in the process became one of the founding fathers of modern management. The cause and effect diagram is used to explore all the potential or real causes (or inputs) that result in a single effect (or output). Causes are arranged according to their level of importance or detail, resulting in a depiction of relationships and hierarchy of events. This can help you search for root causes, identify areas where there may be problems, and compare the relative importance of different causes.

Causes in a cause & effect diagram are frequently arranged into four major categories. While these categories can be

anything, you will often see:

Manpower, methods, materials, and machinery (recommended for manufacturing)

Equipment, policies, procedures, and people (recommended for administration and service).

These guidelines can be helpful but should not be used if they limit the diagram or are inappropriate. The categories you

use should suit your needs. At Sky Mark, we often create the branches of the cause and effect tree from the titles of the

affinity sets in a preceding affinity diagram.

The C&E diagram is also known as the fishbone diagram because it was drawn to resemble the skeleton of a fish, with the

main causal categories drawn as "bones" attached to the spine of the fish, as shown below.

Cause & effect diagrams can also be drawn as tree diagrams, resembling a tree turned on its side. From a single outcome

or trunk, branches extend that represent major categories of inputs or causes that create that single outcome. These large

branches then lead to smaller and smaller branches of causes all the way down to twigs at the ends. The tree structure has

an advantage over the fishbone-style diagram. As a fishbone diagram becomes more and more complex, it becomes

difficult to find and compare items that are the same distance from the effect because they are dispersed over the diagram.

With the tree structure, all items on the same causal level are aligned vertically.


To successfully build a cause and effect diagram:

1. Be sure everyone agrees on the effect or problem statement before beginning.

2. Be succinct.

3. For each node, think what could be its causes. Add them to the tree.

4. Pursue each line of causality back to its root cause.

5. Consider grafting relatively empty branches onto others.

6. Consider splitting up overcrowded branches.

7. Consider which root causes are most likely to merit further investigation.

Other uses for the Cause and Effect tool include organization diagramming, parts hierarchies, project planning, tree

diagrams, and the 5 Why's.

Pareto Chart or Juran Diagram

A quality tool, also called a Juran diagram, that is based on the Pareto Principle. It uses attribute or discrete data, arranged in descending order so that the category with the most occurrences is shown first. It may use a cumulative line to mark the running percentage at each group or bar, which brings out the Pareto Principle or 80/20 rule: roughly 20 percent of items cause 80 percent of the problems.

Principle: the 80/20 Rule

In the very early 1900s, an Italian economist by the name of Vilfredo Pareto created a mathematical formula describing the unequal distribution of wealth he observed and measured in his country: Pareto observed that roughly twenty percent of the people controlled or owned eighty percent of the wealth. In the late 1940s, Dr. Joseph M. Juran, a Quality Management pioneer, attributed the 80/20 Rule to Pareto, calling it Pareto's Principle. While some may claim that Juran's broad attribution of this scientific observation to Pareto is inaccurate, Pareto's Principle, or Pareto's Law as it is sometimes called, can be a very effective business tool, one that can help us manage more effectively.

The example below is from Dale H. Besterfield's book Quality Control (sixth edition), which includes a CD of Excel macros.


Paint Nonconformities

Number  Category   Freq.  Percent  Cumulative %
2       Lt. Spray    582     30.9          30.9
7       Runs         434     23.1          54.0
3       Drips        227     12.1          66.1
1       Blister      212     11.3          77.4
5       Splatter     141      7.5          84.8
6       Bad Paint    126      6.7          91.5
4       Overspray    109      5.8          97.3
8       Other         50      2.7         100.0
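The Percent and Cumulative % columns of the paint table can be reproduced from the frequencies alone. The sketch below is illustrative (the function name pareto_table is not from the source); it sorts the categories in descending order of frequency and accumulates the running total, which is all a Pareto chart needs before plotting.

```python
# Frequencies taken from the Paint Nonconformities table.
counts = {
    "Blister": 212, "Lt. Spray": 582, "Drips": 227, "Overspray": 109,
    "Splatter": 141, "Bad Paint": 126, "Runs": 434, "Other": 50,
}

def pareto_table(counts):
    """Return (category, freq, percent, cumulative percent) rows, largest first."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += freq
        rows.append((category, freq,
                     round(100 * freq / total, 1),
                     round(100 * running / total, 1)))
    return rows

for category, freq, pct, cum in pareto_table(counts):
    print(f"{category:<10}{freq:>6}{pct:>8.1f}{cum:>8.1f}")
```

In this data the first three categories (light spray, runs and drips) already account for about two thirds of all nonconformities, which is where improvement effort would be directed first.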


How Pareto's Principle Can Help Us

The value of the Pareto Principle in management is in reminding us to stay focused on the 20 percent that matters. Of all the tasks performed throughout the day, one could say (based on Pareto's Principle) that only 20 percent really matter. Those tasks in the 20 percent very likely will produce 80 percent of our results. Thus, it's critical that we identify and focus on those things. When the fire drills surrounding the crisis of the day begin to eat up precious time, remind yourself of the critical 20 percent you need to focus on. If anything in the list of activities and action items has to fall by the wayside, left undone, be sure it isn't listed in that critical 20 percent.

DEFINITIONS OF MAINTAINABILITY:

Maintainability is defined as the probability of restoring a failed device, equipment or asset to operational effectiveness within a specified period of time through the prescribed maintenance operation.

Maintainability can also be defined as the characteristic of equipment design and installation which is expressed in terms of ease and economy of maintenance, availability of equipment, safety, and accuracy of the performance parameters of the equipment.

Its aim is to design and develop a system or equipment that can be easily maintained at a reasonable cost with minimum resources, without affecting the performance and safety of the equipment.

Maintainability is associated with the design of the assets to be maintained. It is a measure of the ease of maintenance; the parameter for expressing maintainability is Mean Time to Repair (MTTR).

The concept of maintainability is different from reliability. Reliability is the probability that an asset or a system will operate satisfactorily for some determined period of time; the parameter expressing reliability is Failure Rate (FR) or Mean Time Between Failures (MTBF).
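How the two parameters relate can be sketched numerically. The formulas below are standard textbook relations rather than anything stated in these notes, and they hold only under the assumption of a constant failure rate: FR = 1/MTBF, R(t) = e^(-t/MTBF), and steady-state availability = MTBF / (MTBF + MTTR). The MTBF and MTTR values are hypothetical.

```python
from math import exp

def failure_rate(mtbf):
    """Failures per hour, assuming a constant failure rate."""
    return 1.0 / mtbf

def reliability(t, mtbf):
    """Probability of running t hours without failure (exponential model)."""
    return exp(-t / mtbf)

def availability(mtbf, mttr):
    """Long-run fraction of time the equipment is available for use."""
    return mtbf / (mtbf + mttr)

mtbf = 1000.0   # hypothetical mean time between failures, hours
mttr = 10.0     # hypothetical mean time to repair, hours

print(f"failure rate:       {failure_rate(mtbf):.4f} per hour")
print(f"R(100 h):           {reliability(100.0, mtbf):.3f}")
print(f"availability:       {availability(mtbf, mttr):.3%}")
print(f"with MTTR halved:   {availability(mtbf, mttr / 2):.3%}")
```

The last two lines show the point of the distinction: availability can be raised either by improving reliability (a longer MTBF) or by improving maintainability (a shorter MTTR).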

Maintainability is not a factor - this is a short term tactical solution. It is agreed that the system will be replaced or

rewritten before maintenance costs become a problem. It is particularly important that any decision to build a tactical

solution is documented as there is a tendency for such systems to become long term corporate business systems.

Maintainability is key - this is a long term system which needs to be easily maintained. Here the criterion for

deployment is not just to provide required business functionality in a robust way, but that the design and code meet a

maintainability standard before the system is accepted and released to the business. This means that activities such as

documentation and test production should not be allowed to be cut out if time runs short in a time box. It could mean

designing the system so that logic is externalized so that it can be easily changed during maintenance activities. This may

mean that time boxes need to be a little longer than usual.


Maintainability will be built in later - the business priority is to elicit and implement required functionality

quickly. The system needs a long life and to be maintainable, but the business is prepared to pay for subsequent (behind

the scenes) re-engineering after implementation. This means a greater development cost than engineering for

maintainability first time, but gives a quicker initial delivery, and may produce a lower lifetime ownership cost than

struggling for years with maintenance problems. (This is often the case where time to market is critical - either in software

for sale into a fast moving market, or software to satisfy a fast moving business).

Maintenance can be said to start after the first increment of a system has been delivered. In most cases, any maintenance

on this will need to be undertaken by the DSDM development team during the second increment. If the system is not

maintainable, then the second increment will be slowed down at this stage. Any requirements not covered during the main

part of the development lifecycle because they were prioritized out due to time boxing or using MoSCoW are often held over to be considered for future work done by the maintenance team.

However, maintenance is usually considered to begin post implementation. It should not make any difference whether this

process is undertaken by a separate maintenance team or by the development team. However, maintenance is often

transferred to a different team, which means that the goal of maintainability is essential since the maintenance team will not have gained knowledge of the system during development. It is important that the maintenance team are represented

during the development process - this role is recommended in DSDM.

The Quality Strategy for the project needs to consider how quality control will be applied to ensure that the

Maintainability Objective is met.

Appropriate staff should be involved in implementing maintainability objectives. The Technical Co-ordinator is

responsible for ensuring the maintainability objectives are met. The Project Manager is responsible for identifying and

calling in specialist roles as required - these could be Support and Maintenance team representatives and the Service/Help

Desk Manager to assist with planning for maintainability and to consider the eventual support of this system once it is

running live.

DSDM does not ensure maintainability by itself. Maintainability is made possible by a combination of four factors within

a well managed DSDM project:

Tools - The use of tools to cover such areas as configuration management, testing and impact analysis aids maintainability.

People - The people aspects affecting maintainability are development team skills/experience/business knowledge/user contribution/maintenance team skills/motivation.

Documentation - A minimum documentation set is needed for maintenance. This does not have to be paper-based

documentation but could be information residing in a toolset. This documentation set will vary according to installation

guidelines.

Good practice guidelines - such aspects as standards, style guide, use of DSDM for this installation etc. - in fact

everything that maybe we would have done automatically for a waterfall approach and should not forget just because a

RAD method is being used.

Limitations

It cannot be built into complex machine systems because of design considerations and, secondly, it does not improve the performance of the equipment.

Checklist for Planning Maintainability Activities and Resources:

Project team (Yes? / No? / Comments)

Does the cross-functional project team include operations/maintenance?

Is there a designated operations/maintenance person for design input and reviews, installation, and start-up?

Who is responsible for complete operations and maintenance documentation such as manuals?

Maintainability concepts

Will the project incorporate the following:

Preventive maintenance features?

Predictive maintenance?

Accessibility for maintenance?

Safety?

Ease of alignments and quick changeovers?

... maintenance design considerations, maintenance documentation, and maintenance training. The checklist aids in assigning accountability for the activities. Additionally, individual plants have a contact matrix that identifies subject matter experts (primary and secondary contacts) who can assist project teams in specific areas of maintenance. The combination of the checklist and contact matrix provides users with sufficient tools to plan activities and resources for maintainability implementation.

In comparison, the contractor-led program uses a flowchart, shown in Fig. 1, to plan and integrate maintainability activities into the project delivery process. The flowchart helps delineate tasks, assigns lead responsibilities and personnel, and identifies activities such as meetings and conferences. The flowchart is a component of the process coordination methods employed by the contractor-led program to plan and coordinate maintainability activities and resources.

OBJECTIVES

1. To design equipment that can be maintained easily in minimum time and at minimum cost. This implies that the requirement for other supporting resources, such as spare parts, manpower, and facilities such as tools and test equipment, must also be minimal.

2. Good maintainability can also improve the safety of personnel.

3. Maintainability increases the production cost of a machine, but it reduces the operating cost considerably.

4. To reduce the product's life cycle cost of maintenance.

5. Good maintainability provisions help the maintenance department carry out maintenance successfully, with the equipment itself designed to cooperate.

6. During the maintainability implementation program, reliability and other characteristics can be evaluated.

7. Its objectives are system readiness and achievement of desired results.

Establish Maintainability Objectives

Maintainability objectives are governed by the maintenance strategy selected for the project. Maintainability objectives must be clearly defined and must support business goals. While qualitative maintainability objectives are very useful, preference is for quantitative maintainability objectives that can be measured and recorded. Both programs were able to define quantitative maintainability objectives. The owner-led program uses quantitative objectives such as (1) total spare parts inventory per unit of sales value of production; (2) maintenance cost per unit of sales value of production; (3) maintenance cost per unit of product produced; (4) planned versus unplanned maintenance cost; (5) start-up costs (training, travel, checkout, materials); (6) annual maintenance costs; and (7) overall equipment effectiveness. During project planning and design, maintainability objectives are established by a joint effort of the project engineer and plant/maintenance engineer. The cooperative joint effort increases the likelihood of designed-in maintainability and realistic attainment of maintainability objectives. In comparison, the contractor-led program identified mean time to repair as a maintainability objective. Other maintainability objectives were not as clearly defined, a common occurrence throughout industry. Maintainability results can be difficult to track or measure because maintenance has initial and long-term impacts occurring over the life cycle of a project. Nonetheless, continuous tracking and measurement of maintainability objectives provides a means to assess true value and performance. The continuous assessment also allows informed decisions to be made for appropriate changes to improve long-term maintainability.
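As a rough illustration of how a few of these quantitative objectives could be tracked, the sketch below computes mean time to repair, the planned share of maintenance cost, and maintenance cost per unit produced. The function names and all figures are hypothetical and are not taken from either program.

```python
from typing import Sequence


def mean_time_to_repair(repair_hours: Sequence[float]) -> float:
    """Average duration of the recorded repairs, in hours."""
    if not repair_hours:
        raise ValueError("need at least one repair record")
    return sum(repair_hours) / len(repair_hours)


def planned_share(planned_cost: float, unplanned_cost: float) -> float:
    """Fraction of total maintenance cost that was planned."""
    total = planned_cost + unplanned_cost
    if total <= 0:
        raise ValueError("total maintenance cost must be positive")
    return planned_cost / total


def maintenance_cost_per_unit(annual_cost: float, units_produced: float) -> float:
    """Maintenance cost per unit of product produced."""
    if units_produced <= 0:
        raise ValueError("units produced must be positive")
    return annual_cost / units_produced


if __name__ == "__main__":
    # Hypothetical records for one year of operation.
    print(mean_time_to_repair([2.0, 4.0, 6.0]))        # 4.0 hours
    print(planned_share(75_000.0, 25_000.0))           # 0.75
    print(maintenance_cost_per_unit(100_000.0, 50_000))  # 2.0 per unit
```

Recomputing the same figures each period gives the continuous tracking that the text recommends.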

Fault Diagnosis

A guess as to what's wrong with a malfunctioning circuit

Narrows the search for physical root cause

Makes inferences based on observed behavior

Usually based on the logical operation of the circuit

Types of Diagnosis


Circuit Partitioning (Effect-Cause Diagnosis)

o Identify fault-free or possibly-faulty portions

o Identify suspect components, logic blocks, interconnects

Model-Based Diagnosis (Cause-Effect Diagnosis)

o Assume one or more specific fault models

o Compare behavior to fault simulations

Circuit Partitioning

Separate known-good portions of circuit from likely areas of failure

Simplest method: identify failing flip-flops

o Tester can identify failing flops or outputs

o Input cone of logic is suspect

o Intersection of multiple cones is highly suspect

o Single clock pulse with scan can be used for sequential/functional fails

aka Effect-Cause Diagnosis

Reasoning based on observed behavior and expected (good-circuit) functions

Commonly used at system and board-levels

Tries to separate good and suspect areas

Advantage: Simple and general

Disadvantage: Not very precise, often gives no indication of defect mechanism
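The input-cone reasoning above can be sketched in a few lines. This is a minimal illustration, assuming the netlist is available as a fan-in map; the tiny circuit and the names `FANIN`, `input_cone` and `suspects` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical netlist: each node maps to the nodes that drive it (its fan-in).
FANIN: Dict[str, List[str]] = {
    "ff1": ["g1"],
    "ff2": ["g2"],
    "g1": ["a", "g3"],
    "g2": ["g3", "c"],
    "g3": ["b"],
    "a": [],
    "b": [],
    "c": [],
}


def input_cone(fanin: Dict[str, List[str]], node: str) -> Set[str]:
    """All nodes that can influence `node`, found by backward traversal."""
    cone: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for driver in fanin.get(current, []):
            if driver not in cone:
                cone.add(driver)
                stack.append(driver)
    return cone


def suspects(fanin: Dict[str, List[str]],
             failing: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (suspect nodes, highly suspect nodes) for the failing flops/outputs."""
    cones = [input_cone(fanin, f) for f in failing]
    union = set().union(*cones)
    common = set.intersection(*cones) if cones else set()
    return union, common


if __name__ == "__main__":
    suspect, highly_suspect = suspects(FANIN, ["ff1", "ff2"])
    print(sorted(suspect))         # ['a', 'b', 'c', 'g1', 'g2', 'g3']
    print(sorted(highly_suspect))  # ['b', 'g3']
```

Everything outside the union of cones is treated as fault-free, which is the partitioning step; the intersection narrows the search but says nothing about the defect mechanism.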

Cause-Effect Diagnosis

Start from possible causes (fault models), compare to observed effects

A simulator is used to predict behavior of the circuit in the presence of various faults

Match prediction(s) against observed behavior

Advantage: Implicates a mechanism as well as a location

Disadvantage: Can be fooled by unmodeled defects

Components of Fault Diagnosis

Fault models

Fault simulators

Fault dictionaries

Diagnosis algorithms

Fault Models

A fault model is an abstraction of a type of defect behavior

A fault instance is the application of a model to a circuit wire, node, gate, etc.

Used to create and evaluate test sets

For diagnosis, they can be used to simulate and predict faulty behaviors


Stuck-at Fault Model

The most-used fault model (by far)

Simple to simulate and enumerate

Effective for testing, fault grading, and diagnosis of some defects

Many defects are not well represented by the stuck-at model

Bridging Fault Model

Shorts are a common defect type in CMOS

Different bridging fault models have varying accuracy and precision, from simplistic to very sophisticated

Difficult or impractical to enumerate

Some Diagnostic Fault Models

Gate Fault

Net Fault

Bridging Fault

Path Fault

Fault Simulators

A fault simulator can simulate instances of a particular fault model

Inputs:

o Circuit (netlist)

o Test set

o Faultlist (list of fault instances)

Output: circuit response

Usually, simulates the presence of a single fault instance (single-fault assumption)
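A toy single stuck-at fault simulator shows how the three inputs (netlist, test set, fault list) produce the circuit response. This is only a sketch under simplifying assumptions: a two-gate combinational circuit, gates listed in topological order, and hypothetical names throughout. Real fault simulators are far more elaborate.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Fault = Tuple[str, int]  # (net name, stuck value 0 or 1)

# Hypothetical netlist in topological order: (output net, gate type, input nets).
NETLIST = [("n1", "AND", ("a", "b")), ("y", "OR", ("n1", "c"))]
INPUTS = ("a", "b", "c")
OUTPUTS = ("y",)
GATES = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y}


def simulate(vector: Sequence[int], fault: Optional[Fault] = None) -> Tuple[int, ...]:
    """Evaluate the circuit for one test vector, optionally with one stuck-at fault."""
    values: Dict[str, int] = {}

    def assign(net: str, value: int) -> None:
        # A stuck-at fault overrides whatever value the net would normally take.
        values[net] = fault[1] if fault and fault[0] == net else value

    for net, value in zip(INPUTS, vector):
        assign(net, value)
    for out, gate, ins in NETLIST:
        assign(out, GATES[gate](*(values[i] for i in ins)))
    return tuple(values[o] for o in OUTPUTS)


def fault_simulate(test_set: Sequence[Sequence[int]],
                   fault_list: Sequence[Fault]) -> Dict[Fault, List[Tuple[int, ...]]]:
    """Simulate each fault instance on its own (single-fault assumption)."""
    return {f: [simulate(v, f) for v in test_set] for f in fault_list}


if __name__ == "__main__":
    tests = [(1, 1, 0), (0, 0, 0)]
    print([simulate(v) for v in tests])                     # fault-free: [(1,), (0,)]
    print(fault_simulate(tests, [("n1", 0), ("c", 1)]))
```

Comparing each faulty response with the fault-free one tells which vectors detect which fault.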

Fault Dictionaries

A fault dictionary is a database of the simulated responses for all faults in faultlist

Used by some diagnosis algorithms for convenience:

o Fast: no simulation at time of diagnosis

o Self-contained: netlist, simulator, and test set not needed after dictionary creation

Can be very large, however!

The Full-Response Dictionary

For each fault (f), store the response to each test vector ( v )


For each vector, store the expected output response ( o )

Total storage requirement: f × v × o bits

The Pass-Fail Dictionary

For each fault, store only the test vector responses

One bit per vector, pass ( 0 ) or fail ( 1 )

Total storage requirement: f × v bits

Much smaller than full-response, and often practical for even very large circuits
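To give a feel for the difference in size, the sketch below evaluates both storage formulas, reading f as the number of faults, v as the number of test vectors and o as the number of observed outputs. The circuit figures are hypothetical, chosen only to show the scale.

```python
def full_response_bits(f: int, v: int, o: int) -> int:
    """Full-response dictionary: one bit per fault, per vector, per output."""
    return f * v * o


def pass_fail_bits(f: int, v: int) -> int:
    """Pass-fail dictionary: one pass/fail bit per fault, per vector."""
    return f * v


if __name__ == "__main__":
    # Hypothetical circuit: 100,000 faults, 1,000 vectors, 500 outputs.
    f, v, o = 100_000, 1_000, 500
    full = full_response_bits(f, v, o)
    pf = pass_fail_bits(f, v)
    print(f"full-response: {full / 8 / 1e9:.2f} GB")  # 6.25 GB
    print(f"pass-fail:     {pf / 8 / 1e6:.2f} MB")    # 12.50 MB
```

The pass-fail dictionary is smaller by exactly the factor o, at the cost of discarding which outputs failed.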

Dynamic Diagnosis

Alternative to dictionary-based diagnosis

Fault simulation is only done for certain faults, based on test results

o Only simulate faults in input cones of failing flip-flops/outputs

Dictionary is eliminated, but requires complete netlist and test pattern file

Used by most commercial ATPG tools: Mentor Fastscan, Synopsys, Cadence, etc.

Diagnosis Algorithms

Algorithms compare observed behavior to predicted behaviors

An algorithm attempts to explain the observed failures with fault candidates

The job of a diagnosis algorithm is to report the best fault candidate(s)

"Best" is determined by the scoring method

Fault Candidate Scoring

Two common scoring methods

o Match/mismatch points

o Fault candidate probability

Other common scorings:

o Hamming distance

o Set intersection/overlap

o Nearest neighbor

Match/mismatch Point Scoring

Award points for matching observed failures

Optionally deduct points for not predicting fails


Nonprediction: A behavior not predicted by candidate

Misprediction: A prediction not fulfilled by behavior

Commercial tools (e.g. Fastscan) are usually biased to lowest nonprediction
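Match/mismatch scoring can be sketched with set operations on failing test vectors. This is an illustrative sketch, not the scoring used by any commercial tool; the penalty weights, candidate names and vector numbers are hypothetical.

```python
from typing import Dict, List, Set


def score(observed: Set[int], predicted: Set[int],
          nonpred_penalty: int = 0, mispred_penalty: int = 0) -> Dict[str, int]:
    """Score one fault candidate against the observed failing vectors."""
    matches = len(observed & predicted)        # failures the candidate explains
    nonprediction = len(observed - predicted)  # observed but not predicted
    misprediction = len(predicted - observed)  # predicted but not observed
    points = (matches
              - nonpred_penalty * nonprediction
              - mispred_penalty * misprediction)
    return {"matches": matches, "nonprediction": nonprediction,
            "misprediction": misprediction, "points": points}


def rank(observed: Set[int], candidates: Dict[str, Set[int]]) -> List[str]:
    """Order candidates with the lowest nonprediction first, then lowest misprediction."""
    scored = {name: score(observed, pred) for name, pred in candidates.items()}
    return sorted(scored, key=lambda n: (scored[n]["nonprediction"],
                                         scored[n]["misprediction"]))


if __name__ == "__main__":
    observed = {1, 3, 5}  # indices of failing test vectors
    candidates = {"A": {1, 3, 5, 7}, "B": {1, 3}, "C": {1, 3, 5}}
    print(rank(observed, candidates))  # ['C', 'A', 'B']
```

Candidate A ranks above B here because it leaves no observed failure unexplained, even though it predicts one failure that did not occur; that is the bias toward lowest nonprediction.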

Probabilistic Scoring

Probability score based on matches and mismatches and error assumptions

o Weights for non- and mis-prediction

o Different prediction probabilities for different fault candidates (bridges vs. stuck-at)

Usually normalized so that total of all candidates equals 1.0

UCSC method uses probabilities to compare stuck-at candidates to bridges in same diagnosis
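One simple way to turn these ideas into numbers is to multiply a prior for each candidate by a weight for every nonprediction and misprediction, then normalize. The sketch below is a hypothetical weighting chosen for illustration and is not the UCSC method; the weights, priors and candidate names are assumptions.

```python
from typing import Dict, Set


def probabilistic_scores(observed: Set[int],
                         candidates: Dict[str, Set[int]],
                         priors: Dict[str, float],
                         w_nonpred: float = 0.1,
                         w_mispred: float = 0.3) -> Dict[str, float]:
    """Return normalized scores that sum to 1.0 over all candidates.

    Each nonprediction or misprediction multiplies the candidate's prior by a
    weight below 1, so candidates that explain the data poorly lose probability.
    """
    raw: Dict[str, float] = {}
    for name, predicted in candidates.items():
        nonprediction = len(observed - predicted)
        misprediction = len(predicted - observed)
        raw[name] = (priors[name]
                     * w_nonpred ** nonprediction
                     * w_mispred ** misprediction)
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("no candidate has a positive score")
    return {name: value / total for name, value in raw.items()}


if __name__ == "__main__":
    observed = {1, 3, 5}
    candidates = {"stuck_at_n7": {1, 3, 5}, "bridge_n7_n9": {1, 3, 5, 7}}
    # Hypothetical priors; a tool could set different values per fault type.
    priors = {"stuck_at_n7": 0.5, "bridge_n7_n9": 0.5}
    for name, p in probabilistic_scores(observed, candidates, priors).items():
        print(f"{name}: {p:.3f}")
```

Because the scores are normalized, stuck-at and bridging candidates can be compared on one scale within the same diagnosis.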

Types of Diagnosis Algorithms (Cont)

Bridging-fault

o May better represent common CMOS faults

o More complicated fault model

o Biggest problem: candidate selection

Other possible (future) directions:

o Functional fails

o Delay fails

o Parametric failures

Diagnosis in Practice

Using a diagnosis

Translating the results: circuit navigation

Evaluating diagnosis quality

Commercial diagnosis tools

Using a Diagnosis

Fault diagnosis is used to aid physical inspection and root-cause identification

Diagnosis output is logical, not physical:

o Abstract faults (such as stuck-at)

o Gates, ports (nodes), and nets

o No information about location or size

Translation to physical location requires navigation of circuit

Types of Circuit Navigation

Netlist

o Examine RTL (Verilog/VHDL etc) for gates and data paths

Schematic

o Symbolic view of gates and wires

Layout/artwork

o Graphical view of metal lines, poly, vias, cell boundaries, etc.

Netlist Navigation


Either use text editor on netlist, or use browser function in simulator

Browsers allow you to trace forward and backward and see logic values

Can be used to view hierarchy and functional blocks

Can be tedious

Schematic Navigation

Either hand-drawn (from netlist navigation) or tool-generated gate symbols and wires

Schematic tools in simulators also allow forward and backward traversal and display of logic values

Used to verify fault propagation

Does not reflect physical distances

Layout (Artwork) Navigation

Use routing/floorplanning tools to view artwork

Can usually input cell or wire name and tool will highlight the object

Useful for determining (x,y) values

Also good for evaluating physical implications of a set of fault candidates

o Faults clustered in a small area are good

o Faults/nets spread around large die areas are bad

Evaluating a Diagnosis

A diagnosis without one or a few strong (high-scoring) candidates is usually poor

Can indicate:

o Multiple defects

o Unmodeled (complex) behavior

o Inappropriate algorithm

If the diagnosis is poor, either try another algorithm or look for more data (failures)

Evaluating a Diagnosis (cont)

Many diagnoses (~60%) implicate a single stuck-at fault

Usually a good sign, but you must consider equivalent faults

Many defects can mimic a stuck-at fault, without being a short to Vdd or Gnd

Consider nearby nodes also, if practical

Commercial Tool: Mentor Graphics

ATPG tool: Fastscan

Stuck-at diagnosis only

No IDDQ capability

Orders candidates by number of matched failures (biased to lowest non-prediction)

Also has netlist & schematic browser

Based on Waicukauski & Lindbloom (D&T89)

Commercial Tool: Synopsys

ATPG tool: TetraMAX

J. Waicukauski moved to Synopsys after writing Fastscan


Diagnosis capability unknown: assumed to be similar to Fastscan

Commercial Tool: Cadence

ATPG tool: Encounter Test

Test and diagnosis tools purchased from IBM

IBM has had good diagnosis research, but Encounter's capabilities are unknown

Also of interest: Silicon Ensemble - routing tool

Graphical artwork viewer

Good for highlighting nets and cells based on diagnosis results

Good for determining (x,y) and producing screen shots

Prior Art

Waicukauski & Lindbloom, IEEE Design & Test, Aug. '89

o Most widely-used algorithm for commercial tools

o Finds candidates to match individual tests, attempts to explain all failing tests

Abramovici & Breuer, IEEE Trans. Computing, June '80

o Effect-cause diagnosis

o Permanent stuck-at fault assumption

Aitken & Maxwell, HP Journal, Feb. '95

o Analysis of relative importance of models vs. algorithms

Lavo, Larrabee, et al., Proceedings of ITC '98

o Probabilistic scoring

o Mixed-model diagnosis

Bartenstein et al., Proceedings of ITC '01

o SLAT: Single Location At-a-Time diagnosis

o Focus on matching per-vector results

Prior Art (cont)

Jee & Ferguson, Proceedings of ISTFA '93

o Carafe Inductive Fault Analysis (IFA)

o Examine circuit to determine likely failure locations

Aitken, Proceedings of ITC '95

o Using FIBs to insert defects

o Calibrate/evaluate diagnosis methods

Henderson & Soden, Proceedings of ITC '97

o Probabilistic physical failure analysis

Nigh, Vallett, et al., Proceedings of ITC '98

o Large-scale, multi-company SEMATECH experiment

o Failure analysis of timing and IDDQ fails

Research Directions

Complex defect behaviors

o Beyond stuck-at and 2-line bridges

o Intermittent faults

o Delay and timing-related defects


o Parametric & process-related defects

o Multiple simultaneous defects

o Is there a simple, inductive way to infer complex defects?

Research Directions (cont)

Diagnosability

o What makes a particular circuit easy or hard to diagnose?

o What can we do to make diagnosis easier?

Evaluation of diagnoses

o What makes a good diagnosis?

o Can we quantify our confidence in a diagnosis?

Research Directions (cont)

Integration with physical FA & yield improvement

o Can we incorporate process information?

o Can we produce a physical diagnosis?

o On-line (or even on-chip) diagnosis

Commercial toolflow integration

o Can diagnosis tools use industry-standard data formats?

o Can commercial tools be scripted or programmed to do better diagnosis?
