operational reliability1

Upload: akshay-sharma

Post on 14-Apr-2018


  • 7/30/2019 Operational Reliability1

    1/21

    RELIABILITY

    Reliability engineering is the technology concerned with prediction, control and continuous improvement in materials and technology, and thus with the continuous reduction of equipment failure rates. Reliability differs from quality in that it places more emphasis on the activities of design, manufacturing and operation in the field. In industry, reliability generally does not mean failure-free operation. Failure-free operation is, of course, important for one-shot devices (missiles, unmanned spacecraft) and for non-repairable systems such as aircraft, high-hazard equipment and life-saving components.

    The concern about reliability can be felt in the comment of an astronaut:

    "The most nerve-racking part of any space flight is the fact that your life depends upon thousands of critical parts, each produced probably by the lowest bidder."

    In statistics or factor analysis the term reliability means:

    The amount of credence placed in a result.

    The precision of a measurement, as measured by the variance of repeated measurements of the same objects.

    Engineering reliability is the probability that a product, device or equipment will give failure-free performance of its intended functions for the required duration of time.

    DESIGN ASPECTS FOR RELIABILITY IMPROVEMENT OF INDUSTRIAL EQUIPMENT ARE:

    1} Massive Over-Design when Weight, Space and Cost Limits Permit:

    Using 500 tonnes of structurals or castings in place of 100 tonnes, and so on.

    2} Simplicity and Standardization:

    Fewer parts increase the reliability of an equipment or system. Use proven, standard components from a supplier instead of asking for special or tailor-made components. This may call for some over-design or redesign but may prove cheaper overall.

    3} Derating of Equipment:

    A 50-ton electric arc furnace at the Alloy Steel Plant, Durgapur, was derated and supplied as a 40-ton furnace; electric motors are often derated for reliability.

    4} Human Engineering and Maintainability Considerations:

    Design the equipment so that it is very difficult to use it incorrectly or to fit parts incorrectly.

    Identifying critical components or parts of low reliability and taking the necessary action is also one of the main tasks of reliability improvement. The 80-20 concept applies here too: 20% of the parts account for 80% of the failures or problems.

    "Operational reliability is basically not an initiative; it is just a better way of running the business. It changes the way the workforce thinks and acts, and provides the reliability tools to help them."

    TECHNIQUE FOR IMPROVEMENT OF OPERATIONAL RELIABILITY:

    Reliability improvement is a continuous engineering process. It involves an enormous amount of data collection (from operating equipment, service equipment, etc.). As failures are of three types (early failures, chance failures and wear-out failures), their analysis must be done in the right perspective.

    The following processes are essential in a reliability study programme:

    (i) The reliability programme starts in the conceptual phase of the product or equipment and continues throughout the design, development, production, testing, field evaluation and service stages.

    (ii) Adequate management and organisational support should be there. Involvement of all departments and units that affect reliability is essential.

    (iii) A proper failure reporting system from all concerned agencies has to be built up. Necessary signal measuring devices should be installed and their feedback monitored.

    (iv) Proper action plans, specifying responsibilities, procedures, schedules and budgets (if necessary), are to be issued and followed up.

    (v) The execution of the programme is both a technical task and one of reporting deviations so that corrective actions can be taken.

    Reliability is the probability that a product, device or equipment will give failure-free performance of its intended functions for the required duration of time.

    Reliability improvement is a continuous engineering process. It involves an enormous amount of data collection and analysis, especially with respect to failure modes, stresses, etc. As failures are of three types (early failures, chance failures and wear-out failures), their analysis must be done in the right perspective.

    IMPROVEMENT OF COMPONENT:

    We can use superior components and parts with low failure rates. However, we would immediately realise that components of high reliability require more time and money for development. They may also be larger in size and weight. Generally the objective is not merely to produce a system with the highest reliability, but to evolve a system which reflects an optimum total cost. The major items contributing to total cost are research and development, production, spares and maintenance. The production facilities must be sufficiently sophisticated to enable the manufacture of precision components, with the result that production cost also increases with the requirement for greater reliability. On the other hand, the cost of maintenance and spares reduces as reliability increases. The objective in the majority of designs will be to attain this optimum cost. However, reliability assumes greater significance when the goal is not so much the cost as the requirements of a set mission for the unit or equipment.

    Reliability Improvement through Redundancy:

    In a system with many subsystems and elements, the reliability of each element has to be improved to near 100% to achieve good system reliability. As already mentioned, in a series system of 400 elements the system reliability is very low: with elements of 98% reliability it comes to only about 0.03% (0.98^400), and even with elements of 99% reliability only about 2% (0.99^400). But if the reliability of the individual elements, components or subassemblies cannot be improved further, we can duplicate or triplicate those components to improve the system reliability.

    Let us take a simple case of one pump unit, one valve unit and one cylinder in a hydraulic system, and assume the probability of success of each to be 70%, 90% and 80% respectively. In a non-redundant system the three units work in series and the reliability of the system, Ps (system), can be shown as:

    [Figure: Pump, Ps (P) = 70%, in series with Valve, Ps (V) = 90%, and Cylinder, Ps (C) = 80%]

    Ps (system) = Ps (P) * Ps (V) * Ps (C)

    = 70% * 90% * 80% = 50.4%

    Now suppose we duplicate the pump unit, i.e. add one more pump unit in parallel with the original pump unit. System failure on account of the pump unit will then occur only when both pumps fail. Again assuming the reliability of each pump, Ps (P), to be 70%, i.e. the probability of failure of each pump, Pf (P), to be 30%, the system can be shown and the system reliability Ps (system) calculated as given below:

    [Figure: Pump-1, Ps (P1) = 70%, Pf (P1) = 30%, in parallel with Pump-2, Ps (P2) = 70%, Pf (P2) = 30%; the pair in series with Valve, Ps (V) = 90%, and Cylinder, Ps (C) = 80%]

    Ps (at least one pump) = 100% - Pf (P1) * Pf (P2)

    = 100% - (30% * 30%) = 91%

    Therefore Ps (system) = Ps (at least one pump) * Ps (V) * Ps (C)

    = 91% * 90% * 80%

    = 65.5%, i.e. about 66%. Thus, by redundancy, the system reliability can be improved.
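The pump, valve and cylinder arithmetic can be checked with a short script. This is a minimal sketch, assuming the units fail independently of one another; the function names `series` and `parallel` are illustrative and do not come from the original notes.

```python
from math import prod


def series(reliabilities):
    """All units must work: multiply the individual probabilities of success."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one unit must work: 1 minus the product of the failure probabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Probabilities of success taken from the hydraulic example in the text.
pump, valve, cylinder = 0.70, 0.90, 0.80

non_redundant = series([pump, valve, cylinder])
at_least_one_pump = parallel([pump, pump])
redundant = series([at_least_one_pump, valve, cylinder])

# A long series chain: 400 elements of 98% reliability each.
long_chain = series([0.98] * 400)

print(f"non-redundant system : {non_redundant:.1%}")
print(f"at least one pump    : {at_least_one_pump:.1%}")
print(f"with duplicated pump : {redundant:.1%}")
print(f"400 elements at 98%  : {long_chain:.3%}")
```

The same two helpers can be nested to describe any mix of series and parallel blocks, provided the independence assumption is acceptable for the equipment in question.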

    In addition to cost and space limitations, there are some additional constraints on improving reliability through redundancy, such as:

    Parallel equipment is sometimes connected through a change-over switch (for automatic change-over), which may not be fail-proof and may introduce another reliability factor.

    With duplication or triplication of components, failed or non-working components may adversely affect the working components (e.g. possible internal leakage through failed or non-working hydraulic valves or pumps, which may cause malfunction).

    If the state of the art is such that either it is not possible to produce highly reliable components or the cost of producing such components is very high, we can improve the system reliability by the technique of introducing redundancies. This involves the deliberate creation of new parallel paths in a system.

    There are many methods of introducing redundancies in a system. A few of these are considered below.

    Standby redundancy:

    Another type of redundancy that can be introduced in a system is standby redundancy. In the two-element parallel system used for comparison, all the channels or paths are active from the beginning of the operation of the system until its failure. In a standby system, not all the paths are active at the same time.
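The difference between active parallel and standby operation can be illustrated numerically. The sketch below rests on assumptions that are not stated in the notes: two identical units, a constant failure rate `lam` (the chance-failure period), a perfect change-over switch, and a spare that cannot fail while idle. Under those assumptions the standard textbook expressions are used; the numbers are hypothetical.

```python
import math


def active_parallel(lam, t):
    """Both units run from t = 0; the system survives if either unit survives."""
    r = math.exp(-lam * t)
    return 1.0 - (1.0 - r) ** 2


def standby(lam, t):
    """One unit runs; an identical idle spare takes over on failure.

    Assumes a perfect switch and no failures of the spare while idle.
    """
    return math.exp(-lam * t) * (1.0 + lam * t)


lam = 0.001   # hypothetical failure rate, failures per hour
t = 500.0     # hypothetical mission time, hours

single = math.exp(-lam * t)
print(f"single unit     : {single:.4f}")
print(f"active parallel : {active_parallel(lam, t):.4f}")
print(f"standby         : {standby(lam, t):.4f}")
```

With an imperfect change-over switch the standby figure would be lower, which is the constraint noted earlier for automatic change-over arrangements.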

    OPTIMIZATION:

    The reliability of a system can be improved considerably by introducing redundancy either in the subsystem or in the element. It can also be shown that element or component redundancy is superior to subsystem or unit redundancy.
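The claim that component redundancy beats unit redundancy can be illustrated with the pump, valve and cylinder figures used earlier (70%, 90% and 80%). This is a sketch only, assuming independent failures and ignoring any change-over switch; the helper names are illustrative.

```python
from math import prod


def series(reliabilities):
    """All units must work."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one unit must work."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


elements = [0.70, 0.90, 0.80]   # pump, valve, cylinder

# Unit redundancy: two complete pump-valve-cylinder chains in parallel.
unit_redundancy = parallel([series(elements), series(elements)])

# Component redundancy: every element duplicated, the three pairs in series.
component_redundancy = series([parallel([r, r]) for r in elements])

print(f"unit redundancy      : {unit_redundancy:.1%}")
print(f"component redundancy : {component_redundancy:.1%}")
```

Both layouts use six components, yet duplicating at element level gives the higher system reliability in this example.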

    Maintainability Criteria:

    Executive Summary:

    With shrinking maintenance budgets and increasingly competitive markets, maintainability is an issue of growing importance for many companies. Although maintainability is not a new concept, many companies struggle with consistent, standardized maintenance input during the project delivery process. An important characteristic of any design, maintainability pertains to the ease, accuracy, safety, and economy in the performance of maintenance actions. This research examines the opportunities available through the effective inclusion of maintainability concepts during the project delivery process.

    The Construction Industry Institute (CII) defines maintainability as the optimum use of facility maintenance knowledge

    and experience in the design/engineering of a facility that meets project objectives (Constructability Implementation

    Guide 1993). In this context, maintainability refers to a formal process to include relevant maintenance input during all phases of the facility delivery process. The Maintainability Research Team adopted a format similar to constructability for its research methodology and developed model process.

    Research Purpose and Objectives

    The primary purpose of this research is to develop a model process for incorporating maintenance knowledge and experience into the planning, design, procurement, construction, and start-up of facilities. Specific research objectives include: (1) define existing levels of maintainability implementation; (2) identify best practices that improve maintainability of capital projects; (3) compile a model process for implementing maintainability; and (4) conduct case studies to illustrate best practices and the model process for maintainability implementation.

    Research Scope:

    Given the complexity and variations of maintainability, this research focused on general practices to aid in formalization of maintainability efforts during the project delivery process. Formal implementation of maintainability is not sufficiently mature to obtain quantitative data, and it would be difficult to develop a basis for evaluation. The scope of this


    investigation is limited to maintainability activities during six phases of the project delivery process: (1) planning; (2)

    design; (3) procurement; (4) construction; (5) start-up; and (6) operations and maintenance. This research surveyed a

    broad cross-section of companies engaged in many different types of construction, ranging from general building to

    petrochemical. Capital and retrofit projects for equipment, systems, and facilities were included in this research. As

    maintainability most directly impacts the owner of constructed projects, this research focused on owner organizations.

    Research Methodology:

    The research methodology included: (1) literature review; (2) a questionnaire survey; (3) 35 personal and telephone interviews; and (4) seven in-depth case studies with industry representatives.

    Levels of Maintainability Implementation:

    The research data revealed attributes that were subjectively organized into five levels of

    maintainability implementation: (1) design/engineering experience; (2) effective organizational standards; (3) developing

    maintainability process; (4) formal maintainability process; and (5) comprehensive maintainability program. Each level

    expands and refines the attributes of the preceding level, increasing the opportunity for maintainability improvement on

    capital projects.

    Model Process for Maintainability Implementation :

    Best practices observed during the research data collection were organized into a model process

    for maintainability implementation. The model process was developed to provide guidance in the planning, development,

    and implementation of maintainability at both the corporate and project levels. Providing an overview of the

    maintainability program, the model process has six milestones: (1) commit to implementing maintainability; (2) establish

    maintainability program; (3) obtain maintainability capabilities; (4) plan maintainability implementation; (5) implement

    maintainability; and (6) update maintainability program. Each milestone contains several steps and activities that further

    describe the details of implementation.

    Practical Applications:

    Project-specific factors affecting the need for maintainability efforts are grouped into two categories: owner-related issues and project attributes. The owner-related issues are: (1) owner type; (2) past maintenance experience; (3) maintenance strategy; and (4) projected cost of maintenance. Project attributes include: (1) construction type; (2) criticality; (3) complexity; (4) projected life of facility; and (5) location. Five factors that affect how a formal maintainability process will be implemented are: (1) new versus retrofit; (2) project size; (3) project delivery system; (4) maintenance organization; and (5) related industry practices.

    Conclusions:

    Implementation of a formal maintainability process involves a fundamental shift in the role of maintenance, from a necessary evil to a value-adding activity, in the project delivery process. Maintenance helps achieve and sustain optimum reliability and performance for all projects. Formal maintainability programs provide benefits to both owner and contractor organizations. Owners benefit from improved control over maintenance costs and improved facility availability. Designers and constructors can increase client satisfaction and use success with a maintainability process as a value-adding service for owner clients.

    Recommendations:

    Each company must assess the need for maintainability on future projects and then determine the appropriate level of maintainability efforts. Development of the formal process should reflect the organizational need, with the purpose of ensuring maintainability objectives are met. A maintainability process has the potential for greatest (and most cost-effective) impact if it can be integrated with existing company work processes and related improvement initiatives, such as Total Quality Management.

    Need for Future Research:

    Future research needs to be conducted in measuring and quantifying costs/benefits of maintainability in order to

    demonstrate the financial aspects of maintainability. Similarly, the need exists to measure and document performance of

    the maintainability process implementation.


    Maintainability Program Plan

    Overview

    The primary purpose of the 'Maintainability Program Plan' is to improve operational readiness, reduce maintenance

    manpower needs, reduce system life cycle cost and provide data essential for management.

    The objective shall be to ensure attainment of the maintainability requirements of the acquisition.

    The maintainability aspect during the system's development is extremely important, and it is vital that suppliers are aware of their responsibilities in this respect, as the results can have serious effects for the user. The maintainability requirements must be expressed as definitively as possible. The requirements shall apply to planned maintenance in the support environment and shall be expressed in quantitative terms:

    Time (e.g., turn-around time, time to repair, time between maintenance actions);

    Rate (e.g., maintenance hours per operating hour, frequency of preventive maintenance);

    Complexity (e.g., number of people and skill levels, variety of support equipment).

    The expectation of carrying out repairs in the field by substitution of components (e.g., the replacement of a faulty card or

    module in an electronic item) shall be defined.

    Ishikawa diagram

    Definition:

    A graphic tool used to explore and display opinion about sources of variation in a process. (Also called a Cause-and-

    Effect or Fishbone Diagram.)

    Purpose:

    To arrive at a few key sources that contribute most significantly to the problem being examined. These sources are then

    targeted for improvement. The diagram also illustrates the relationships among the wide variety of possible contributors

    to the effect.

    The figure below shows a simple Ishikawa diagram. Note that this tool is referred to by several different names: Ishikawa

    diagram, Cause-and-Effect diagram, Fishbone diagram, and Root Cause Analysis. The first name is after the inventor of

    the tool, Kaoru Ishikawa (1969) who first used the technique in the 1960s.


    The basic concept in the Cause-and-Effect diagram is that the name of a basic problem of interest is entered at the right of the diagram at the end of the main "bone". The main possible causes of the problem (the effect) are drawn as bones off the main backbone. The "Four-M" categories are typically used as a starting point: "Materials", "Machines", "Manpower", and "Methods". Different names can be chosen to suit the problem at hand, or these general categories can be revised. The key is to have three to six main categories that encompass all possible influences. Brainstorming is typically done to add possible causes to the main "bones" and more specific causes to the "bones" on the main "bones". This subdivision into ever increasing specificity continues as long as the problem areas can be further subdivided. The practical maximum depth of this tree is usually about four or five levels. When the fishbone is complete, one has a rather complete picture of all the possibilities about what could be the root cause for the designated problem.
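The structure just described, a main problem with category bones that branch into ever more specific causes, maps naturally onto a nested dictionary. The sketch below is illustrative only: the problem statement and every cause in it are hypothetical and not taken from the original notes.

```python
# Hypothetical fishbone: each key is a cause, each value holds its sub-causes.
fishbone = {
    "effect": "Pump fails to deliver pressure",
    "causes": {
        "Materials": {"Contaminated oil": {"Filter not changed": {}}},
        "Machines": {"Worn pump gears": {}, "Leaking valve seal": {}},
        "Manpower": {"Untrained operator": {}},
        "Methods": {"No start-up checklist": {}},
    },
}


def depth(causes):
    """Number of bone levels below this point (0 when there are no sub-causes)."""
    if not causes:
        return 0
    return 1 + max(depth(sub) for sub in causes.values())


def render(causes, indent=0):
    """Return the bones as indented text lines, one cause per line."""
    lines = []
    for name, sub in causes.items():
        lines.append("  " * indent + name)
        lines.extend(render(sub, indent + 1))
    return lines


print(fishbone["effect"])
print("\n".join(render(fishbone["causes"], indent=1)))
print("main categories:", len(fishbone["causes"]))
print("depth:", depth(fishbone["causes"]))
```

Checking the number of main categories and the depth against the guidance above (three to six categories, four or five levels at most) is a quick way to spot a diagram that has grown unwieldy.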

    The Cause-and-Effect diagram can be used by individuals or teams; probably most effectively by a group. A typical

    utilization is the drawing of a diagram on a blackboard by a team leader who first presents the main problem and asks for

    assistance from the group to determine the main causes which are subsequently drawn on the board as the main bones of the diagram. The team assists by making suggestions and, eventually, the entire cause and effect diagram is filled out.

    Once the entire fishbone is complete, team discussion takes place to decide what the most likely root causes of the

    problem are. These causes are circled to indicate items that should be acted upon, and the use of the tool is complete.

    The Ishikawa diagram, like most quality tools, is a visualization and knowledge organization tool. Simply collecting the

    ideas of a group in a systematic way facilitates the understanding and ultimate diagnosis of the problem. Several computer

    tools have been created for assisting in creating Ishikawa diagrams. A tool created by the Japanese Union of Scientists and


    Engineers (JUSE) provides a rather rigid tool with a limited number of bones. Other similar tools can be created using

    various commercial tools.

    Only one tool has been created that adds computer analysis to the fishbone. Bourne et al. (1991) reported using Dempster-

    Shafer theory (Shafer and Logan, 1987) to systematically organize the beliefs about the various causes that contribute to

    the main problem. Based on the idea that the main problem has a total belief of one, each remaining bone has a belief

    assigned to it based on several factors; these include the history of problems of a given bone, events and their causal

    relationship to the bone, and the belief of the user of the tool about the likelihood that any particular bone is the cause of

    the problem.

    How to Construct:

    1. Place the main problem under investigation in a box on the right.

    2. Have the team generate and clarify all the potential sources of variation.

    3. Use an affinity diagram to sort the process variables into naturally related groups. The labels of these groups are the

    names for the major bones on the Ishikawa diagram.

    4. Place the process variables on the appropriate bones of the Ishikawa diagram.

    5. Combine each bone in turn, ensuring that the process variables are specific, measurable, and controllable. If they are not, branch or "explode" the process variables until the ends of the branches are specific, measurable, and controllable.

    Tip:

    Take care to identify causes rather than symptoms.

    Post diagrams to stimulate thinking and get input from other staff.

    Self-adhesive notes can be used to construct Ishikawa diagrams. Sources of variation can be rearranged to reflect appropriate categories with minimal rework.

    Ensure that the ideas placed on the Ishikawa diagram are process variables, not special causes, other problems, tampering, etc.

    Review the quick fixes and rephrase them, if possible, so that they are process variables.


    References:

    Cause & Effect Diagram:

    The cause & effect diagram is the brainchild of Kaoru Ishikawa, who pioneered quality management processes in the Kawasaki shipyards, and in the process became one of the founding fathers of modern management. The cause and effect diagram is used to explore all the potential or real causes (or inputs) that result in a single effect (or output). Causes are arranged according to their level of importance or detail, resulting in a depiction of relationships and hierarchy of events. This can help you search for root causes, identify areas where there may be problems, and compare the relative importance of different causes.

    Causes in a cause & effect diagram are frequently arranged into four major categories. While these categories can be

    anything, you will often see:

    Manpower, methods, materials, and machinery (recommended for manufacturing)

    Equipment, policies, procedures, and people (recommended for administration and service).

    These guidelines can be helpful but should not be used if they limit the diagram or are inappropriate. The categories you

    use should suit your needs. At Sky Mark, we often create the branches of the cause and effect tree from the titles of the

    affinity sets in a preceding affinity diagram.

    The C&E diagram is also known as the fishbone diagram because it was drawn to resemble the skeleton of a fish, with the

    main causal categories drawn as "bones" attached to the spine of the fish, as shown below.

    Cause & effect diagrams can also be drawn as tree diagrams, resembling a tree turned on its side. From a single outcome

    or trunk, branches extend that represent major categories of inputs or causes that create that single outcome. These large

    branches then lead to smaller and smaller branches of causes all the way down to twigs at the ends. The tree structure has

    an advantage over the fishbone-style diagram. As a fishbone diagram becomes more and more complex, it becomes

    difficult to find and compare items that are the same distance from the effect because they are dispersed over the diagram.

    With the tree structure, all items on the same causal level are aligned vertically.
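The advantage of the tree layout, that items at the same causal distance from the effect line up together, can be sketched as a breadth-first walk that groups causes by level. The tree contents below are hypothetical and serve only to show the grouping.

```python
from collections import deque

# Hypothetical cause tree: each key is a node, each value holds its causes.
tree = {
    "Late delivery": {
        "Methods": {"No schedule review": {}},
        "Machines": {"Frequent breakdowns": {"Overdue lubrication": {}}},
    }
}


def by_level(tree):
    """Breadth-first walk returning one list of node names per causal level."""
    levels = []
    queue = deque((name, sub, 0) for name, sub in tree.items())
    while queue:
        name, sub, level = queue.popleft()
        if level == len(levels):
            levels.append([])          # first node seen at a new level
        levels[level].append(name)
        queue.extend((n, s, level + 1) for n, s in sub.items())
    return levels


for level, names in enumerate(by_level(tree)):
    print(level, names)
```

Each printed row corresponds to one vertical column of the tree diagram, so causes at the same level can be compared side by side.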


    To successfully build a cause and effect diagram:

    1. Be sure everyone agrees on the effect or problem statement before beginning.

    2. Be succinct.

    3. For each node, think what could be its causes. Add them to the tree.

    4. Pursue each line of causality back to its root cause.

    5. Consider grafting relatively empty branches onto others.

    6. Consider splitting up overcrowded branches.

    7. Consider which root causes are most likely to merit further investigation.

    Other uses for the Cause and Effect tool include organization diagramming, parts hierarchies, project planning, tree diagrams, and the 5 Whys.

    Pareto Chart or Juran Diagram

    A quality tool, also called a Juran diagram, that is based on the Pareto Principle. It uses attribute or discrete data, with the data arranged in descending order so that the most frequent occurrences are shown first. It may use a cumulative line to mark percentages for each group or bar, which brings out the Pareto Principle or 80/20 rule: 20 percent of the items cause 80 percent of the problems.

    Principle: the 80/20 Rule

    In the very early 1900s, an Italian economist by the name of Vilfredo Pareto created a mathematical formula describing the unequal distribution of wealth he observed and measured in his country: Pareto observed that roughly twenty percent of the people controlled or owned eighty percent of the wealth. In the late 1940s, Dr. Joseph M. Juran, a Quality Management pioneer, attributed the 80/20 Rule to Pareto, calling it Pareto's Principle. While some may claim that Juran's broad attribution of this scientific observation to Pareto is inaccurate, Pareto's Principle, or Pareto's Law as it is sometimes called, can be a very effective business tool, one that can help us manage more effectively.

    The example below is from Dale H. Besterfield's book Quality Control, Sixth Edition, which includes a CD of Excel macros.


    Paint Nonconformities

    Number  Category   Freq.  Percent  Cumulative %
    2       Lt. Spray    582     30.9          30.9
    7       Runs         434     23.1          54.0
    3       Drips        227     12.1          66.1
    1       Blister      212     11.3          77.4
    5       Splatter     141      7.5          84.8
    6       Bad Paint    126      6.7          91.5
    4       Overspray    109      5.8          97.3
    8       Other         50      2.7         100.0
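The Percent and Cumulative % columns of the paint nonconformities table can be reproduced from the frequencies alone. The sketch below is one way to do it; the function name `pareto_table` and the convention of keeping the "Other" category last are illustrative choices, not something prescribed by the source.

```python
# Frequencies from the "Paint Nonconformities" table, keyed by category.
counts = {
    "Blister": 212, "Lt. Spray": 582, "Drips": 227, "Overspray": 109,
    "Splatter": 141, "Bad Paint": 126, "Runs": 434, "Other": 50,
}


def pareto_table(counts, last="Other"):
    """Return rows of (category, freq, percent, cumulative percent).

    Categories are sorted by descending frequency; the catch-all
    category named by `last` is kept at the end.
    """
    total = sum(counts.values())
    ordered = sorted((c for c in counts if c != last), key=lambda c: -counts[c])
    if last in counts:
        ordered.append(last)
    rows, running = [], 0
    for category in ordered:
        running += counts[category]
        rows.append((category, counts[category],
                     100 * counts[category] / total, 100 * running / total))
    return rows


for category, freq, pct, cum in pareto_table(counts):
    print(f"{category:<10} {freq:>5} {pct:>6.1f} {cum:>6.1f}")
```

Reading down the cumulative column shows how few categories account for most of the nonconformities, which is the point of the chart.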


    How Pareto's Principle Can Help Us

    The value of the Pareto Principle in management is in reminding us to stay focused on the 20 percent that matters. Of all the tasks performed throughout the day, one could say (based on Pareto's Principle) that only 20 percent really matter. Those tasks in the 20 percent very likely will produce 80 percent of our results. Thus, it's critical that we identify and focus on those things. When the fire drills surrounding the crisis of the day begin to eat up precious time, remind yourself of the critical 20 percent you need to focus on. If anything in the list of activities and action items has to fall by the wayside, left undone, be sure it isn't listed in that critical 20 percent.

    DEFINITIONS OF MAINTAINABILITY:

    Maintainability is defined as the probability that a failed device, equipment, or asset is restored to
    operational effectiveness within a specified period of time through the prescribed maintenance operation.

    Maintainability can also be defined as the characteristic of equipment design and installation that is expressed in
    terms of ease and economy of maintenance, availability of equipment, safety, and accuracy of the performance parameters
    of the equipment.

    Its aim is to design and develop a system or equipment that can be easily maintained at a reasonable cost with minimum
    resources, without affecting the performance and safety of the equipment.

    Maintainability is associated with the design of the assets to be maintained. It is a measure of the ease of
    maintenance; the parameter for expressing maintainability is Mean Time to Repair (MTTR).

    The concept of maintainability is different from reliability. Reliability is the probability that an asset or a system will
    operate satisfactorily for some determined period of time; the parameter expressing reliability is Failure Rate (FR) or
    Mean Time Between Failures (MTBF).
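
    To make the two parameters concrete, the following is a minimal sketch that turns MTBF and MTTR into probabilities. It
    assumes constant failure and repair rates (an exponential model), which the definitions above do not state; the function
    names and the numeric values are illustrative only. The steady-state availability ratio MTBF / (MTBF + MTTR) is the
    usual textbook combination of the two figures.

```python
import math


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of failure-free operation for t_hours.

    Assumes a constant failure rate FR = 1 / MTBF (exponential model).
    """
    return math.exp(-t_hours / mtbf_hours)


def maintainability(t_hours: float, mttr_hours: float) -> float:
    """Probability that a repair is completed within t_hours.

    Assumes a constant repair rate 1 / MTTR (exponential model).
    """
    return 1.0 - math.exp(-t_hours / mttr_hours)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the asset is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    mtbf, mttr = 500.0, 4.0  # illustrative values, not taken from the text
    print(f"R(100 h)     = {reliability(100.0, mtbf):.3f}")
    print(f"M(8 h)       = {maintainability(8.0, mttr):.3f}")
    print(f"Availability = {steady_state_availability(mtbf, mttr):.4f}")
```

    Under these assumptions a shorter MTTR raises both the maintainability curve and the availability, while a longer MTBF
    raises reliability and availability but leaves maintainability unchanged.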

    Maintainability is not a factor - this is a short term tactical solution. It is agreed that the system will be replaced or

    rewritten before maintenance costs become a problem. It is particularly important that any decision to build a tactical

    solution is documented as there is a tendency for such systems to become long term corporate business systems.

    Maintainability is key - this is a long term system which needs to be easily maintained. Here the criterion for

    deployment is not just to provide required business functionality in a robust way, but that the design and code meet a

    maintainability standard before the system is accepted and released to the business. This means that activities such as

    documentation and tested production should not be allowed to be cut out if time runs short in a time box. It could mean

    designing the system so that logic is externalized so that it can be easily changed during maintenance activities. This may

    mean that time boxes need to be a little longer than usual.


    Maintainability will be built in later - the business priority is to elicit and implement required functionality

    quickly. The system needs a long life and to be maintainable, but the business is prepared to pay for subsequent (behind

    the scenes) re-engineering after implementation. This means a greater development cost than engineering for

    maintainability first time, but gives a quicker initial delivery, and may produce a lower lifetime ownership cost than

    struggling for years with maintenance problems. (This is often the case where time to market is critical - either in software

    for sale into a fast moving market, or software to satisfy a fast moving business).

    Maintenance can be said to start after the first increment of a system has been delivered. In most cases, any maintenance

    on this will need to be undertaken by the DSDM development team during the second increment. If the system is not

    maintainable, then the second increment will be slowed down at this stage. Any requirements not covered during the main
    part of the development lifecycle because they were prioritized out due to time boxing or using MoSCoW are often held
    over to be considered for future work done by the maintenance team.

    However, maintenance is usually considered to begin post implementation. It should not make any difference whether this

    process is undertaken by a separate maintenance team or by the development team. However, maintenance is often

    transferred to a different team, which means that the goal of maintainability is essential since the maintenance team will
    not have gained knowledge of the system during development. It is important that the maintenance team are represented

    during the development process - this role is recommended in DSDM.

    The Quality Strategy for the project needs to consider how quality control will be applied to ensure that the

    Maintainability Objective is met.

    Appropriate staff should be involved in implementing maintainability objectives. The Technical Co-ordinator is

    responsible for ensuring the maintainability objectives are met. The Project Manager is responsible for identifying and

    calling in specialist roles as required - these could be Support and Maintenance team representatives and the Service/Help

    Desk Manager to assist with planning for maintainability and to consider the eventual support of this system once it is

    running live.

    DSDM does not ensure maintainability by itself. Maintainability is made possible by a combination of four factors within

    a well managed DSDM project:

    Tools - The use of tools to cover such areas as configuration management, testing and impact analysis aids
    maintainability.

    People - The people aspects affecting maintainability are development team skills/experience/business
    knowledge/user contribution/maintenance team skills/motivation.

    Documentation - A minimum documentation set is needed for maintenance. This does not have to be paper-based

    documentation but could be information residing in a toolset. This documentation set will vary according to installation

    guidelines.

    Good practice guidelines - such aspects as standards, style guide, use of DSDM for this installation etc. - in fact

    everything that maybe we would have done automatically for a waterfall approach and should not forget just because a

    RAD method is being used.

    Limitations

    It cannot be applied to complex machine systems because of design considerations, and it does not improve
    the performance of the equipment.

    Checklist for Planning Maintainability Activities and Resources:

    Columns: Item, Yes?, No?, Comments

    Project team

    Does the cross-functional project team include operations/maintenance?

    Are there designated operations/maintenance personnel for design input and reviews, installation, and start-up?

    Who is responsible for complete operations and maintenance documentation such as manuals?

    Maintainability concepts

    Will the project incorporate the following:

    Preventive maintenance features?

    Predictive maintenance?

    Accessibility for maintenance?

    Safety?

    Ease of alignments and quick changeovers?

    maintenance design considerations, maintenance documentation, and maintenance training. The checklist aids in assigning
    accountability for the activities. Additionally, individual plants have a contact matrix that identifies subject matter
    experts (primary and secondary contacts) who can assist project teams in specific areas of maintenance. The combination
    of the checklist and contact matrix provides users with sufficient tools to plan activities and resources for
    maintainability implementation.

    In comparison, the contractor-led program uses a flowchart to plan and integrate maintainability activities into the project
    delivery process, shown in Fig. 1. The flowchart helps delineate tasks, assigns lead responsibilities and personnel, and
    identifies activities such as meetings and conferences. The flowchart is a component of the process coordination methods
    employed by the contractor-led program to plan and coordinate maintainability activities and resources.

    OBJECTIVES

    1. To design equipment that can be maintained easily, in minimum time and at minimum cost. This implies that the
    requirement for other supporting resources, such as spare parts, manpower, and facilities like tools and test equipment,
    must also be minimal.

    2. Good maintainability can also improve the safety of personnel.

    3. Maintainability increases the production cost of a machine, but it reduces the operating cost considerably.

    4. To reduce the product's life cycle cost of maintenance.

    5. Good maintainability provisions help the maintenance department to carry out maintenance successfully with
    proper cooperation of the equipment.

    6. During the maintainability implementation program, reliability and other characteristics can also be evaluated.

    7. Its objectives are system readiness and achievement of the desired results.

    Establish Maintainability Objectives

    Maintainability objectives are governed by the maintenance strategy selected for the project. Maintainability objectives

    must be clearly defined and must support business goals. While qualitative maintainability objectives are very useful,

    preference is for quantitative maintainability objectives that can be measured and recorded. Both programs were able to
    define quantitative maintainability objectives. The owner-led program uses quantitative
    objectives such as (1) total spare parts inventory per unit of sales value of production; (2) maintenance cost per unit of
    sales value of production; (3) maintenance cost per unit of product produced; (4) planned versus unplanned maintenance
    cost; (5) start-up costs (training, travel, checkout, materials); (6) annual maintenance costs; and (7) overall equipment
    effectiveness. During project planning and design, maintainability objectives are established by a joint effort of the
    project engineer and plant/maintenance engineer. The cooperative joint effort increases the likelihood of designed-in
    maintainability and realistic attainment of maintainability objectives. In comparison, the contractor-led program
    identified mean time to repair as a maintainability objective. Other maintainability objectives were not as clearly
    defined, a common occurrence throughout industry. Maintainability results can be difficult to track or measure because
    maintenance has initial and long-term impacts occurring over the life cycle of a project. Nonetheless, continuous tracking
    and measurement of maintainability objectives provides a means to assess true value and performance. The continuous
    assessment also allows informed decisions to be made for appropriate changes to improve long-term maintainability.
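
    Several of the quantitative objectives listed above are simple ratios, so they can be tracked with very little code. The
    sketch below is only an illustration: the record fields, the function names, and all figures are hypothetical, and the
    overall equipment effectiveness formula (availability × performance × quality) is the commonly used definition rather
    than one given in this text.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceRecord:
    """Hypothetical yearly figures for one plant (all values illustrative)."""
    planned_cost: float      # planned maintenance cost
    unplanned_cost: float    # unplanned (breakdown) maintenance cost
    units_produced: int      # units of product produced
    sales_value: float       # sales value of production


def cost_per_unit(r: MaintenanceRecord) -> float:
    """Objective (3): maintenance cost per unit of product produced."""
    return (r.planned_cost + r.unplanned_cost) / r.units_produced


def cost_per_sales_value(r: MaintenanceRecord) -> float:
    """Objective (2): maintenance cost per unit of sales value of production."""
    return (r.planned_cost + r.unplanned_cost) / r.sales_value


def planned_share(r: MaintenanceRecord) -> float:
    """Objective (4): planned cost as a fraction of total maintenance cost."""
    return r.planned_cost / (r.planned_cost + r.unplanned_cost)


def oee(availability: float, performance: float, quality: float) -> float:
    """Objective (7): overall equipment effectiveness (common definition)."""
    return availability * performance * quality


if __name__ == "__main__":
    record = MaintenanceRecord(60_000.0, 40_000.0, 50_000, 2_000_000.0)
    print(f"cost per unit:        {cost_per_unit(record):.2f}")
    print(f"cost per sales value: {cost_per_sales_value(record):.3f}")
    print(f"planned share:        {planned_share(record):.0%}")
    print(f"OEE:                  {oee(0.90, 0.95, 0.99):.3f}")
```

    Recording the same ratios at regular intervals gives the continuous tracking described above.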

    Fault Diagnosis

    A guess as to what's wrong with a malfunctioning circuit

    Narrows the search for physical root cause

    Makes inferences based on observed behavior

    Usually based on the logical operation of the circuit

    Types of Diagnosis


    Circuit Partitioning (Effect-Cause Diagnosis)

    o Identify fault-free or possibly-faulty portions

    o Identify suspect components, logic blocks, interconnects

    Model-Based Diagnosis (Cause-Effect Diagnosis)

    o Assume one or more specific fault models

    o Compare behavior to fault simulations

    Circuit Partitioning

    Separate known-good portions of circuit from likely areas of failure

    Simplest method: identify failing flip-flops

    o Tester can identify failing flops or outputs

    o Input cone of logic is suspect

    o Intersection of multiple cones is highly suspect

    o Single clock pulse with scan can be used for sequential/functional fails

    aka Effect-Cause Diagnosis

    Reasoning based on observed behavior and expected (good-circuit) functions

    Commonly used at system and board-levels

    Tries to separate good and suspect areas

    Advantage: Simple and general

    Disadvantage: Not very precise, often gives no indication of defect mechanism
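
    The cone-based partitioning described above can be sketched in a few lines. This is a minimal illustration, not the
    procedure of any particular tester or tool: the netlist is a hypothetical fan-in map, and the names FANIN, input_cone,
    and suspects are invented for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical netlist: each node maps to the nodes that drive it (its fan-in).
FANIN: Dict[str, List[str]] = {
    "ff1": ["g1"],
    "ff2": ["g2"],
    "g1": ["a", "n1"],
    "g2": ["n1", "c"],
    "n1": ["b"],
}


def input_cone(node: str, fanin: Dict[str, List[str]]) -> Set[str]:
    """Return every node that can influence `node` (its transitive fan-in)."""
    cone: Set[str] = set()
    stack = list(fanin.get(node, []))
    while stack:
        current = stack.pop()
        if current in cone:
            continue
        cone.add(current)
        stack.extend(fanin.get(current, []))
    return cone


def suspects(failing: Iterable[str],
             fanin: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (suspect nodes, highly suspect nodes) for the failing flops.

    Suspect: in the input cone of at least one failing flop or output.
    Highly suspect: in the intersection of all failing cones.
    """
    cones = [input_cone(f, fanin) for f in failing]
    if not cones:
        return set(), set()
    return set().union(*cones), set.intersection(*cones)


if __name__ == "__main__":
    suspect, highly_suspect = suspects(["ff1", "ff2"], FANIN)
    print("suspect:", sorted(suspect))
    print("highly suspect:", sorted(highly_suspect))
```

    Everything outside the union of the cones is treated as known-good. As noted above, the result narrows the location but
    says nothing about the defect mechanism.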

    Cause-Effect Diagnosis

    Start from possible causes (fault models), compare to observed effects

    A simulator is used to predict behavior of the circuit in the presence of various faults

    Match prediction(s) against observed behavior

    Advantage: Implicates a mechanism as well as a location

    Disadvantage: Can be fooled by unmodeled defects

    Components of Fault Diagnosis

    Fault models

    Fault simulators

    Fault dictionaries

    Diagnosis algorithms

    Fault Models

    A fault model is an abstraction of a type of defect behavior

    A fault instance is the application of a model to a circuit wire, node, gate, etc.

    Used to create and evaluate test sets

    For diagnosis, they can be used to simulate and predict faulty behaviors


    Stuck-at Fault Model

    The most-used fault model (by far)

    Simple to simulate and enumerate

    Effective for testing, fault grading, and diagnosis of some defects

    Many defects are not well represented by the stuck-at model

    Bridging Fault Models

    Shorts are a common defect type in CMOS

    Different bridging fault models have varying accuracy and precision, from simplistic to very sophisticated

    Difficult or impractical to enumerate

    Some Diagnostic Fault Models

    Gate Fault

    Net Fault

    Bridging Fault

    Path Fault

    Fault Simulators

    A fault simulator can simulate instances of a particular fault model

    Inputs:

    o Circuit (netlist)

    o Test set

    o Faultlist (list of fault instances)

    Output: circuit response

    Usually, simulates the presence of a single fault instance (single-fault assumption)
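
    As a rough illustration of the inputs and output listed above, the sketch below simulates one stuck-at fault instance at
    a time on a tiny combinational circuit. The netlist, the gate set, and the function names are hypothetical and chosen
    only for the example; a production fault simulator is far more elaborate.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Gate = Tuple[str, str, Tuple[str, ...]]  # (output net, gate type, input nets)
Fault = Tuple[str, int]                  # (net name, stuck value 0 or 1)

# Hypothetical netlist in topological order: y = (a AND b) OR (NOT c)
NETLIST: List[Gate] = [
    ("n1", "AND", ("a", "b")),
    ("n2", "NOT", ("c",)),
    ("y", "OR", ("n1", "n2")),
]
OUTPUTS = ["y"]


def evaluate(gate_type: str, values: Sequence[int]) -> int:
    """Evaluate one gate on 0/1 input values."""
    if gate_type == "AND":
        return int(all(values))
    if gate_type == "OR":
        return int(any(values))
    if gate_type == "NOT":
        return 1 - values[0]
    raise ValueError(f"unsupported gate type: {gate_type}")


def simulate(netlist: List[Gate], outputs: List[str],
             vector: Dict[str, int],
             fault: Optional[Fault] = None) -> List[int]:
    """Return the circuit response to one test vector.

    With fault=None this is the good-circuit response; otherwise the named
    net is forced to its stuck value (single-fault assumption).
    """
    nets = dict(vector)
    if fault is not None and fault[0] in nets:
        nets[fault[0]] = fault[1]          # fault on a primary input
    for out, gate_type, ins in netlist:
        nets[out] = evaluate(gate_type, [nets[i] for i in ins])
        if fault is not None and out == fault[0]:
            nets[out] = fault[1]           # fault on an internal net or output
    return [nets[o] for o in outputs]


if __name__ == "__main__":
    test_set = [{"a": 1, "b": 1, "c": 1}, {"a": 0, "b": 1, "c": 1}]
    fault_list: List[Fault] = [("n1", 0), ("c", 0)]
    for fault in fault_list:
        for vec in test_set:
            print(fault, vec, simulate(NETLIST, OUTPUTS, vec, fault))
```

    Running the same loop without a fault gives the expected (good-circuit) responses that the faulty responses are compared
    against.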

    Fault Dictionaries

    A fault dictionary is a database of the simulated responses for all faults in faultlist

    Used by some diagnosis algorithms for convenience:

    o Fast: no simulation at time of diagnosis

    o Self-contained: netlist, simulator, and test set not needed after dictionary creation

    Can be very large, however!

    The Full-Response Dictionary

    For each fault (f), store the response to each test vector (v)

    For each vector, store the expected output response (o)

    Total storage requirement: f × v × o bits

    The Pass-Fail Dictionary

    For each fault, store only the test vector responses

    One bit per vector, pass ( 0 ) or fail ( 1 )

    Total storage requirement: f × v bits

    Much smaller than full-response, and often practical for even very large circuits
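
    The difference between the two storage requirements is easy to check numerically. The sketch below compares f × v × o
    bits with f × v bits and shows how a full response collapses to a pass-fail signature; the circuit sizes and function
    names are made up for illustration.

```python
from typing import List


def full_response_bits(f: int, v: int, o: int) -> int:
    """Full-response dictionary: one bit per fault, per vector, per output."""
    return f * v * o


def pass_fail_bits(f: int, v: int) -> int:
    """Pass-fail dictionary: one bit per fault, per vector."""
    return f * v


def to_pass_fail(faulty: List[List[int]], good: List[List[int]]) -> List[int]:
    """Collapse per-vector output responses to one bit per vector.

    0 = pass (response equals the good-circuit response), 1 = fail.
    """
    return [0 if response == expected else 1
            for response, expected in zip(faulty, good)]


if __name__ == "__main__":
    f, v, o = 100_000, 1_000, 500  # illustrative sizes only
    print(f"full-response: {full_response_bits(f, v, o) / 8e9:.2f} GB")
    print(f"pass-fail:     {pass_fail_bits(f, v) / 8e6:.2f} MB")
    print(to_pass_fail([[0, 1], [1, 1], [0, 0]], [[1, 1], [1, 1], [0, 0]]))
```

    The pass-fail form is smaller by a factor of o, at the cost of discarding which outputs failed on each vector.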

    Dynamic Diagnosis

    Alternative to dictionary-based diagnosis

    Fault simulation is only done for certain faults, based on test results

    o Only simulate faults in input cones of failing flip-flops/outputs

    Dictionary is eliminated, but requires complete netlist and test pattern file

    Used by most commercial ATPG tools: Mentor Fastscan, Synopsys, Cadence, etc.

    Diagnosis Algorithms

    Algorithms compare observed behavior to predicted behaviors

    An algorithm attempts to explain the observed failures with fault candidates

    The job of a diagnosis algorithm is to report the best fault candidate(s)

    "Best" is determined by the scoring method

    Fault Candidate Scoring

    Two common scoring methods

    o Match/mismatch points

    o Fault candidate probability

    Other common scorings:

    o Hamming distance

    o Set intersection/overlap

    o Nearest neighbor

    Match/mismatch Point Scoring

    Award points for matching observed failures

    Optionally deduct points for not predicting fails


    Nonprediction: A behavior not predicted by candidate

    Misprediction: A prediction not fulfilled by behavior

    Commercial tools (e.g. Fastscan) are usually biased to lowest nonprediction
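
    Match/mismatch scoring can be sketched with set operations on the failing test vectors. The example below is a
    simplified illustration, not the scoring used by Fastscan or any other tool: candidates are ranked by fewest
    nonpredictions and then by fewest mispredictions, and the candidate names and vector numbers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def score(observed: Set[int], predicted: Set[int]) -> Tuple[int, int, int]:
    """Return (matches, nonpredictions, mispredictions) for one candidate.

    observed:  indices of test vectors that failed on the tester
    predicted: indices of test vectors the candidate fault is predicted to fail
    """
    matches = len(observed & predicted)
    nonpredictions = len(observed - predicted)   # observed but not predicted
    mispredictions = len(predicted - observed)   # predicted but not observed
    return matches, nonpredictions, mispredictions


def rank(observed: Set[int], candidates: Dict[str, Set[int]]) -> List[str]:
    """Order candidates by fewest nonpredictions, then fewest mispredictions."""
    def key(name: str) -> Tuple[int, int, str]:
        _, nonpred, mispred = score(observed, candidates[name])
        return (nonpred, mispred, name)
    return sorted(candidates, key=key)


if __name__ == "__main__":
    observed_fails = {1, 4, 7}
    candidate_predictions = {
        "n1 stuck-at-0": {1, 4, 7, 9},
        "n5 stuck-at-1": {1, 4},
        "n8 stuck-at-0": {2, 3},
    }
    for name in rank(observed_fails, candidate_predictions):
        print(name, score(observed_fails, candidate_predictions[name]))
```

    Sorting on nonpredictions first reflects the bias described above; a point-based variant would instead add points for
    matches and subtract weighted penalties for each kind of mismatch.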

    Probabilistic Scoring

    Probability score based on matches and mismatches and error assumptions

    o Weights for non- and mis-prediction

    o Different prediction probabilities for different fault candidates (bridges vs. stuck-at)

    Usually normalized so that total of all candidates equals 1.0

    UCSC method uses probabilities to compare stuck-at candidates to bridges in same diagnosis
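
    One simple way to realize such a score is to multiply a per-mismatch probability for every nonprediction and
    misprediction and then normalize over all candidates. The sketch below is only an assumed toy model with invented
    weights; it is not the UCSC method, whose details are not given here.

```python
from typing import Dict, Set, Tuple

# (predicted failing vectors, P(nonprediction), P(misprediction)) per candidate.
Candidate = Tuple[Set[int], float, float]


def raw_score(observed: Set[int], candidate: Candidate) -> float:
    """Unnormalized score: each mismatch multiplies in its assumed probability."""
    predicted, p_nonpred, p_mispred = candidate
    nonpredictions = len(observed - predicted)
    mispredictions = len(predicted - observed)
    return (p_nonpred ** nonpredictions) * (p_mispred ** mispredictions)


def probabilities(observed: Set[int],
                  candidates: Dict[str, Candidate]) -> Dict[str, float]:
    """Normalize raw scores so that the total over all candidates is 1.0."""
    raw = {name: raw_score(observed, c) for name, c in candidates.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no candidate explains the observed behavior")
    return {name: value / total for name, value in raw.items()}


if __name__ == "__main__":
    observed_fails = {1, 4, 7}
    # Illustrative weights: a bridge's predictions are assumed less certain,
    # so its mispredictions are penalized less than a stuck-at fault's.
    candidates = {
        "n1 stuck-at-0": ({1, 4, 7, 9}, 0.1, 0.2),
        "n1-n3 bridge": ({1, 4, 7, 9}, 0.1, 0.5),
        "n5 stuck-at-1": ({1, 4}, 0.1, 0.2),
    }
    for name, p in probabilities(observed_fails, candidates).items():
        print(f"{name}: {p:.3f}")
```

    Because the weights differ by fault model, stuck-at and bridging candidates end up on one comparable scale within the
    same diagnosis.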

    Types of Diagnosis Algorithms (Cont)

    Bridging-fault

    o May better represent common CMOS faults

    o More complicated fault model

    o Biggest problem: candidate selection

    Other possible (future) directions:

    o Functional fails

    o Delay fails

    o Parametric failures

    Diagnosis in Practice

    Using a diagnosis

    Translating the results: circuit navigation

    Evaluating diagnosis quality

    Commercial diagnosis tools

    Using a Diagnosis

    Fault diagnosis is used to aid physical inspection and root-cause identification

    Diagnosis output is logical, not physical:

    o Abstract faults (such as stuck-at)

    o Gates, ports (nodes), and nets

    o No information about location or size

    Translation to physical location requires navigation of circuit

    Types of Circuit Navigation

    Netlist

    o Examine RTL (Verilog/VHDL etc) for gates and data paths

    Schematic

    o Symbolic view of gates and wires

    Layout/artwork

    o Graphical view of metal lines, poly, vias, cell boundaries, etc.

    Netlist Navigation


    Either use text editor on netlist, or use browser function in simulator

    Browsers allow you to trace forward and backward and see logic values

    Can be used to view hierarchy and functional blocks

    Can be tedious

    Schematic Navigation

    Either hand-drawn (from netlist navigation) or tool-generated gate symbols and wires

    Schematic tools in simulators also allow forward and backward traversal and display of logic values

    Used to verify fault propagation

    Does not reflect physical distances

    Layout (Artwork) Navigation

    Use routing/floorplanning tools to view artwork

    Can usually input cell or wire name and tool will highlight the object

    Useful for determining (x,y) values

    Also good for evaluating physical implications of a set of fault candidates

    o Faults clustered in a small area are good

    o Faults/nets spread around large die areas are bad

    Evaluating a Diagnosis

    A diagnosis without one or a few strong (high-scoring) candidates is usually poor

    Can indicate:

    o Multiple defects

    o Unmodeled (complex) behavior

    o Inappropriate algorithm

    If the diagnosis is poor, either try another algorithm or look for more data (failures)

    Evaluating a Diagnosis (cont)

    Many diagnoses (~60%) implicate a single stuck-at fault

    Usually a good sign, but you must consider equivalent faults

    Many defects can mimic a stuck-at fault, without being a short to Vdd or Gnd

    Consider nearby nodes also, if practical

    Commercial Tool: Mentor Graphics

    ATPG tool: Fastscan

    Stuck-at diagnosis only

    No IDDQ capability

    Orders candidates by number of matched failures (biased to lowest non-prediction)

    Also has netlist & schematic browser

    Based on Waicukauski & Lindbloom (D&T '89)

    Commercial Tool: Synopsys

    ATPG tool: TetraMAX

    J. Waicukauski moved to Synopsys after writing Fastscan


    Diagnosis capability unknown: assumed to be similar to Fastscan

    Commercial Tool: Cadence

    ATPG tool: Encounter Test

    Test and diagnosis tools purchased from IBM

    IBM has had good diagnosis research, but Encounter's capabilities are unknown

    Also of interest: Silicon Ensemble - routing tool

    Graphical artwork viewer

    Good for highlighting nets and cells based on diagnosis results

    Good for determining (x,y) and producing screen shots

    Prior Art

    Waicukauski & Lindbloom, IEEE Design & Test, Aug. '89

    o Most widely-used algorithm for commercial tools

    o Finds candidates to match individual tests, attempts to explain all failing tests

    Abramovici & Breuer, IEEE Trans. Computing, June '80

    o Effect-cause diagnosis

    o Permanent stuck-at fault assumption

    Aitken & Maxwell, HP Journal, Feb. '95

    o Analysis of relative importance of models vs. algorithms

    Lavo, Larrabee, et al., Proceedings of ITC '98

    o Probabilistic scoring

    o Mixed-model diagnosis

    Bartenstein et al., Proceedings of ITC '01

    o SLAT: Single Location At-a-Time diagnosis

    o Focus on matching per-vector results

    Prior Art (cont)

    Jee & Ferguson, Proceedings of ISTFA '93

    o Carafe Inductive Fault Analysis (IFA)

    o Examine circuit to determine likely failure locations

    Aitken, Proceedings of ITC '95

    o Using FIBs to insert defects

    o Calibrate/evaluate diagnosis methods

    Henderson & Soden, Proceedings of ITC '97

    o Probabilistic physical failure analysis

    Nigh, Vallett, et al., Proceedings of ITC '98

    o Large-scale, multi-company SEMATECH experiment

    o Failure analysis of timing and IDDQ fails

    Research Directions

    Complex defect behaviors

    o Beyond stuck-at and 2-line bridges

    o Intermittent faults

    o Delay and timing-related defects


    o Parametric & process-related defects

    o Multiple simultaneous defects

    o Is there a simple, inductive way to infer complex defects?

    Research Directions (cont)

    Diagnosability

    o What makes a particular circuit easy or hard to diagnose?

    o What can we do to make diagnosis easier?

    Evaluation of diagnoses

    o What makes a good diagnosis?

    o Can we quantify our confidence in a diagnosis?

    Research Directions (cont)

    Integration with physical FA & yield improvement

    o Can we incorporate process information?

    o Can we produce a physical diagnosis?

    o On-line (or even on-chip) diagnosis

    Commercial toolflow integration

    o Can diagnosis tools use industry-standard data formats?

    o Can commercial tools be scripted or programmed to do better diagnosis?