reliability engineering ppt

Reliability engineering

Definitions

Reliability – The ability of an item to perform a required function under stated conditions for a stated period of time. It is usually expressed as a probability of success.

Failure – The termination of the ability of an item to perform a required function.

Observed Failure Rate – For a stated period in the life of an item, the ratio of the total number of failures in a sample to the cumulative operating time of that sample. The observed failure rate is associated with particular, stated time intervals (or a summation of intervals) in the life of the item and with stated conditions.

Observed Mean Time Between Failures (MTBF) – For a stated period in the life of an item, the mean length of time between consecutive failures, computed as the ratio of the cumulative observed time to the number of failures under stated conditions.

Observed Mean Time To Failure (MTTF) – For a stated period in the life of an item, the ratio of the cumulative time for a sample to the total number of failures in the sample during the period, under stated conditions.
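These observed ratios reduce to simple arithmetic. A minimal Python sketch, assuming a hypothetical sample of five items with the operating hours and failure count shown (illustrative numbers, not from the source):

```python
def observed_rates(operating_hours, failures):
    """Return (cumulative time, observed failure rate, observed mean time).

    The mean time is the observed MTBF for repairable items or the observed
    MTTF for non-repairable items; both are cumulative time / failures.
    """
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute the ratios")
    cumulative_time = sum(operating_hours)
    failure_rate = failures / cumulative_time   # failures per hour
    mean_time = cumulative_time / failures      # hours per failure
    return cumulative_time, failure_rate, mean_time

# Hypothetical sample: hours accumulated by each item during the stated period.
hours = [1000.0, 1000.0, 800.0, 1200.0, 1000.0]
total, rate, mean_time = observed_rates(hours, failures=2)
print(total, rate, mean_time)  # prints: 5000.0 0.0004 2500.0
```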

Name Definition

Guarantee An assurance given by the manufacturer to the vendor that the product will work without failure for a stated period of time

Warranty A written guarantee given to the purchaser of a new appliance, automobile, or other item by the manufacturer or dealer, usually specifying that the manufacturer will make any repairs or replace defective parts free of charge for a stated period of time.

Maintainability The measure of the ability of an item to be retained in or restored to a specified condition when maintenance is performed by personnel having specified skill levels, using prescribed procedures and resources. Applies to major tasks where many repetitions are expected and where considerable time is required.

Availability A tool for measuring the percentage of time an item or system is in a state of readiness, where it is operable and can be committed to use when called upon. Availability ceases because of a downing event that causes the item or system to become unavailable to initiate a mission when called upon.

Availability = MTBF/(MTBF + MTTR), where MTTR is the mean time to repair

Reliability The ability of an item to perform a required function under stated conditions for a stated period of time. It is usually expressed as a probability of success.
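The availability formula can be checked numerically. This Python sketch assumes a hypothetical item with MTBF = 500 h and MTTR = 5 h, and also converts the result into expected downtime over an 8760-hour year (an illustrative extra, not part of the source definition):

```python
HOURS_PER_YEAR = 8760.0

def availability(mtbf, mttr):
    """MTBF / (MTBF + MTTR): fraction of time the item is ready for use."""
    return mtbf / (mtbf + mttr)

a = availability(mtbf=500.0, mttr=5.0)        # hypothetical values, in hours
downtime = (1.0 - a) * HOURS_PER_YEAR         # expected unavailable hours per year
print(f"A = {a:.4f}, downtime = {downtime:.1f} h/year")  # A = 0.9901, downtime = 86.7 h/year
```

This ratio is often called inherent availability, since it counts only repair time and ignores logistic or administrative delays.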


Why do engineering items fail?

The design might be inherently incapable. The more complex the design, the more difficult it is to overcome the problem.

The item might be overstressed in some way.

Failures can be caused by wear-out: an item that is sufficiently strong at the start of its life becomes weaker with age.

Failures can be caused by other time-dependent mechanisms, such as battery run-down, or creep in turbines caused by simultaneous high temperature and tensile stress.

Failures can be caused by sneaks. A sneak is a condition in which the system does not work properly even though every part does.

Failures can be caused by errors, such as incorrect specification or design, or faulty assembly or test.

There are many other potential causes of failure, such as oil leaks, noise, flickering displays, etc.

Knowing, as far as is practicable, the potential causes of failures is fundamental to preventing them.

Failures might be caused by variation.

What is reliability engineering?

Manufacturers often suffer high costs of failure under warranty.

Reliability is usually concerned with failures in the time domain. This distinction marks the difference between traditional quality control and reliability engineering.

Whether failures occur or not, and their times to occurrence, can seldom be forecast accurately. Reliability is therefore an aspect of engineering uncertainty.

Whether an item will work for a particular period is a question that can be answered as a probability.

Ultimately, reliability engineering is the effective management of engineering.

Need for Reliability

Non-repairable items

Reliability is the survival probability over the item's expected life, or for a period during its life, when only one failure can occur.

The instantaneous probability of the first and only failure is called the hazard rate.

The MTTF, or the life by which a certain percentage of items might have failed, is used here.

Non-repairable items may be individual parts, such as light bulbs and transistors, or systems composed of many parts, such as spacecraft and microprocessors.

When a part fails in a non-repairable system, the system fails; hence the reliability is a function of the time to the first part failure.
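As an illustration only, assume a constant hazard rate λ (the exponential model; the source does not fix a distribution here). Then R(t) = exp(-λt), the hazard rate h(t) = f(t)/R(t) equals λ, MTTF = 1/λ, and the life by which a fraction p of items has failed is -ln(1 - p)/λ. The value λ = 0.001 per hour is hypothetical:

```python
import math

def reliability(t, lam):
    """Survival probability R(t) = exp(-lam * t) for a constant hazard rate."""
    return math.exp(-lam * t)

def hazard_rate(t, lam):
    """h(t) = f(t) / R(t); for the exponential model this is just lam."""
    pdf = lam * math.exp(-lam * t)
    return pdf / reliability(t, lam)

def mttf(lam):
    return 1.0 / lam

def life_at_fraction_failed(p, lam):
    """Time by which a fraction p of the items is expected to have failed."""
    return -math.log(1.0 - p) / lam

LAM = 0.001  # hypothetical failures per hour
print(reliability(1000.0, LAM))             # about 0.368
print(mttf(LAM))                            # 1000 hours
print(life_at_fraction_failed(0.10, LAM))   # about 105.4 hours (10% failed)
```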

Repairable items

Reliability is the probability that failure will not occur in the period of interest, when more than one failure can occur.

It can also be expressed as the failure rate, or the rate of occurrence of failures (ROCOF).

Reliability is characterized by the MTBF, but only under the particular condition of a constant failure rate.

In a repairable system that contains a part type with a given failure rate, that part will contribute by that amount to the system failure rate.
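A sketch of that additive contribution, assuming constant failure rates and a system that fails when any part fails. The part types, quantities and rates are hypothetical:

```python
# Hypothetical parts list: part type -> (quantity in system, failure rate per hour)
PARTS = {
    "type A": (10, 2e-6),
    "type B": (4, 5e-6),
    "type C": (1, 30e-6),
}

# Each part adds its own failure rate to the system failure rate.
system_failure_rate = sum(qty * rate for qty, rate in PARTS.values())

# With a constant failure rate, MTBF is the reciprocal of the rate.
mtbf = 1.0 / system_failure_rate

for name, (qty, rate) in PARTS.items():
    share = qty * rate / system_failure_rate
    print(f"{name}: {share:.0%} of system failure rate")
print(f"system rate = {system_failure_rate:.1e}/h, MTBF = {mtbf:.0f} h")
```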

Bath tub curve

What: The concept is derived from human life experience, which involves infant mortality, chance failures, and a wear-out period of life, because data on births and deaths are accumulated by government agencies. Most equipment lacks such government birth/death records, and most non-human systems can be regenerated to live and die many times before being relegated to the scrap heap.

[Figure: bathtub curve]

Why: Failure rates differ, for both people and equipment, at different phases of operation, and the "medicine" applied to humans and equipment needs to be chosen so that it effectively treats the root causes of the problem.
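One common way to reproduce the bathtub shape, offered here as an illustrative assumption rather than the source's model, is to add three Weibull hazard terms h(t) = (β/η)(t/η)^(β-1): β < 1 for infant mortality (decreasing), β = 1 for chance failures (constant), and β > 1 for wear-out (increasing). All parameter values are hypothetical:

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

def bathtub_hazard(t):
    infant = weibull_hazard(t, beta=0.5, eta=1_000.0)    # decreasing
    chance = weibull_hazard(t, beta=1.0, eta=5_000.0)    # constant
    wearout = weibull_hazard(t, beta=5.0, eta=20_000.0)  # increasing
    return infant + chance + wearout

for t in (10.0, 100.0, 1_000.0, 5_000.0, 15_000.0, 30_000.0):
    print(f"t = {t:>8.0f} h   h(t) = {bathtub_hazard(t):.2e} per hour")
```

The printed rates fall, flatten, then rise, which is the bathtub profile.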

Mean, Median, Mode

Mean – The sample mean can be used to estimate the population mean, which is the average of all possible outcomes.

Median – A measure of central tendency: the midpoint of the distribution, i.e. the point at which half the measured values fall on either side.

Mode – The value at which the distribution peaks.
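A small Python illustration using the 14 failure times from the Kolmogorov-Smirnov example later in this section. Because the values are continuous and none repeats, the mode is approximated here as the midpoint of the most populated 5-hour histogram bin; the bin width and origin are arbitrary assumptions:

```python
import statistics
from collections import Counter

ttf = [61.6, 63.4, 65.1, 65.5, 70.0, 72.3, 72.5,
       72.7, 73.0, 75.3, 77.1, 78.4, 83.2, 83.5]

mean_ttf = statistics.mean(ttf)       # estimate of the population mean
median_ttf = statistics.median(ttf)   # half the values lie on either side

# Mode for continuous data: midpoint of the fullest histogram bin (assumed bins).
BIN_ORIGIN, BIN_WIDTH = 60.0, 5.0
counts = Counter(int((t - BIN_ORIGIN) // BIN_WIDTH) for t in ttf)
modal_bin, _ = counts.most_common(1)[0]
modal_value = BIN_ORIGIN + (modal_bin + 0.5) * BIN_WIDTH

print(f"mean = {mean_ttf:.2f}, median = {median_ttf:.2f}, mode ~ {modal_value}")
```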

[Figure: distribution plots]

Parametric Analysis

Parametric analysis is fitting the data to a known distribution and estimating the parameters of the distribution.

Parametric analysis is done using two commonly used methods:

- Regression analysis
- Maximum likelihood estimation (MLE)

Having obtained a fit, a statistic is calculated to estimate the goodness of fit, after which confidence intervals for the parameters can be found.

Regression Analysis

The most commonly used continuous distributions are:

- Weibull distribution
- Normal distribution
- Lognormal distribution
- Exponential distribution

First, the CDF is linearized by making the required transformation. The parameters of the distribution are then found from the fitted straight line.

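Linearized formulae for the Weibull distribution:

$$X_i = \ln t_i, \qquad Y_i = \ln\ln\left[\frac{1}{1 - F(t_i)}\right]$$

where F(t_i) is the cumulative failure function, estimated for the ith failure out of n components by Bernard's median-rank approximation

$$F(t_i) \approx \frac{i - 0.3}{n + 0.4}$$

Since the Weibull CDF linearizes to $Y = \beta X - \beta\ln\eta$, the shape parameter is β = slope and the scale parameter is η = exp(-intercept/β).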

A straight line is fitted to the X and Y data points by minimizing the sum of the squared distances of the data points from the fitted line. The distances can be measured in the vertical or the horizontal direction.

The correlation coefficient r takes values from -1 to 1. The closer r² is to 1, the more linear the relationship between X and Y.
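A minimal pure-Python sketch of the rank-regression steps above, using Bernard's median ranks and least squares of Y on X (vertical distances). The test data are synthetic: they are generated to lie exactly on a Weibull line with β = 2 and η = 100, so the fit should recover those values with r² = 1:

```python
import math

def median_rank(i, n):
    """Bernard's approximation F(t_i) = (i - 0.3) / (n + 0.4)."""
    return (i - 0.3) / (n + 0.4)

def weibull_rank_regression(times):
    """Fit a 2-parameter Weibull by least squares of Y on X.

    Returns (beta, eta, r_squared).
    """
    t = sorted(times)
    n = len(t)
    xs = [math.log(ti) for ti in t]
    ys = [math.log(math.log(1.0 / (1.0 - median_rank(i, n))))
          for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # beta
    intercept = y_bar - slope * x_bar   # equals -beta * ln(eta)
    beta = slope
    eta = math.exp(-intercept / beta)
    r_squared = sxy ** 2 / (sxx * syy)
    return beta, eta, r_squared

# Synthetic check: times placed exactly on a Weibull line with beta=2, eta=100.
N = 10
synthetic = [100.0 * (-math.log(1.0 - median_rank(i, N))) ** 0.5
             for i in range(1, N + 1)]
print(weibull_rank_regression(synthetic))  # approximately (2.0, 100.0, 1.0)
```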


Maximum Likelihood Estimation (MLE)

MLE also estimates the parameters of a distribution.

It does so by defining a likelihood function, which is a function of the parameters of the distribution.

The likelihood function is maximized to find the parameters of the distribution.

Life Testing Data Types Used for MLE Estimates

Life testing
- Type I (time-terminated)
  - With replacement
  - Without replacement
- Type II (failure-terminated)
  - With replacement
  - Without replacement

MLE Weibull Parameter Estimation

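$$g(\beta) = \frac{\sum_{i=1}^{r} t_i^{\beta}\ln t_i + (n-r)\,t_s^{\beta}\ln t_s}{\sum_{i=1}^{r} t_i^{\beta} + (n-r)\,t_s^{\beta}} - \frac{1}{\beta} - \frac{1}{r}\sum_{i=1}^{r}\ln t_i = 0$$

$$\eta = \left[\frac{1}{r}\left(\sum_{i=1}^{r} t_i^{\beta} + (n-r)\,t_s^{\beta}\right)\right]^{1/\beta}$$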

where:
- t_s = 1 for complete data (the censored terms vanish because r = n), t_s = test time for Type I data, and t_s = t_r for Type II data
- t_i is the time to the ith failure
- r is the number of failures
- n is the total number of components

Solve g(β) = 0 for β, then substitute β into the second equation to find η.
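A sketch of solving these equations numerically. It uses bisection on g(β), which increases with β, and rescales the times first (g is unchanged when all times are multiplied by a constant) to avoid overflow; both are implementation choices, not part of the source. The data are the 14 failure times from the K-S example, treated first as a complete sample and then as hypothetical Type I data stopped at 76 h:

```python
import math

def weibull_mle(failure_times, n=None, t_s=1.0):
    """Solve g(beta) = 0 by bisection, then compute eta.

    failure_times: the r observed failure times t_1..t_r
    n: total number of components on test (default r, i.e. complete data)
    t_s: test time (Type I) or t_r (Type II); unused for complete data
    """
    t = list(failure_times)
    r = len(t)
    n = r if n is None else n
    m = n - r                                   # units still running at the end
    scale = max(t + ([t_s] if m else []))       # rescale: g(beta) is scale invariant
    x = [v / scale for v in t]
    xs = t_s / scale
    mean_ln_x = sum(math.log(v) for v in x) / r

    def g(beta):
        num = sum(v ** beta * math.log(v) for v in x)
        den = sum(v ** beta for v in x)
        if m:
            num += m * xs ** beta * math.log(xs)
            den += m * xs ** beta
        return num / den - 1.0 / beta - mean_ln_x

    lo, hi = 1e-3, 1.0
    while g(hi) < 0.0:                          # g increases with beta
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("no finite root; are all failure times equal?")
    while g(lo) > 0.0:
        lo /= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    total = sum(v ** beta for v in x) + m * xs ** beta
    eta = scale * (total / r) ** (1.0 / beta)
    return beta, eta

ttf = [61.6, 63.4, 65.1, 65.5, 70.0, 72.3, 72.5,
       72.7, 73.0, 75.3, 77.1, 78.4, 83.2, 83.5]
print(weibull_mle(ttf))                          # complete data
print(weibull_mle(ttf[:10], n=14, t_s=76.0))     # Type I: test stopped at 76 h
```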


Goodness of Fit (GOF)

In regression analysis, the r² value is used to assess goodness of fit. For MLE, the following GOF statistics are used:

• Chi-square test
• Kolmogorov-Smirnov test

Data will often fit several distributions, so a GOF measure is needed to find the most suitable one.

Chi-Square Test

Applicable to all distributions when the sample size is large.

Applied to both discrete and continuous data.

The expected probabilities are calculated under the null hypothesis that the data follow the assumed distribution.

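Formula used:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where k = number of classes, O_i = observed number of failures in the ith class, E_i = expected number of failures in the ith class (E_i = n p_i, where p_i is the probability of the ith class under the null hypothesis), and n = sample size.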

Degrees of Freedom=k-1-number of estimated parameters

Example

There are 35 failure times listed below. Check whether they follow an exponential distribution.

GIVEN DATA

1476 300 98 221 157

182 499 552 1563 36

246 442 20 796 31

47 438 400 279 247

210 284 553 767 1297

214 428 597 2025 185

467 401 210 289 1024

Group the results into classes bounded above by the following values:

Upper bound   Number of failure times observed in the class
350           18
750           10
2026          7

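Cumulative failure function, F(t_i) = 1 - exp(-λt_i), for the exponential distribution. The expected number of failures in each class is therefore E_i = n × P_i, where P_i = F(upper bound) - F(lower bound) is the probability of failure within the class and the last class is open-ended. Using λ = 35/16981 ≈ 0.00206 (the number of failures divided by the total time):

E1 = 35(1 - exp(-350 × 0.00206)) = 17.98
E2 = 35(1 - exp(-750 × 0.00206) - P1) = 9.55
E3 = 35(1 - P2 - P1) = 7.47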

From the formula, we find the value of χ^2

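Degrees of freedom = 3 - 1 - 1 = 1 (three classes, one estimated parameter λ)

From the chi-square table, for 1 degree of freedom and χ² = 0.0496, the cumulative probability lies between 10% and 20% (about 18%), i.e. the p-value is about 0.82. Since this cumulative probability is well below 90% (equivalently, χ² = 0.0496 is well below the critical value of 2.706 at the 10% significance level), the null hypothesis is not rejected: the data are consistent with an exponential distribution.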
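The example can be reproduced in a few lines of Python. λ is taken as the maximum-likelihood estimate n/Σt_i, and the last class is treated as open-ended, as in the E3 calculation above. The p-value uses the closed form P(χ² > x) = erfc(√(x/2)), which holds only for 1 degree of freedom. Because λ is not rounded to 0.00206 here, χ² comes out near 0.049 rather than exactly 0.0496:

```python
import math

failure_times = [1476, 300, 98, 221, 157, 182, 499, 552, 1563, 36,
                 246, 442, 20, 796, 31, 47, 438, 400, 279, 247,
                 210, 284, 553, 767, 1297, 214, 428, 597, 2025, 185,
                 467, 401, 210, 289, 1024]
upper_bounds = [350.0, 750.0, math.inf]    # last class open-ended

n = len(failure_times)
lam = n / sum(failure_times)               # MLE of the exponential rate

def exp_cdf(t):
    return 1.0 - math.exp(-lam * t)        # F(t) = 1 - exp(-lambda t)

observed, expected = [], []
lower = 0.0
for upper in upper_bounds:
    observed.append(sum(1 for t in failure_times if lower < t <= upper))
    expected.append(n * (exp_cdf(upper) - exp_cdf(lower)))
    lower = upper

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(upper_bounds) - 1 - 1            # classes - 1 - estimated parameters
p_value = math.erfc(math.sqrt(chi2 / 2.0)) # valid for dof == 1 only

print(f"lambda = {lam:.5f}")
print("observed:", observed)
print("expected:", [round(e, 2) for e in expected])
print(f"chi2 = {chi2:.4f}, dof = {dof}, p = {p_value:.2f}")
```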


Kolmogorov-Smirnov Test

The K-S test is also used to assess GOF, but unlike the chi-square test it can be used even with small sample sizes.

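Formulae used:

$$S_n(t) = \begin{cases} 0, & t < t_1 \\ i/n, & t_i \le t < t_{i+1},\ i = 1, 2, \ldots, n-1 \\ 1, & t \ge t_n \end{cases}$$

$$D = \max_{i}\ \max\left(\left|F(t_i) - S_n(t_i)\right|,\ \left|F(t_i) - S_n(t_{i-1})\right|\right), \qquad S_n(t_0) = 0$$

where F(t_i) is the cumulative failure function of the hypothesized distribution, t_i is the time to the ith failure (in increasing order), and n is the sample size.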

Example: The following 14 observations are failure times of a component, in hours. Test the hypothesis that the failure time is normally distributed.

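For the normal distribution, z = (x - μ)/σ, where μ is the mean and σ is the standard deviation.

The cumulative failure function is F(t_i) = Φ(z_i), where Φ is the standard normal CDF:

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\,e^{-u^{2}/2}\,du$$

The corresponding normal density is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$.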

i TTF i TTF

1 61.6 8 72.7

2 63.4 9 73

3 65.1 10 75.3

4 65.5 11 77.1

5 70 12 78.4

6 72.3 13 83.2

7 72.5 14 83.5

GIVEN DATA
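A Python sketch of the K-S statistic for the data above. The source does not say how μ and σ are obtained; here they are assumed to be the sample mean and sample standard deviation. When parameters are estimated from the same data, the standard K-S critical values are conservative (the Lilliefors variant corrects for this):

```python
import statistics

def ks_statistic(times, cdf):
    """D = max over i of |F(t_i) - S_n(t_i)| and |F(t_i) - S_n(t_{i-1})|."""
    t = sorted(times)
    n = len(t)
    d = 0.0
    for i, ti in enumerate(t, start=1):
        f = cdf(ti)
        d = max(d, abs(f - i / n), abs(f - (i - 1) / n))
    return d

ttf = [61.6, 63.4, 65.1, 65.5, 70.0, 72.3, 72.5,
       72.7, 73.0, 75.3, 77.1, 78.4, 83.2, 83.5]

# Assumption: mu and sigma are estimated from the sample itself.
fitted = statistics.NormalDist(statistics.mean(ttf), statistics.stdev(ttf))
d = ks_statistic(ttf, fitted.cdf)
print(f"mu = {fitted.mean:.2f}, sigma = {fitted.stdev:.2f}, D = {d:.3f}")
```

D is then compared with the tabulated K-S critical value for n = 14 at the chosen significance level; if D is smaller, the normal hypothesis is not rejected.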


Reliability Block Diagram

Systems are composed of components. A reliability block diagram (RBD) is a method of evaluating the reliability of a system by establishing the following relationships between its components:

- Series
- Parallel
- Combination of both

These structures help in understanding the logical relationships.

Series configuration

Failure of any one component in the block will lead to the failure of the entire system

Rs – system reliability
E1 – event in which component 1 does not fail
E2 – event in which component 2 does not fail
R1 – reliability of component 1
R2 – reliability of component 2

[Figure: blocks 1, 2, ..., n connected in series]

Formula: Rs = P(E1 ∩ E2) = P(E1) P(E2) = R1 R2, assuming the components fail independently. Generalizing, Rs = R1 R2 ... Rn. Therefore the system reliability can be no higher than that of the least reliable component, i.e. all components must have high reliability in this configuration.

Parallel configuration

In a parallel system, all elements must fail for the system to fail.

[Figure: blocks 1, 2, ..., n connected in parallel]

Formula: Rs = 1 - (1 - R1)(1 - R2). Generalizing:

$$R_s(t) = 1 - \prod_{i=1}^{n}\left[1 - R_i(t)\right]$$

Combination of parallel and series

Example: R1 = R2 = 0.90, R3 = R6 = 0.98, R4 = R5 = 0.99, considering constant failure rates.

Solution:
Ra = 1 - (0.10)² = 0.99
Rb = [1 - (0.10)²](0.98) = 0.9702
Rc = (0.99)² = 0.9801
Rs = [1 - (1 - 0.9702)(1 - 0.9801)](0.98) = 0.9794
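The figure for this example is not reproduced, so the structure below is an assumption read from the solution: blocks 1 and 2 in parallel (Ra), in series with block 3 (Rb); blocks 4 and 5 in series (Rc); Rb and Rc in parallel; and that combination in series with block 6. The helper names `series` and `parallel` are illustrative:

```python
from math import prod

def series(*reliabilities):
    """All blocks must work: Rs = R1 * R2 * ... * Rn."""
    return prod(reliabilities)

def parallel(*reliabilities):
    """System fails only if every block fails: Rs = 1 - prod(1 - Ri)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

R1 = R2 = 0.90
R3 = R6 = 0.98
R4 = R5 = 0.99

ra = parallel(R1, R2)
rb = series(ra, R3)
rc = series(R4, R5)
rs = series(parallel(rb, rc), R6)
print(f"Ra={ra:.4f} Rb={rb:.4f} Rc={rc:.4f} Rs={rs:.4f}")  # Ra=0.9900 Rb=0.9702 Rc=0.9801 Rs=0.9794
```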

FAULT TREE ANALYSIS

- An undesired event is defined.
- The event is resolved into its immediate causes.
- This resolution of events continues until basic causes are identified.
- A logical diagram called a fault tree is constructed, showing the logical event relationships.

ELEMENTS

- FTA is a deductive analysis approach for resolving an undesired event into its causes.
- FTA is a backward-looking analysis, looking backward at the causes of a given event.
- Specific stepwise logic is used in the process.
- Specific logic symbols are used to illustrate the event relationships.
- A logic diagram is constructed showing the event relationships.

USES

- FTA is used to resolve the causes of system failure.
- FTA is used to quantify system failure probability.
- FTA is used to evaluate potential upgrades to a system.
- FTA is used to optimize resources in assuring system safety.
- FTA is used to resolve the causes of an incident.
- FTA is used to model system failures in risk assessments.

FOUR STEPS

1. Define the undesired event to be analyzed (the focus of the FTA).
2. Define the boundary of the system (the scope of the FTA).
3. Define the basic causal events to be considered (the resolution of the FTA).
4. Define the initial state of the system.

[Figure: basic event symbols]

[Figure: basic gate symbols]

Example

Specifications:
- Undesired top event: Motor does not start when the switch is closed
- Boundary of the FT: The circuit containing the motor, battery, and switch
- Resolution of the FT: The basic components in the circuit, excluding the wiring
- Initial state of the system: Switch open, normal operating conditions

[Figure: fault tree for the motor example]

The Top Event of the Fault Tree

- The top event should describe WHAT the event is and WHEN it happens.
- The top event is the specific event to be resolved into its basic causes.
- Examples: 1. Fuel supply system fails to shut off after the fueling phase. 2. Launch vehicle fails to ignite at launch.

OR gate

The OR gate represents the logical union of the inputs: the output occurs if any of the inputs occurs.

The OR gate is used when an event is resolved into more specific causes or scenarios.

The OR gate is used when a component failure is resolved into an inherent failure or a command failure.

The OR gate is used when an event is described in terms of equivalent, more specific events.

AND gate

The AND gate represents the logical intersection of the inputs: the output occurs if all of the inputs occur.

The AND gate is used when an event is resolved into combinations of events that need to occur.

The AND gate is used when a redundant system is resolved into multiple subsystems that need to fail.

The AND gate is used when a system failure is resolved into conditions and events needed to occur.
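When FTA is used to quantify failure probability, each gate can be evaluated from its input probabilities. This sketch assumes independent basic events, and the tree TOP = OR(A, AND(B, C)) with its probabilities is hypothetical, not the motor example:

```python
from math import prod

def and_gate(probs):
    """Output occurs only if all inputs occur (independent inputs)."""
    return prod(probs)

def or_gate(probs):
    """Output occurs if any input occurs: 1 - P(no input occurs)."""
    return 1.0 - prod(1.0 - p for p in probs)

# Hypothetical basic-event probabilities
p_a, p_b, p_c = 0.01, 0.05, 0.10
p_top = or_gate([p_a, and_gate([p_b, p_c])])
print(f"P(top event) = {p_top:.5f}")
```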

Developing FTA

1. Define the top event as a rectangle.
2. Determine the immediate, necessary and sufficient events that result in the top event.
3. Draw the appropriate gate to describe the logic for the intermediate events resulting in the top event.
4. Treat each intermediate event as an intermediate-level top event.
5. Determine the immediate, necessary and sufficient causes for each intermediate event.
6. Determine the appropriate gate and continue the process.

Key attributes

- Top event: What specific event is being analyzed?
- Boundary: What is inside and outside the analysis?
- Resolution: What are the primary causes to be resolved to?
- Initial state: What is assumed for the initial conditions and states?

FAULT VS FAILURE

• The intermediate events in a fault tree are called faults.
• The basic events, or primary events, are called failures if they represent failures of components.
• It is important to clearly define each event as a fault or a failure, so that it can be further resolved or identified as a basic cause.

*Write the statements that are entered in the event boxes as faults; state precisely what the fault is and the conditions under which it occurs. Do not mix successes with faults.*

Petri nets

A Petri net is a general-purpose graphical and mathematical tool describing the relations existing between conditions and events. The basic symbols of Petri nets include:

- Place: denotes events
- Immediate transition: denotes event transfer with no delay
- Timed transition: denotes event transfer with a time delay
- Arc: connects places and transitions
- Token: contained in places; denotes the data
- Inhibitor arc: connects places and transitions

Basic Structure :

A transition is said to fire if its input places satisfy an enabling condition. Firing a transition removes one token from each of its input places and puts one token into each of its output places. There are two types of input place for a transition, namely the specified type and the conditional type. The former has a single output arc, whereas the latter has multiple output arcs. Tokens in a specified-type place have only one outgoing destination, i.e. if the input place holds a token, the transition fires and gives the output place a token. However, tokens in a conditional-type place have more than one outgoing path, which may lead the system to different situations.
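The basic firing rule can be sketched in Python. The `Transition` class and the place names are hypothetical; arcs are assumed to have weight one, and timing and inhibitor arcs are left out. The example transition behaves like an AND gate: it fires only when both basic-event places hold a token:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    name: str
    inputs: tuple
    outputs: tuple

def is_enabled(marking, transition):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking[p] >= 1 for p in transition.inputs)

def fire(marking, transition):
    """Remove one token from each input place and add one to each output place."""
    if not is_enabled(marking, transition):
        raise ValueError(f"{transition.name} is not enabled")
    new_marking = Counter(marking)
    for p in transition.inputs:
        new_marking[p] -= 1
    for p in transition.outputs:
        new_marking[p] += 1
    return new_marking

t1 = Transition("T1", inputs=("A", "B"), outputs=("TOP",))
m0 = Counter({"A": 1})           # only event A has occurred
m1 = Counter({"A": 1, "B": 1})   # events A and B have both occurred
print(is_enabled(m0, t1))        # False
print(fire(m1, t1)["TOP"])       # 1
```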

There are three types of transition, classified by time. Transitions with no time delay are called immediate transitions, while those that need a certain time delay are called timed transitions. The third type, the stochastic transition, is used for modeling a process with a random time. Owing to the variety of logical relations that can be represented with Petri nets, they are a powerful tool for modeling systems. Petri nets can be used not only for simulation, reliability analysis, and failure monitoring, but also for observing dynamic behavior. This greatly helps fault tracing and failure state analysis. Moreover, the use of Petri nets can improve the dialogue between the analyst and the designer of a system.

Minimum cut sets

To identify the minimum cut sets in a Petri net, the matrix method is used, as follows:

1. If the output place is connected by multiple arcs from transitions, put down the numbers of the input places in a row. This accounts for OR models.
2. If the output place is connected by one arc from a transition, put down the numbers of the input places in a column. This accounts for AND models.
3. The common entry located in rows is the entry shared by each row.
4. Start from the top event and work down to the basic events until all places are replaced by basic events. The matrix thus formed is called the basic event matrix; the column vectors of the matrix constitute cut sets.
5. Remove the supersets from the basic event matrix; the remaining column vectors are the minimum cut sets.

Minimum cut sets can also be derived in the opposite, bottom-up direction, that is, from the basic places to the top place. Transitions with T = 0 are called immediate transitions. If the Petri net contains only immediate transitions, i.e. token transfers between places take no time, it can be absorbed into a simplified form called the equivalent Petri net. After absorption, all the remaining places are basic events. The equivalent Petri net exactly constitutes the minimum cut sets, i.e. the inputs of each transition represent a minimum cut set.
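The top-down procedure amounts to expanding OR inputs into separate cut sets and AND inputs into a shared cut set, then deleting supersets. This Python sketch works on a gate dictionary rather than on a Petri-net matrix, and the tree is hypothetical:

```python
from itertools import product

# Hypothetical tree: gate -> (type, inputs); names not in the dict are basic events.
TREE = {
    "TOP": ("OR", ["G1", "G2"]),
    "G1": ("AND", ["A", "B"]),
    "G2": ("OR", ["C", "G3"]),
    "G3": ("AND", ["A", "B", "D"]),
}

def cut_sets(node, tree):
    """Return the (not necessarily minimal) cut sets of a node as frozensets."""
    if node not in tree:
        return [frozenset([node])]
    gate, inputs = tree[node]
    child_sets = [cut_sets(child, tree) for child in inputs]
    if gate == "OR":
        # Any one input suffices: each child's cut sets stay separate.
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":
        # All inputs are needed: merge one cut set from every child.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate type {gate!r}")

def minimal_cut_sets(top, tree):
    """Remove supersets, keeping only the minimal cut sets."""
    sets = set(cut_sets(top, tree))
    minimal = [s for s in sets if not any(other < s for other in sets)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))

for mcs in minimal_cut_sets("TOP", TREE):
    print(sorted(mcs))
```

Here {A, B, D} is dropped because it is a superset of {A, B}, leaving {C} and {A, B}.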

Monte Carlo simulation

In a Monte Carlo simulation, a logical model of the system being analyzed is repeatedly evaluated, each run using different values of the distributed parameters.

The selection of parameter values is made randomly, but with probabilities governed by the relevant distribution functions.

Monte Carlo simulation can be used for system reliability and availability modeling, using suitable computer programs. Since Monte Carlo simulation involves no complex mathematical analysis, it is an attractive alternative approach. It is a relatively easy way to model complex systems, and the input algorithms are easy to understand.

One problem with the method is its heavy use of computing time.

Since the simulation of probabilistic events generates variable results, in effect simulating the variability of real life, it is usually necessary to perform a number of runs in order to obtain estimates of the mean and variance of the output parameters of interest, such as availability, number of repairs arising, and facility utilization. On the other hand, this means that the effect of variation can be assessed.
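A minimal Monte Carlo sketch of the ideas above: each trial samples exponential lifetimes for a hypothetical two-unit parallel system (λ = 0.001 per hour per unit, 500-hour mission), and several independent runs give a mean and a standard deviation of the estimated reliability. The analytical value 1 - (1 - e^(-λt))² is printed for comparison:

```python
import math
import random
import statistics

def parallel_reliability_run(lam, mission_time, trials, rng):
    """Fraction of trials in which at least one of two units survives the mission."""
    survived = 0
    for _ in range(trials):
        life_1 = rng.expovariate(lam)   # random lifetime of unit 1
        life_2 = rng.expovariate(lam)   # random lifetime of unit 2
        if max(life_1, life_2) > mission_time:
            survived += 1
    return survived / trials

LAM, MISSION = 0.001, 500.0             # hypothetical rate (per hour) and mission (hours)
rng = random.Random(12345)              # fixed seed so runs are reproducible
runs = [parallel_reliability_run(LAM, MISSION, 10_000, rng) for _ in range(10)]

mean_r = statistics.mean(runs)
sd_r = statistics.stdev(runs)
exact = 1.0 - (1.0 - math.exp(-LAM * MISSION)) ** 2
print(f"simulated R = {mean_r:.4f} +/- {sd_r:.4f} (exact {exact:.4f})")
```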

Design analysis methods

Design analysis methods have been developed to highlight critical aspects and to focus attention on possible shortfalls.

Design analyses are sometimes considered tedious and expensive.

In most cases the analyses will show that nearly all aspects of the design are satisfactory, and much more effort will have been expended in showing this than in highlighting a few deficiencies.

The tedium and expense can be greatly reduced by good planning and preparation, and by the use of computerized methods.

The main reliability design analysis techniques described are:
1. Quality function deployment
2. Reliability prediction
3. Load-strength analysis
4. Failure modes, effects and criticality analysis
5. Fault tree analysis
6. Hazard and operability study
7. Parts, materials and process review
8. Others, including human aspects, manufacturing, maintenance, etc.

Quality function deployment

QFD is a technique for getting the voice of the customer into the design process, so that the product is what the customer desires. In particular, it is applicable to soft issues that are difficult to specify.

This method helps to pinpoint what to do, the best way to accomplish the objective, the best order for achieving the design objectives, and the staffing assets needed to complete the task.

It is a major up-front effort to learn and understand the customer's requirements and the approach that will satisfy their objectives.

The methodology is used as a team approach to solving problems and satisfying customers, beginning with a listing

Failure Mode and Effect Analysis (FMEA)

Failure mode and effect analysis is the study of potential failures that might occur in any part of a system, to determine their probable effect on operational success.

When criticality analysis is added for sophisticated studies, the method is known as FMECA.

The basic thrust of the analysis tool is to prevent failures, using a simple and cost-effective analysis that draws on the collective information of the team to find problems and resolve them before they occur.

The analysis is a bottom-up (inductive) approach to finding each potential mode of failure that might occur for every component of a system. It is also used for determining the probable effect of each failure mode on system operation and, in turn, on probable operational success.

FMEA can be performed from different viewpoints, such as safety, mission success, repair costs, failure modes, and reliability reputation.

FMEA is most productive when performed during the design process to eliminate potential failures, but it can also be performed on existing systems.

The analysis can be conducted in the design room or on the shop floor, and it is an excellent tool for sharing experience, making the team aware of details that are known to one person but seldom shared with the team.

Accelerated testing

Accelerated testing is a test method that uses increased loads to produce age-to-failure data quickly; the resulting data, often only a few points, are then scaled to reflect normal loads.

The benefits of this testing are saving time and money while quantifying the relationship between stress and performance, along with identifying design weaknesses at low cost.

It is used to correlate with real-life conditions. It is a useful method for solving old, nagging problems within a production process.

Accelerated testing shortens the test time, because the tests are conducted at higher stress levels that bring failure times down to days instead of months or years.

Challenges faced by the designer:
1. Long test time to complete life testing of the product
2. Constraints on timelines
3. Cost as a function of time
4. Reliability growth

Care has to be taken that the stress, or agent of failure, does not cause failure by a failure mode other than the one being evaluated.

The acceleration rate must be uniform.

Types of ALT:
- Qualitative accelerated testing: HALT, HASS
- Quantitative accelerated testing: SSALT, CSALT, CISALT

Highly Accelerated Life Testing (HALT)

To identify potential failure modes or uncover defects of a product.

Test the component to failure under highly stressed conditions.

Study the failure modes and analyze them to find the root cause.

Fix the root cause to make the product more robust.

Does not help in predicting the life of the product.

Highly Accelerated Stress Screening (HASS)

Used to monitor the production process.

All products are subjected to the same stresses as in HALT, but at a lower level.

It identifies process related defects.

Quantitative Accelerated Testing

Planned/Controlled accelerated testing from which TTF under normal usage conditions can be derived.

Models to be used for a specific agent of failure have been postulated.

Acceleration factor: AF = TTF_normal / TTF_stress. AF is used to derive the TTF under normal conditions from the accelerated TTF. Quantitative ALT helps predict the life of the product.
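The source does not name a specific acceleration model. As one widely used example for temperature as the agent of failure, the Arrhenius model gives AF = exp[(Ea/k)(1/T_use - 1/T_stress)], with absolute temperatures and Boltzmann's constant k. The activation energy, temperatures and test result below are hypothetical:

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor TTF_normal / TTF_stress for temperature stress."""
    t_use = t_use_c + 273.15          # convert deg C to kelvin
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))

af = arrhenius_af(ea_ev=0.7, t_use_c=55.0, t_stress_c=125.0)  # hypothetical
ttf_stress = 1_000.0                  # hypothetical mean TTF at 125 deg C, hours
ttf_normal = af * ttf_stress          # from AF = TTF_normal / TTF_stress
print(f"AF = {af:.1f}, estimated TTF at 55 deg C = {ttf_normal:,.0f} h")
```

Other agents of failure (voltage, humidity, vibration) need their own postulated models; the Arrhenius form applies only to temperature-driven mechanisms.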

Improving the process

Continuous improvement nearly always leads to reduced costs, higher productivity, and higher reliability.

Methods that are available for process development are as follows:

- Simple charts
- Control charts
- Multi-vari charts
- Statistical methods
- Quality circles
- Zero defects

Simple charts

A variety of simple charting techniques can be used to help identify and solve process variability problems.

The Pareto chart is often used as a starting point to identify the most important problems and their most likely causes.

A measles chart is used when problems are distributed over an area.

The cause and effect diagram, also called the fishbone or Ishikawa diagram, can be used to structure and record problem-solving and process improvement efforts. The main problem is indicated on the horizontal line, and possible causes are shown as branches, which in turn can have sub-causes.
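A Pareto analysis ranks causes by frequency and accumulates their share of the total, so the "vital few" stand out. This Python sketch uses hypothetical defect counts:

```python
def pareto_table(counts):
    """Sort causes by count (largest first) and add cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for cause, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((cause, count, 100.0 * running / total))
    return rows

# Hypothetical production defect counts by cause
defects = {"solder bridge": 42, "missing part": 7, "wrong polarity": 12,
           "scratch": 25, "other": 4}

for cause, count, cum_pct in pareto_table(defects):
    print(f"{cause:<15} {count:>3}  {cum_pct:5.1f}%")
```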

Control charts

When control charts are used, the process is monitored continually to find trends that might indicate special causes of variation. Trends can be a continual run high or low, or a cyclic pattern. A continual high or low trend indicates a need for process or measurement adjustment. A cyclic trend might be caused by temperature fluctuations, process drift between setting changes, changes of materials, etc.
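The source gives no control limits or trend rules, so the following is an assumption: an individuals chart with limits at the centre line ± 3σ, σ estimated as the mean moving range divided by 1.128, plus a simple check that flags a run of 7 or more consecutive points on the same side of the centre line. All data are hypothetical:

```python
import statistics

def individuals_limits(baseline):
    """Centre line and 3-sigma limits, sigma estimated from the moving range."""
    center = statistics.mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = statistics.mean(moving_ranges) / 1.128   # d2 constant for n = 2
    return center, center - 3 * sigma, center + 3 * sigma

def out_of_limits(values, lcl, ucl):
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

def run_points(values, center, run_length=7):
    """Indices at which a run of >= run_length points on one side is in progress."""
    flagged, run, side = [], 0, 0
    for i, v in enumerate(values):
        s = (v > center) - (v < center)   # +1 above, -1 below, 0 on the line
        run = run + 1 if s != 0 and s == side else (1 if s != 0 else 0)
        side = s
        if run >= run_length:
            flagged.append(i)
    return flagged

baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
new_data = [10.1, 10.2, 10.1, 10.3, 10.2, 10.1, 10.2, 10.9]
center, lcl, ucl = individuals_limits(baseline)
print(f"CL = {center:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
print("outside limits:", out_of_limits(new_data, lcl, ucl))
print("run of 7 on one side:", run_points(new_data, center))
```

The run check catches the continual "high" trend described above even before any point leaves the limits.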

Multi-vari charts

A multi-vari chart is a graphical method for identifying the major causes of variation in a process. Multi-vari charts can be used for process development and for problem solving, and they can be very effective in reducing the number of variables to include in a statistical experiment.

Multi-vari charts show whether the major causes of variation are spatial, cyclic, or temporal. A parameter being monitored is measured in different positions, at different points in the production cycle, and at different times. In a typical example, the results for two measurement locations, e.g. the diameter at each end of a shaft, are plotted against batch number from setup. In that example, the chart shows that batch-to-batch variation is the most significant cause, with a significant pattern of end-to-end variation (taper).

Statistical methods

Statistical methods for the analysis of variation can be used effectively for variation reduction in production processes. They should be used for process improvement in the same way as for initial product and process design. If a particular process has been the subject of such experiments during development, the results can be used to guide further experiments. They can also be used to identify the major causes of variation prior to setting up statistical experiments. In this way the number of variables to be investigated can be reduced, leading to cost savings.

Quality Circles

This is the most widely used method worldwide. A quality circle team consisting of operators is formed. The team manages itself, selects its leader and members, and addresses problems. It also makes improvements if they are under its control, or recommends them to management. Quality circle members are taught to use analytical techniques to help identify problems and generate solutions. These are called the seven tools of quality.

The seven tools of quality are:
1. Brainstorming, to identify and prioritize problems
2. Data collection
3. Data analysis methods, including measles charts, trend charts and regression analysis
4. Pareto chart
5. Histogram
6. Cause and effect diagram
7. Statistical process control (SPC) chart

Failure Reporting, Analysis and Corrective Action System (FRACAS)

Failure reporting and analysis is an important part of the QA function. The system must provide for:
1. Reporting of all production test and inspection failures, with sufficient detail to enable investigation and corrective action to be taken
2. Reporting the results of investigation and action
3. Analysis of failure patterns and trends, and reporting on these
4. Continuous improvement by removal of causes

The data system must be computerized for economy and accuracy. Modern automatic test equipment (ATE) sometimes includes direct recording of test data and input to the central system by networking. The data analysis must provide Pareto analysis, probability plots, and trend analysis for management.

Production defect data reporting and analysis must be very quick to be effective. Trends should be analyzed daily, or weekly at most, particularly for high rates of production, to enable timely corrective action to be taken. The data analysis system is also necessary for indicating areas for priority action, using the Pareto principle of concentrating action on the few problem areas that contribute the most to the quality cost. For this purpose, longer-term analysis is necessary.

Defective components should not be scrapped immediately, but should be labeled and stored for a period, say one or two months, so that they are available for more detailed investigation if necessary.

Production defect data should not be analyzed in isolation by people whose task is primarily data management. The people involved must participate, to ensure that the data are interpreted by those involved and that practical results are derived. The quality circle approach provides very effectively for this.

Production defect data are important for highlighting possible in-service reliability problems. Many in-service failure modes manifest themselves during production inspection and testing. For example, if a component or process generates failures on the final functional test, and these are corrected before delivery, it is possible that the failure mechanism exists in products that pass the test and are shipped. Metal surface protection and soldering processes present such risks. Therefore, production defects should always be analyzed to determine the likely effects on reliability, external failure costs, and all internal production quality costs.
