reliability engineering ppt
TRANSCRIPT
Reliability engineering
Definitions
Reliability – The ability of an item to perform a required function under stated conditions for a stated period of time. It is usually expressed as a probability of success.
Failure – The termination of the ability of an item to perform a required function.
Observed Failure Rate – For a stated period in the life of an item, the ratio of the total number of failures in a sample to the cumulative observed time on that sample. The observed failure rate is associated with particular and stated time intervals (or a summation of intervals) in the life of the item and with stated conditions.
Observed Mean Time Between Failures (MTBF) – For a stated period in the life of an item, the mean value of the length of time between consecutive failures, computed as the ratio of the cumulative observed time to the number of failures under stated conditions.
Observed Mean Time To Failure (MTTF) – For a stated period in the life of an item, the ratio of the cumulative time for a sample to the total number of failures in the sample during the period under stated conditions.
Name Definition
Guarantee: An assurance given by the manufacturer to the vendor that the product will work without failure for a stated period of time.
Warranty: A written guarantee given to the purchaser of a new appliance, automobile, or other item by the manufacturer or dealer, usually specifying that the manufacturer will make any repairs or replace defective parts free of charge for a stated period of time.
Maintainability: The measure of the ability of an item to be retained in or restored to a specified condition when maintenance is performed by personnel having specified skill levels, using prescribed procedures and resources. It applies to major tasks where many repetitions are expected and where considerable time is required.
Availability: A tool for measuring the percentage of time an item or system is in a state of readiness, where it is operable and can be committed to use when called upon. Availability ceases because of a downing event that causes the item or system to become unavailable to initiate a mission when called upon. Availability = MTBF/(MTBF + MTTR), where MTTR is the mean time to repair.
Reliability: The ability of an item to perform a required function under stated conditions for a stated period of time. It is usually expressed as a probability of success.
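The observed MTBF and availability definitions above reduce to simple ratios. The following is a minimal sketch; the function names and the numbers (10,000 cumulative hours, 20 failures, 10 hours MTTR) are assumptions chosen only for illustration and do not come from the slides.

```python
def observed_mtbf(cumulative_hours: float, failures: int) -> float:
    """Observed MTBF: cumulative observed time divided by the number of failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to compute an observed MTBF")
    return cumulative_hours / failures


def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


# Hypothetical figures, chosen only for illustration.
mtbf = observed_mtbf(cumulative_hours=10_000.0, failures=20)
failure_rate = 1.0 / mtbf          # observed failures per hour
avail = availability(mtbf, mttr=10.0)
print(f"MTBF = {mtbf:.1f} h, failure rate = {failure_rate:.4f} /h, availability = {avail:.4f}")
```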
Why do engineering items fail?
The design might be inherently incapable. The more complex the design, the more difficult it is to overcome the problem.
The item might be overstressed in some way.
Failures can be caused by wear-out: an item that is sufficiently strong at the start of its life becomes weaker with age.
Failures can be caused by other time-dependent mechanisms, such as battery run-down or creep in turbine parts caused by the simultaneous action of high temperature and tensile stress.
Failures can be caused by sneaks. A sneak is a condition in which the system does not work properly even though every part does.
Failures can be caused by errors, such as incorrect specifications, designs, faulty assembly or test.
There are many other potential causes of failure, such as oil leaks, noise, display flicker, etc.
Knowing, as far as is practicable, the potential causes of failures is fundamental to preventing them.
Failures might be caused by variation.
What is reliability engineering?
Manufacturers often suffer high costs of failure under warranty.
Reliability is usually concerned with failures in the time domain. This distinction marks the difference between traditional quality control and reliability engineering.
Whether failures occur or not, and their times to occurrence, can seldom be forecast accurately. Reliability is therefore an aspect of engineering uncertainty.
Whether an item will work for a particular period is a question that can be answered as a probability.
Ultimately, reliability engineering is the effective management of engineering.
Need for Reliability
Non-repairable items
Reliability is the survival probability over the item's expected life, or for a period during its life, when only one failure can occur.
The instantaneous probability of the first and only failure is called the hazard rate.
Life values such as the MTTF, or the expected life by which a certain percentage might have failed, are also used here.
Non-repairable items may be individual parts such as a bulb or transistor, or systems comprised of many parts such as a spacecraft or microprocessor.
When a part fails in a non-repairable system, the system fails; hence the system reliability is a function of the time to the first part failure.
Repairable items
Reliability is the probability that failure will not occur in the period of interest, when more than one failure can occur.
It can also be expressed as the failure rate or the rate of occurrence of failures.
Reliability is characterized by the MTBF, but only under the particular condition of a constant failure rate.
In a repairable system that contains a given part type, that part contributes its own failure rate to the system failure rate.
Bath tub curve
What: The concept is derived from human life experience, involving infant mortality, chance failures, and a wear-out period of life, for which birth and death data are accumulated by government agencies. Most equipment lacks such government birth/death records, and most non-human systems can be regenerated to live and die many times before relegation to the scrap heap.
Why: Failure rates differ for both people and equipment in different phases of operation, and the medicine applied to both humans and equipment needs to be chosen so that it effectively treats the roots of the problem.
Mean, Median, Mode
Mean: The average of all possible outcomes. The sample mean can be used to estimate the population mean.
Median: A measure of central tendency; the midpoint of the distribution, i.e. the point at which half the measured values fall to either side.
Mode: The value at which the distribution peaks.
Distribution Plots
Parametric Analysis
Parametric analysis is fitting the data to a known distribution and estimating the parameters of that distribution.
The two most commonly used methods are:
- Regression Analysis
- Maximum Likelihood Method
Having obtained a fit, a statistic is calculated to estimate the goodness of the fit, after which a confidence interval for the parameters can be found.
Regression Analysis
The most commonly used continuous distributions are:
- Weibull Distribution
- Normal Distribution
- Lognormal Distribution
- Exponential Distribution
First we linearize the CDF by making the required transformation. From the fitted line we find the parameters of the distribution.
Linearized Formulae for Weibull Distribution
Xi = ln(ti)
Yi = ln ln[1/(1 - F(ti))]
where F(ti) is the cumulative failure function, estimated by the median rank F(ti) = (i - 0.3)/(n + 0.4) for the ith failure out of n components.
β = slope
η = exp(-intercept/β), since the intercept of the fitted line equals -β ln(η)
A straight line is fitted to the X and Y data points by minimizing the sum of squares of the distances of the data points from the fitted line. The distance can be measured in the vertical or the horizontal direction.
The correlation coefficient r varies from -1 to 1. The closer r^2 is to 1, the more linear is the relation between X and Y.
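The regression steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the solved example the slides refer to: the function name and the six failure times are hypothetical, the median ranks use Bernard's approximation (i - 0.3)/(n + 0.4), and the regression is of Y on X (vertical distances).

```python
import math


def weibull_rank_regression(times):
    """Estimate Weibull (beta, eta) by least squares of Y on X using median ranks."""
    t = sorted(times)
    n = len(t)
    xs = [math.log(ti) for ti in t]
    # Bernard's median-rank approximation for F(t_i).
    fs = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    ys = [math.log(math.log(1.0 / (1.0 - f))) for f in fs]

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    beta = sxy / sxx                    # slope of the fitted line
    intercept = y_bar - beta * x_bar
    eta = math.exp(-intercept / beta)   # because intercept = -beta * ln(eta)
    r_squared = sxy ** 2 / (sxx * syy)
    return beta, eta, r_squared


# Hypothetical failure times in hours (not taken from the slides).
sample = [16.0, 34.0, 53.0, 75.0, 93.0, 120.0]
beta, eta, r2 = weibull_rank_regression(sample)
print(f"beta = {beta:.3f}, eta = {eta:.1f}, r^2 = {r2:.3f}")
```

Regressing X on Y instead (horizontal distances) is equally legitimate and generally gives slightly different estimates.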
Maximum Likelihood Estimation (MLE)
MLE is also used to estimate the parameters of a distribution.
It does so by defining a likelihood function, which is a function of the parameters of the distribution.
The likelihood function is maximized to find the parameters of the distribution.
Life Testing Data Types Used for MLE Estimates
Life Testing
TYPE I: Time terminated
  With replacement
  Without replacement
TYPE II: Failure terminated
  With replacement
  Without replacement
MLE Weibull Parameter Estimation
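$$g(\beta) = \frac{\sum_{i=1}^{r} t_i^{\beta} \ln t_i + (n-r)\, t_s^{\beta} \ln t_s}{\sum_{i=1}^{r} t_i^{\beta} + (n-r)\, t_s^{\beta}} - \frac{1}{\beta} - \frac{1}{r}\sum_{i=1}^{r} \ln t_i = 0$$

$$\eta = \left\{ \frac{1}{r}\left[ \sum_{i=1}^{r} t_i^{\beta} + (n-r)\, t_s^{\beta} \right] \right\}^{1/\beta}$$

ts = 1 for complete data
ts = test time for TYPE I data
ts = tr for TYPE II data
ti is the time taken for the ith failure
r is the number of failures
n is the total number of components
Find β such that g(β) = 0, then substitute it in the second equation to find η.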
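Because g(β) = 0 has no closed-form solution, β is found numerically. The sketch below uses plain bisection; the search bracket (0.01 to 50), the function name and the Type II data set (10 units on test, stopped at the 6th failure) are assumptions made for illustration only.

```python
import math


def weibull_mle(failure_times, n, t_s):
    """Solve g(beta) = 0 by bisection, then compute eta.

    failure_times: the r observed failure times
    n: total number of units on test
    t_s: censoring time (its value is irrelevant for complete data, where n - r = 0)
    """
    r = len(failure_times)
    mean_log = sum(math.log(t) for t in failure_times) / r

    def g(beta):
        num = sum(t ** beta * math.log(t) for t in failure_times)
        num += (n - r) * t_s ** beta * math.log(t_s)
        den = sum(t ** beta for t in failure_times) + (n - r) * t_s ** beta
        return num / den - 1.0 / beta - mean_log

    lo, hi = 0.01, 50.0   # assumed search bracket for beta
    if g(lo) * g(hi) > 0:
        raise ValueError("g(beta) does not change sign on the bracket")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    total = sum(t ** beta for t in failure_times) + (n - r) * t_s ** beta
    eta = (total / r) ** (1.0 / beta)
    return beta, eta, g(beta)


# Hypothetical TYPE II data: 10 units on test, stopped at the 6th failure.
times = [16.0, 34.0, 53.0, 75.0, 93.0, 120.0]
beta_hat, eta_hat, residual = weibull_mle(times, n=10, t_s=times[-1])
print(f"beta = {beta_hat:.3f}, eta = {eta_hat:.1f}, g(beta) = {residual:.1e}")
```

A library root finder would do the same job; bisection is used here only to keep the sketch dependency-free.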
Goodness Of Fit (GOF)
In regression analysis the r^2 value is used to assess goodness of fit.
For MLE we use the following GOF statistics:
• Chi-Square Test
• Kolmogorov-Smirnov Test
Data often fit more than one distribution, so a GOF statistic is needed to select the best-fitting distribution.
Chi Square Test
Applicable to all distributions, provided the sample size is large.
Applies to both discrete and continuous data.
The probabilities are based on the null hypothesis.
Formula used: χ^2 = Σ (Oi - Ei)^2 / Ei, summed over i = 1 to k
where k = number of classes
Oi = observed number of failures in the ith class
Ei = expected number of failures in the ith class
n = sample size
Degrees of freedom = k - 1 - number of estimated parameters
Example
There are 35 failure times listed below. Check if the distribution follows exponential distribution.
GIVEN DATA
1476 300 98 221 157
182 499 552 1563 36
246 442 20 796 31
47 438 400 279 247
210 284 553 767 1297
214 428 597 2025 185
467 401 210 289 1024
Group the results into specific bounds
Upper bound    Number of failure times observed in that bound
350            18
750            10
2026           7
For the exponential distribution the cumulative failure function is F(ti) = 1 - exp(-λti). The expected number of failures in a bound is the number of components multiplied by the probability of failing within that bound.
Let λ = 0.00206 (the reciprocal of the sample mean).
E1 = 35*(1 - exp(-350*0.00206)) = 17.98, with P1 = E1/35
E2 = 35*(1 - exp(-750*0.00206) - P1) = 9.55, with P2 = E2/35
E3 = 35*(1 - P2 - P1) = 7.47
From the formula, we find the value of χ^2.
Degrees of freedom = 3 - 1 - 1 = 1
From the chi-square table, for 1 degree of freedom and χ^2 = 0.0496, the cumulative probability lies between 10% and 20%, well below the 90% level (critical value 2.706). Hence the null hypothesis is not rejected, and the data are consistent with an exponential distribution.
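The grouping and the expected counts in this example can be reproduced with a short script. This is a sketch under two assumptions: λ is taken as the reciprocal of the sample mean (which gives about 0.00206), and the last class is treated as open-ended, matching E3 = 35*(1 - P2 - P1). Variable names are illustrative. The statistic comes out close to 0.05, in line with the value quoted above; small differences arise from rounding of λ.

```python
import math

failure_times = [
    1476, 300, 98, 221, 157,
    182, 499, 552, 1563, 36,
    246, 442, 20, 796, 31,
    47, 438, 400, 279, 247,
    210, 284, 553, 767, 1297,
    214, 428, 597, 2025, 185,
    467, 401, 210, 289, 1024,
]
upper_bounds = [350, 750, math.inf]   # last class treated as open-ended

n = len(failure_times)
lam = n / sum(failure_times)          # estimated exponential failure rate


def exp_cdf(t, rate):
    """Exponential cumulative failure function F(t) = 1 - exp(-rate * t)."""
    return 1.0 if math.isinf(t) else 1.0 - math.exp(-rate * t)


observed, expected = [], []
lower = 0.0
for upper in upper_bounds:
    observed.append(sum(1 for t in failure_times if lower < t <= upper))
    expected.append(n * (exp_cdf(upper, lam) - exp_cdf(lower, lam)))
    lower = upper

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(upper_bounds) - 1 - 1       # one estimated parameter (lambda)
print(observed, [round(e, 2) for e in expected], round(chi_square, 4), dof)
```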
Kolmogorov-Smirnov Test
It is also used to assess GOF, with the advantage that it can be used even with small sample sizes.
Formulae used:
Sn(t) = 0 for -∞ < t < t1
Sn(t) = i/n for ti ≤ t < ti+1; i = 1, 2, ..., n-1
Sn(t) = 1 for tn ≤ t < ∞
K-S statistic = max over i of max(|F(ti) - Sn(ti)|, |F(ti) - Sn(ti-1)|)
where F(ti) is the cumulative failure function of the distribution
ti is the time taken for the ith failure
n is the sample size
Example
The following 14 observations are failure times of a component, in hours. Test the hypothesis that the failure time is normally distributed.
For the normal distribution, z = (x - μ)/σ
where μ is the mean
σ is the standard deviation
Cumulative failure function: F(ti) = Φ(zi), where Φ is the standard normal cumulative distribution function. The corresponding density is f(x) = (1/(σ√(2π))) e^(-0.5[(x - μ)/σ]^2).
GIVEN DATA
i TTF i TTF
1 61.6 8 72.7
2 63.4 9 73
3 65.1 10 75.3
4 65.5 11 77.1
5 70 12 78.4
6 72.3 13 83.2
7 72.5 14 83.5
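The K-S statistic for the 14 observations can be computed directly from the formulae above. The sketch below assumes that μ and σ are estimated from the sample itself (sample mean and sample standard deviation), which the slides do not state. The resulting statistic still has to be compared against a tabulated critical value for n = 14, which is not reproduced here.

```python
from statistics import NormalDist, mean, stdev

ttf = sorted([61.6, 63.4, 65.1, 65.5, 70.0, 72.3, 72.5,
              72.7, 73.0, 75.3, 77.1, 78.4, 83.2, 83.5])
n = len(ttf)

# Assumption: mu and sigma are estimated from the sample itself.
fitted = NormalDist(mu=mean(ttf), sigma=stdev(ttf))

d_max = 0.0
for i, t in enumerate(ttf, start=1):
    f = fitted.cdf(t)
    # Compare F(t_i) with the empirical step function just after and just before t_i.
    d_max = max(d_max, abs(f - i / n), abs(f - (i - 1) / n))

print(f"mean = {fitted.mean:.2f}, sigma = {fitted.stdev:.2f}, K-S statistic = {d_max:.3f}")
```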
Reliability Block Diagram
Systems are composed of components.
An RBD is a method of evaluating the reliability of a system by establishing the following relationships between its components:
Series
Parallel
Combination of both
These structures help in understanding the logical relationships.
Series configuration
Failure of any one component in the block will lead to the failure of the entire system.
[Figure: blocks 1, 2, ..., n connected in series]
Rs - system reliability
E1 - event where component 1 does not fail
E2 - event where component 2 does not fail
R1 - reliability of component 1
R2 - reliability of component 2
Formula: Rs = P(E1 ∩ E2) = P(E1) P(E2) = R1 R2, assuming the components fail independently.
Therefore the system reliability can never exceed the reliability of the least reliable component, i.e. all components must have high reliability in this configuration.
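Parallel configuration
In a parallel system all elements must fail for the system to fail.
[Figure: blocks 1, 2, ..., n connected in parallel]
Formula: Rs = 1 - (1 - R1)(1 - R2)
Generalizing: Rs = 1 - Π [1 - Ri(t)], with the product taken over i = 1 to n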
Combination of parallel and series
Example
If R1 = R2 = 0.90, R3 = R6 = 0.98, R4 = R5 = 0.99, considering a constant failure rate.
Solution:
Ra = 1 - (0.10)^2 = 0.99
Rb = [1 - (0.10)^2](0.98) = 0.9702
Rc = (0.99)^2 = 0.9801
Rs = [1 - (1 - 0.9702)(1 - 0.9801)](0.98) = 0.9794
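The arithmetic of this example can be checked with two small helper functions. The block layout is inferred from the calculation, since the diagram itself is not in the transcript: components 1 and 2 in parallel, followed in series by component 3; components 4 and 5 in series; those two branches in parallel; and component 6 in series with the result. The helper names are illustrative.

```python
from math import prod


def series(*reliabilities):
    """Series blocks: every block must survive."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """Parallel blocks: the group fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


r1 = r2 = 0.90
r3 = r6 = 0.98
r4 = r5 = 0.99

ra = parallel(r1, r2)               # components 1 and 2 in parallel
rb = series(ra, r3)                 # followed by component 3
rc = series(r4, r5)                 # components 4 and 5 in series
rs = series(parallel(rb, rc), r6)   # both branches in parallel, then component 6
print(round(ra, 4), round(rb, 4), round(rc, 4), round(rs, 4))
```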
FAULT TREE ANALYSIS
An undesired event is defined.
The event is resolved into its immediate causes.
This resolution of events continues until basic causes are identified.
A logical diagram called a fault tree is constructed showing the logical event relationships.
ELEMENTS
FTA is a deductive analysis approach for resolving an undesired event into its causes.
FTA is a backward-looking analysis, looking backward at the causes of a given event.
Specific stepwise logic is used in the process.
Specific logic symbols are used to illustrate the event relationships.
A logic diagram is constructed showing the event relationships.
USES
FTA is used to resolve the causes of system failure.
FTA is used to quantify system failure probability.
FTA is used to evaluate potential upgrades to a system.
FTA is used to optimize resources in assuring system safety.
FTA is used to resolve causes of an incident.
FTA is used to model system failures in risk assessments.
FOUR STEPS
1. Define the undesired event to be analyzed (the focus of the FTA)
2. Define the boundary of the system (the scope of the FTA)
3. Define the basic causal events to be considered (the resolution of the FTA)
4. Define the initial state of the system
BASIC EVENTS
BASIC GATES
Example
Specifications
Undesired top event: Motor does not start when switch is closed
Boundary of the FT: The circuit containing the motor, battery, and switch
Resolution of the FT: The basic components in the circuit, excluding the wiring
Initial state of system: Switch open, normal operating conditions
Fault tree
The Top Event of the Fault Tree
The top event should describe WHAT the event is and WHEN it happens.
The top event is the specific event to be resolved into its basic causes.
Examples:
1. Fuel Supply System Fails to Shut Off after the fueling phase
2. Launch Vehicle Fails to Ignite at Launch
OR gate
The OR gate represents the logical union of the inputs: the output occurs if any of the inputs occur.
The OR gate is used when an event is resolved into more specific causes or scenarios.
The OR gate is used when a component failure is resolved into an inherent failure or a command failure.
The OR gate is used when an event is described in terms of equivalent, more specific events.
AND gate
The AND gate represents the logical intersection of the inputs: the output occurs only if all of the inputs occur.
The AND gate is used when an event is resolved into combinations of events that need to occur.
The AND gate is used when a redundant system is resolved into multiple subsystems that need to fail.
The AND gate is used when a system failure is resolved into conditions and events needed to occur.
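When FTA is used to quantify failure probability, the OR and AND gates translate into simple probability formulas if the input events are assumed independent. The sketch below applies them to the motor circuit example; the basic-event probabilities and the redundant-battery variant are hypothetical and appear nowhere in the slides.

```python
from math import prod


def or_gate(*probabilities):
    """Output occurs if any input occurs (independent inputs assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def and_gate(*probabilities):
    """Output occurs only if all inputs occur (independent inputs assumed)."""
    return prod(probabilities)


# Hypothetical basic-event probabilities for the motor circuit example.
p_motor_fails = 0.010
p_battery_dead = 0.020
p_switch_no_contact = 0.005

# Top event: motor does not start when the switch is closed.
p_top = or_gate(p_motor_fails, p_battery_dead, p_switch_no_contact)

# Hypothetical upgrade: two redundant batteries, both of which must fail.
p_top_redundant = or_gate(
    p_motor_fails,
    and_gate(p_battery_dead, p_battery_dead),
    p_switch_no_contact,
)
print(f"{p_top:.5f} {p_top_redundant:.5f}")
```

Comparing the two results is one simple way to evaluate a potential upgrade, as listed under the uses of FTA.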
Developing FTA
1. Define the top event as a rectangle
2. Determine the immediate, necessary and sufficient events which result in the top event
3. Draw the appropriate gate to describe the logic for the intermediate events resulting in the top event
4. Treat each intermediate event as an intermediate-level top event
5. Determine the immediate, necessary and sufficient causes for each intermediate event
6. Determine the appropriate gate and continue the process
Key attributes
Top Event - What specific event is being analyzed?
Boundary - What is inside and outside the analysis?
Resolution - What are the primary causes to be resolved to?
Initial State - What is assumed for the initial conditions and states?
FAULT VS FAILURE
• The intermediate events in a fault tree are called faults.
• The basic events, or primary events, are called failures if they represent failures of components.
• It is important to clearly define each event as a fault or failure so it can be further resolved or be identified as a basic cause.
Write the statements that are entered in the event boxes as faults; state precisely what the fault is and the conditions under which it occurs. Do not mix successes with faults.
Petri nets
A Petri net is a general-purpose graphical and mathematical tool for describing the relations existing between conditions and events. The basic symbols of Petri nets include:
Place: denotes events
Immediate transition: denotes event transfer with no time delay
Timed transition: denotes event transfer with a period of time delay
Arc: connects places and transitions
Token: contained in places, denotes the data
Inhibitor arc: connects places and transitions
Basic Structure:
A transition is said to fire if its input places satisfy an enabling condition. Transition firing removes one token from each of its input places and puts one token into each of its output places. There are two types of input place for a transition, namely the specified type and the conditional type. The former has a single output arc whereas the latter has several. Tokens in a specified-type place have only one outgoing destination, i.e. if the input place holds a token then the transition fires and gives the output places a token. Tokens in a conditional-type place, however, have more than one outgoing path, which may lead the system to different situations.
There are three types of transition, classified by time. Transitions with no time delay are called immediate transitions, while those that need a certain time delay are called timed transitions. The third type is the stochastic transition, which is used for modeling a process with random time. Owing to the variety of logical relations that can be represented with Petri nets, they are a powerful tool for modeling systems. Petri nets can be used not only for simulation, reliability analysis, and failure monitoring, but also for observing dynamic behavior. This greatly helps fault tracing and failure state analysis. Moreover, the use of Petri nets can improve the dialogue between the analysts and the designers of a system.
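The firing rule described above (one token removed from each input place, one token added to each output place) can be sketched as follows. The marking is held as a dictionary from place name to token count; the place names and the small net are hypothetical, and inhibitor arcs and timing are deliberately left out of this sketch.

```python
def enabled(marking, inputs):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking.get(place, 0) >= 1 for place in inputs)


def fire(marking, inputs, outputs):
    """Return the new marking after firing: one token leaves each input place
    and one token enters each output place."""
    if not enabled(marking, inputs):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


# Hypothetical net: an AND-type transition needs tokens in both p1 and p2.
marking = {"p1": 1, "p2": 1, "p3": 0}
after = fire(marking, inputs=["p1", "p2"], outputs=["p3"])
print(after)
```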
Minimum cut sets
To identify the minimum cut sets in a Petri net, the matrix method is used, as follows:
1. Write the numbers of the input places in a row if the output place is connected by multiple arcs from transitions. This accounts for OR models.
2. If the output place is connected by one arc from a transition, write the numbers of the input places in a column. This accounts for AND models.
3. The common entry located in rows is the entry shared by each row.
4. Proceed from the top event down to the basic events until all the places are replaced by basic events. The matrix thus formed is called the basic event matrix, and its column vectors constitute cut sets.
5. Remove the supersets from the basic event matrix; the remaining column vectors are the minimum cut sets.
Minimum cut sets can also be derived in the opposite, bottom-up direction, that is, from the basic places to the top place. Transitions with T = 0 are called immediate transitions. If all transitions in the Petri net are immediate, i.e. token transfer between places takes no time, the net can be absorbed into a simplified form called the equivalent Petri net. After absorption, all the remaining places are basic events. The equivalent Petri net exactly constitutes the minimum cut sets, i.e. the inputs of each transition represent a minimum cut set.
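Step 5 of the matrix method, removing supersets, is easy to express with Python sets. The sketch below covers only that step, not the construction of the basic event matrix; the function name and the candidate cut sets are hypothetical.

```python
def minimal_cut_sets(cut_sets):
    """Drop every cut set that is a proper superset of another cut set."""
    unique = {frozenset(cs) for cs in cut_sets}
    return [set(cs) for cs in unique
            if not any(other < cs for other in unique)]


# Hypothetical cut sets expressed as sets of basic-event labels.
candidates = [{"A"}, {"A", "B"}, {"B", "C"}, {"B", "C", "D"}]
minimal = minimal_cut_sets(candidates)
print(sorted(sorted(cs) for cs in minimal))
```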
Monte Carlo simulation
In a Monte Carlo simulation, a logical model of the system being analyzed is repeatedly evaluated, each run using different values of the distributed parameters.
The selection of parameter values is made randomly, but with probabilities governed by the relevant distribution functions.
Monte Carlo simulation can be used for system reliability and availability modeling, using suitable computer programs. Since Monte Carlo simulation involves no complex mathematical analysis, it is an attractive alternative approach. It is a relatively easy way to model complex systems, and the input algorithms are easy to understand.
One problem with this method is its expensive use of computer time.
Since the simulation of probabilistic events generates variable results, in effect simulating the variability of real life, it is usually necessary to perform a number of runs in order to obtain estimates of the mean and variance of the output parameters of interest, such as availability, number of repairs arising and facility utilization. On the other hand, the effect of variation can be assessed.
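As a small illustration of repeated runs, the sketch below simulates the availability of a single repairable item. It assumes exponentially distributed times to failure and repair times, and the MTBF (100 h), MTTR (5 h), horizon and number of runs are hypothetical values chosen only to keep the example short. The mean over the runs should land near MTBF/(MTBF + MTTR).

```python
import random
from statistics import mean, variance


def simulate_availability(mtbf, mttr, horizon, rng):
    """One run: alternate random up-times and repair times until the horizon."""
    clock = 0.0
    uptime = 0.0
    repairs = 0
    while clock < horizon:
        up = min(rng.expovariate(1.0 / mtbf), horizon - clock)
        uptime += up
        clock += up
        if clock >= horizon:
            break
        down = min(rng.expovariate(1.0 / mttr), horizon - clock)
        clock += down
        repairs += 1
    return uptime / horizon, repairs


rng = random.Random(42)   # fixed seed so the sketch is repeatable
runs = [simulate_availability(mtbf=100.0, mttr=5.0, horizon=50_000.0, rng=rng)
        for _ in range(20)]
availabilities = [a for a, _ in runs]
print(f"mean availability = {mean(availabilities):.4f}, "
      f"variance = {variance(availabilities):.2e}, "
      f"mean repairs per run = {mean(r for _, r in runs):.1f}")
```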
Design analysis methods
Design analysis methods have been developed to highlight critical aspects and to focus attention on possible shortfalls.
Design analyses are sometimes considered tedious and expensive.
In most cases the analyses will show that nearly all aspects of the design are satisfactory, and much more effort will have been expended in showing this than in highlighting a few deficiencies.
The tedium and expense can be greatly reduced by good planning and preparation and by the use of computerized methods.
The main reliability design analysis techniques described are:
1. Quality function deployment
2. Reliability prediction
3. Load-strength analysis
4. Failure modes, effects and criticality analysis
5. Fault tree analysis
6. Hazard and operability study
7. Parts, materials and process review
8. Others, including human aspects, manufacturing, maintenance, etc.
Quality function deployment
QFD is a bad translation of a good reliability technique for getting the voice of the customer into the design process, so that the product delivers what the customer desires. In particular, it is applicable to soft issues that are difficult to specify.
This method helps to pinpoint what to do, the best way to accomplish the objective, the best order for achieving the design objective, and the staffing assets needed to complete the task.
It is a major up-front effort to learn and understand the customer's requirements and the approach that will satisfy their objectives.
The methodology is used as a team approach to solving problems and satisfying customers , beginning with a listing
Failure Mode and Effect Analysis (FMEA)
Failure mode and effect analysis is the study of potential failures that might occur in any part of a system, in order to determine the probable operational success.
When criticality analysis is added for sophisticated studies, the method is known as FMECA.
The basic thrust of the analysis tool is to prevent failures using a simple and cost-effective analysis that draws on the collective information of the team to find problems and resolve them before they occur.
The analysis is known as a bottom-up (inductive) approach to finding each potential mode of failure that might occur for every component of a system. It is also used for determining the probable effect of each failure mode on system operation and, in turn, on probable operational success.
FMEA can be performed from different viewpoints such as safety, mission success, repair costs, failure modes, reliability reputation.
FMEA is most productive when performed during the design process to eliminate potential failures. It can also be performed on existing systems.
The analysis can be conducted in the design room or on the shop floor, and it is an excellent tool for sharing experience, making the team aware of details that are known to one person but seldom shared with the team.
Accelerated testing
A test method that uses increased loads to quickly produce age-to-failure data; the few data points obtained are then scaled to reflect normal loads.
The benefit of this testing is that it saves time and money while quantifying the relationship between stress and performance, along with identifying design deficiencies at low cost.
It is used to correlate with real-life conditions.
It is a useful method for solving old, nagging problems within a production process.
Accelerated testing shortens the test time, because the tests are conducted at higher stress levels so that failures occur in days instead of months or years.
Challenges faced by the designer:
1. Long test time to complete life testing of the product
2. Constraints on timelines
3. Cost as a function of time
4. Reliability growth
Care has to be taken that the stress or the agent of failure does not result in a failure mode other than the one being evaluated.
The acceleration rate must be uniform.
Types of ALT
Qualitative Accelerated Testing: HALT, HASS
Quantitative Accelerated Testing: SSALT, CSALT, CISALT
Highly Accelerated Life Testing (HALT)
To identify potential failure modes or uncover defects of a product.
Test the component to failure under highly stressed conditions.
Study the failure modes and analyze to the root cause.
Fix the root cause to make the product more robust.
Does not help in predicting the life of the product.
Highly Accelerated Stress Screening (HASS)
Used to monitor the production process.
All products are subjected to the same stresses as in HALT, but at a lower level.
It identifies process related defects.
Quantitative Accelerated Testing
Planned/Controlled accelerated testing from which TTF under normal usage conditions can be derived.
Models to be used for a specific agent of failure have been postulated.
Acceleration Factor (AF) = TTFnormal/TTFstress
AF is used to derive the normal TTF from the accelerated TTF.
Quantitative ALT helps predict the life of the product.
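The acceleration factor is used as a plain scaling factor. The sketch below assumes an AF has already been established from a pair of known times to failure and then applies it to a new accelerated result; all numbers and function names are hypothetical.

```python
def acceleration_factor(ttf_normal, ttf_stress):
    """AF = TTF_normal / TTF_stress, from a pair of known times to failure."""
    return ttf_normal / ttf_stress


def normal_life(ttf_stress, af):
    """Scale a time to failure observed under stress back to normal use."""
    return af * ttf_stress


# Hypothetical numbers for illustration only.
af = acceleration_factor(ttf_normal=8_000.0, ttf_stress=200.0)
predicted = normal_life(ttf_stress=250.0, af=af)   # hours at normal stress
print(af, predicted)
```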
Improving the process
Continuous improvement nearly always leads to reduced costs, higher productivity, and higher reliability.
Methods that are available for process improvement are as follows:
Simple charts
Control charts
Multi-vari charts
Statistical methods
Quality circles
Zero defects
Simple charts
A variety of simple charting techniques can be used to help identify and solve process variability problems.
The Pareto chart is often used as the starting point to identify the most important problems and the most likely causes.
The measles chart is used when problems are distributed over an area.
The cause and effect diagram, also called the fishbone or Ishikawa diagram, can be used to structure and record problem-solving and process improvement efforts. The main problem is indicated on the horizontal line and possible causes are shown as branches, which in turn can have sub-causes.
Control charts
When control charts are in use, they are monitored continually for trends that might indicate special causes of variation. A trend can be a continuous run high or low, or a cyclic pattern. A continuous high or low trend indicates a need for process or measurement adjustment. A cyclic trend might be caused by temperature fluctuation, process drift between settings, change of materials, etc.
Multi-vari charts
A multi-vari chart is a graphical method for identifying the major causes of variation in a process. Multi-vari charts can be used for process development and for problem solving, and they can be very effective in reducing the number of variables to include in a statistical experiment.
Multi-vari charts show whether the major causes of variation are spatial, cyclic or temporal. The parameter being monitored is measured in different positions, at different points in the production cycle, and at different times. For example, a machined dimension measured at two locations (e.g. the diameter at each end of a shaft) can be plotted against batch number from setup. In that example the chart shows that batch-to-batch variation is the most significant cause, with a significant pattern of end-to-end variation (taper).
Statistical Methods
Statistical methods for the analysis of variation can be used effectively for variation reduction in production processes. They should be used for process improvement in the same way as for initial product and process design. If a particular process has been the subject of such experiments during development, the results can be used to guide further studies. The charting methods can also be used to identify the major causes of variation prior to setting up statistical experiments. In this way the number of variables to be investigated can be reduced, leading to cost savings.
Quality Circles
This is the most widely used method worldwide. A quality circle team consisting of operators is formed. The team members manage themselves, select leaders and members, and address the problems. They implement improvements that are under their control and recommend the others to management. Quality circles are taught to use analytical techniques to help identify problems and generate solutions. These are called the seven tools of quality.
The seven tools of quality are:
1. Brainstorming, to identify and prioritize problems
2. Data collection
3. Data analysis methods, including measles charts, trend charts and regression analysis
4. Pareto chart
5. Histogram
6. Cause and effect diagram
7. Statistical Process Control (SPC) chart
Failure Reporting, Analysis and Corrective Action System (FRACAS)
Failure reporting and analysis is an important part of the QA function. The system must provide for:
1. Reporting of all production test and inspection failures, with sufficient detail to enable investigation and corrective action to be taken
2. Reporting the results of investigation and action
3. Analysis of failure patterns and trends, and reporting on these
4. Continuous improvement by removal of causes
The data system must be computerized for economy and accuracy. Modern ATE sometimes includes direct test data recording and input to the central system over a network. The data analysis must provide Pareto analysis, probability plots and trend analysis for management.
Production defect data reporting and analysis must be very quick to be effective. Trends should be analyzed daily, or weekly at most, particularly for high rates of production, to enable timely corrective action to be taken. The data analysis system is also necessary for indicating areas for priority action, using the Pareto principle of concentrating action on the few problem areas that contribute the most to quality costs. For this purpose longer-term analysis is necessary.
Defective components should not be scrapped immediately, but should be labeled and stored for a period, say one or two months, so that they are available for more detailed investigation if necessary.
Production defect data should not be analyzed in isolation by people whose task is primarily data management. The people involved in production must participate, to ensure that the data are interpreted by those involved and that practical results are derived. The quality circle approach provides very effectively for this.
Production defect data are important for highlighting possible in-service reliability problems. Many in-service failure modes manifest themselves during production inspection and testing. For example, if a component or process generates failures on the final functional test, and these are corrected before delivery, it is possible that the failure mechanism exists in products which pass the test and are shipped. Metal surface protection and soldering processes present such risks. Therefore production defects should always be analyzed to determine the likely effects on reliability, external failure costs and all internal production quality costs.