cost optimized reliability test planning rev 11

Cost-Optimized Reliability Test Planning and Decision-Making Through Bayesian Methods and Leveraging Prior Knowledge

ASQ Reliability Division Webinar Program

Jun 6th 2013

Charles H. Recchia, MBA, [email protected]

ASQ RD Webinar 2

COST-OPTIMIZED RELIABILITY TEST PLANNING AND DECISION-MAKING THROUGH BAYESIAN METHODS AND LEVERAGING PRIOR KNOWLEDGE

When planning for and interpreting reliability datasets, proper application of Bayesian statistics improves decision-making and resource utilization, and allows rigorous treatment of prior knowledge to optimize overall reliability program costs and increase return on investment. In this webinar, we build upon the foundation established in our previous intro-level presentation and provide specific examples of reduced sample sizes enabled by Bayesian methods. We also describe real-world scenarios of improved decision-making during comparative reliability analyses using proper statistical perspectives on relative failure rates between systems.

Charles H Recchia, MBA, PhD has more than twenty-five years of product development, engineering management, and fundamental research experience with a special focus on reliability statistics of complex systems. He earned his doctorate in Condensed Matter Physics from The Ohio State University, and a Master of Business Administration degree from Babson College. Dr. Recchia acquired in-depth reliability engineering expertise at Intel’s Portland Technology Development, MKS Instruments and Saint-Gobain Innovative Materials R&D, has served as visiting professor of physics at Wittenberg University, and is author of numerous peer-reviewed technical papers and patents across multiple fields. Charles provided statistics & advanced lean six sigma consultancy for A123 Systems via the Andover-based Quality Support Group Inc, and has contracted under Coleman Research Group vetting CASIS-ISS US National Lab research proposals. A senior member of ASQ and the American Physical Society, Charles currently works at Raytheon Integrated Defense Systems and serves on the Advisory Committee for the Boston Chapter of the IEEE Reliability Society.

6/6/2013


References and Further Reading

• NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, April (2012)

• Statistical Methods for Reliability Data, WQ Meeker and LA Escobar (1998)

• Applied Reliability, 2nd edition, PA Tobias and DC Trindade (1995)

• Bayesian Reliability, MS Hamada, AG Wilson, CS Reese, and HF Martz, Springer Series in Statistics (2008)

• Bayesian Reliability Analysis, HF Martz and RA Waller (1982)

• Methods for Statistical Analysis of Reliability and Life Data, NR Mann, RE Schafer, and ND Singpurwalla (1974)

• Bayes is for the Birds, RA Evans, IEEE Transactions on Reliability R-38, 401 (1989)

• “A Compendium of Conjugate Priors,” Daniel Fink (1997)


Agenda

• Brief Review of Bayesian Method

• Examples of Reduced Test Sample Sizes

• Comparative Reliability Decision Making

• Question and Answer


When reliability follows the exponential TTF model (e.g., the flat, constant-failure-rate portion of the bathtub curve):

CLASSICAL FRAMEWORK

– The mean time between failures (MTBF) is one fixed unknown value; there is no “probability” associated with it.

– Failure data from a test or observation period allow you to make inferences about the value of the true unknown MTBF (= 1/λ).

– No other data are used and no “judgment” is applied; the procedure is objective and based solely on the test data and the assumed HPP (homogeneous Poisson process) model.

BAYESIAN FRAMEWORK

– The MTBF is a random quantity with a probability distribution.

– Prior to running the test, you already have some idea of what the MTBF probability distribution looks like, based on prior test data or a consensus engineering judgment.

– Upon collecting failure data, you incorporate that knowledge to refine the distribution of the possible values for λ.


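Bayesian Core Idea

“Prior” $g(\lambda)$: what you knew before (WYKB). “The probability of λ before new data comes in.”

New data $\{t_i\}$: “A new set of failure times.”

Likelihood: “The likelihood of obtaining $\{t_i\}$ given parameter λ.”

$$L(\{t_i\} \mid \lambda) = \prod_{\text{uncensored}} f(t_j \mid \lambda) \prod_{\text{censored}} \bigl(1 - F(t_k \mid \lambda)\bigr)$$

“Posterior” $g(\lambda \mid \{t_i\})$: best possible update of WYKB adjusted by the new data. “The probability of parameter λ given $\{t_i\}$.”

The posterior is proportional to the likelihood times the prior: $g(\lambda \mid \{t_i\}) \propto L(\{t_i\} \mid \lambda)\, g(\lambda)$.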


Conjugate Prior

• When the posterior has the same functional form as the prior (with its parameters modified by the likelihood and normalization), the prior is known as a “conjugate prior” for that likelihood.

• The concept is similar to that of an eigenfunction.

• When one is available, a conjugate prior is convenient to use because it is tractable and easy to interpret.


Gamma is the conjugate prior for exponential TTF (constant failure rate)

Gamma distribution pdf:

$$g(\lambda; a, b) = \frac{b^{a}}{\Gamma(a)}\, \lambda^{a-1} e^{-b\lambda}$$

a is dimensionless

b has units of time

Mean λ_ave = a/b

Variance σ² = a/b²

In Excel

=GAMMA.DIST(λ, a, 1/b, FALSE)
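For readers working outside Excel, the pdf and its moments can be sketched in plain Python. This is only an illustration: the function names and the numeric values of a and b are hypothetical, not from the slides. Note that the functions take the rate-like parameter b directly, whereas Excel's GAMMA.DIST expects the scale 1/b.

```python
import math

def gamma_pdf(lam, a, b):
    """Gamma pdf g(lam; a, b): shape a (dimensionless), b in units of time."""
    if lam <= 0.0:
        return 0.0  # sketch only: the lam = 0 edge case is not treated specially
    # Work in log space to avoid overflow for large a or b.
    log_g = a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(lam) - b * lam
    return math.exp(log_g)

def gamma_mean_variance(a, b):
    """Mean a/b and variance a/b^2 of the failure-rate distribution."""
    return a / b, a / b ** 2

# Hypothetical prior: a = 2, b = 1000 hrs (illustrative values only).
a, b = 2.0, 1000.0
mean, var = gamma_mean_variance(a, b)
print("mean failure rate:", mean, "variance:", var)
print("pdf at the mean:", gamma_pdf(mean, a, b))
```

With a = 1 the expression reduces to the exponential density b·exp(−bλ), which is a quick sanity check on any implementation.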


Gamma distribution CDF

The CDF G(λ) is the probability p that the failure rate is less than or equal to λ:

$$G(\lambda; a, b) = \frac{1}{\Gamma(a)}\, \gamma(a, b\lambda)$$

where γ(x, y) is the lower incomplete gamma function. Note that

$$G(\lambda; a, b) = G(b\lambda; a, 1)$$

In Excel

p = GAMMA.DIST(λ, a, 1/b, TRUE)

and its inverse

λ = GAMMA.INV(p, a, 1/b)

Figure: gamma distribution CDF, http://en.wikipedia.org/wiki/File:Gamma_distribution_cdf.svg
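The two Excel calls above can be mirrored with the Python standard library alone. The sketch below is an assumption-laden illustration, not the presenter's spreadsheet: it evaluates the regularized lower incomplete gamma function with a simple power series and inverts it by bisection, and the function names are made up for this example. A library routine would normally be preferred in production work.

```python
import math

def gamma_cdf(lam, a, b):
    """G(lam; a, b) = gamma(a, b*lam) / Gamma(a), via a power series."""
    x = b * lam
    if x <= 0.0:
        return 0.0
    # P(a, x) = x^a e^-x / Gamma(a+1) * sum_n x^n / ((a+1)...(a+n))
    term = total = 1.0
    n = 0
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0)))

def gamma_inv(p, a, b):
    """Failure rate lam such that G(lam; a, b) = p, found by bisection."""
    lo, hi = 0.0, max(1.0, a) / b
    while gamma_cdf(hi, a, b) < p:   # grow the bracket until it contains p
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical values: a = 3, b = 1000 hrs.
p = gamma_cdf(0.004, 3.0, 1000.0)
print("P(lambda <= 0.004):", p)
print("round trip:", gamma_inv(p, 3.0, 1000.0))
```

Because the series only ever sees the product bλ, the scaling relation G(λ; a, b) = G(bλ; a, 1) holds by construction.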


Bayesian assumptions for the gamma exponential system model

1. Failure times for the system under investigation can be adequately modeled by the exponential distribution with constant failure rate.

2. The MTBF for the system can be regarded as chosen from a prior distribution model that is an analytic representation of our previous information or judgments about the system's reliability. The form of this prior model is the gamma distribution (the conjugate prior for the exponential model). The prior model is actually defined for λ = 1/MTBF.

3. Our prior knowledge is used to choose the gamma parameters a and b for the prior distribution model for λ. There are a number of ways to convert prior knowledge to gamma parameters.


New data is collected …

New information is combined with the gamma prior model to produce a gamma posterior distribution. After a new test is run with T additional system operating hours and r new failures, the resultant posterior distribution for the failure rate λ remains gamma (since the prior is conjugate), with new parameters

a' = a + r

b' = b + T
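The update rule can be checked in a few lines. The following is a sketch under the stated exponential model, assuming r observed failure times t_j, any remaining units censored at times t_k, and T defined as the total accumulated operating time (the index conventions are chosen here for illustration):

```latex
\begin{align*}
L(\{t_i\} \mid \lambda)
  &= \prod_{j=1}^{r} \lambda e^{-\lambda t_j} \prod_{k} e^{-\lambda t_k}
   = \lambda^{r} e^{-\lambda T},
  \qquad T = \sum_{j} t_j + \sum_{k} t_k \\
g(\lambda \mid \{t_i\})
  &\propto L(\{t_i\} \mid \lambda)\, g(\lambda; a, b)
   \propto \lambda^{r} e^{-\lambda T} \cdot \lambda^{a-1} e^{-b\lambda}
   = \lambda^{(a+r)-1} e^{-(b+T)\lambda}
\end{align*}
```

The last expression is the kernel of a gamma density with parameters a + r and b + T, which is why the posterior stays in the gamma family.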


Reliability estimation with Bayesian gamma prior model


Gamma Prior Method 1: Previous Test Data

1. Actual data from previous testing done on the system (or a system believed to have the same reliability as the one under investigation) is the most credible prior knowledge, and the easiest to use. Simply set

a = total number of failures from all the previous data, and

b = total of all the previous test hours.
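As a minimal sketch of Method 1, the snippet below builds a prior from previous test totals and then applies the gamma update a' = a + r, b' = b + T. The function names and all of the numbers are hypothetical and chosen only to illustrate the bookkeeping.

```python
def prior_from_previous_tests(total_failures, total_hours):
    """Method 1: a = previous failures, b = previous test hours."""
    return float(total_failures), float(total_hours)

def update_gamma(a, b, r, T):
    """Posterior gamma parameters after r new failures in T new test hours."""
    return a + r, b + T

# Hypothetical history: 3 failures in 4000 hrs of earlier testing.
a, b = prior_from_previous_tests(3, 4000.0)

# Hypothetical new test: 1 failure in 1500 hrs.
a_post, b_post = update_gamma(a, b, r=1, T=1500.0)

posterior_mean_rate = a_post / b_post          # mean of the gamma posterior
print("posterior parameters:", a_post, b_post)
print("posterior mean failure rate:", posterior_mean_rate)
print("1 / mean rate (hrs):", 1.0 / posterior_mean_rate)
```

With this choice of prior, the posterior simply pools the old and new failures and hours, which is why previous test data is the easiest prior knowledge to use.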


Gamma prior method 2: “50/95”

2. A consensus method for determining a and b that works well is the following: Assemble a group of engineers who know the system and its sub-components well from a reliability viewpoint.

A. Have the group reach agreement on a reasonable MTBF they expect the system to have. They could each pick a number they would be willing to bet even money that the system would either meet or miss, and the average or median of these numbers would be their 50% best guess for the MTBF. Or they could just discuss even-money MTBF candidates until a consensus is reached.

B. Repeat the process again, this time reaching agreement on a low MTBF they expect the system to exceed. A "5%" value that they are "95% confident" the system will exceed (i.e., they would give 19 to 1 odds) is a good choice. Or a "10%" value might be chosen (i.e., they would give 9 to 1 odds the actual MTBF exceeds the low MTBF). Use whichever percentile choice the group prefers.

C. Call the reasonable MTBF MTBF50 and the low MTBF that you are 95% confident the system will exceed MTBF05. These two numbers uniquely determine gamma parameters a and b whose percentile values fall at the right locations.

This is called the 50/95 method (or the 50/90 method if one uses MTBF10, etc.).
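One way to solve for a and b numerically is sketched below. It is an illustrative approach, not necessarily the one used in the presenter's spreadsheet: because the ratio of the 95th to the 50th percentile of a gamma distribution depends only on a, bisect on a until that ratio equals MTBF50/MTBF05, then scale b so the median failure rate equals 1/MTBF50. All function names are hypothetical, and the incomplete gamma function is evaluated with a simple series.

```python
import math

def unit_gamma_cdf(x, a):
    """Regularized lower incomplete gamma P(a, x), i.e. G(x; a, 1)."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0)))

def unit_gamma_inv(p, a):
    """Quantile of the unit-scale gamma, G^-1(p; a, 1), by bisection."""
    lo, hi = 0.0, max(1.0, a)
    while unit_gamma_cdf(hi, a) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if unit_gamma_cdf(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_gamma_prior(mtbf50, mtbf_low, low_conf=0.95):
    """Return (a, b) so that P(MTBF > mtbf50) = 0.5 and P(MTBF > mtbf_low) = low_conf."""
    target = mtbf50 / mtbf_low            # equals lambda_95 / lambda_50
    lo, hi = 0.1, 100.0                   # assumed search range for a
    for _ in range(60):
        a = 0.5 * (lo + hi)
        ratio = unit_gamma_inv(low_conf, a) / unit_gamma_inv(0.5, a)
        if ratio > target:                # ratio shrinks as a grows
            lo = a
        else:
            hi = a
    a = 0.5 * (lo + hi)
    b = unit_gamma_inv(0.5, a) * mtbf50   # puts the median rate at 1/mtbf50
    return a, b

a, b = solve_gamma_prior(600.0, 250.0)
print("a =", round(a, 3), " b =", round(b, 2), "hrs")
```

For MTBF50 = 600 hrs and MTBF05 = 250 hrs, the result should land close to a = 2.863 and b = 1522.46 hrs, the values quoted in the 50/95 example in this deck. Passing low_conf=0.90 gives the 50/90 variant.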


Gamma prior method 3: weak prior a = 1

3. Obtain consensus on a reasonable expected MTBF, called MTBF50. Next, however, the group decides they want a weak prior that will change rapidly, based on new test data. If the prior parameter "a" is set to 1, the gamma has a standard deviation equal to its mean, which makes it spread out, or "weak".

To set the 50th percentile we must choose b = ln 2 × MTBF50

Note: During planning of Bayesian tests, this weak prior is actually a very friendly prior in terms of saving test time.
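The choice b = ln 2 × MTBF50 follows directly from the gamma formulas with a = 1. A short derivation, using only the pdf and CDF definitions given earlier in the deck:

```latex
\begin{align*}
g(\lambda; 1, b) &= b\, e^{-b\lambda}, \qquad
G(\lambda; 1, b) = 1 - e^{-b\lambda} \\
\text{mean} &= \frac{a}{b} = \frac{1}{b}, \qquad
\text{standard deviation} = \frac{\sqrt{a}}{b} = \frac{1}{b} \\
G\!\left(\tfrac{1}{\mathrm{MTBF}_{50}}; 1, b\right) = \tfrac{1}{2}
  &\;\Longrightarrow\; e^{-b/\mathrm{MTBF}_{50}} = \tfrac{1}{2}
  \;\Longrightarrow\; b = \ln 2 \times \mathrm{MTBF}_{50}
\end{align*}
```

For example, a consensus MTBF50 of 600 hrs gives b ≈ 0.693 × 600 ≈ 416 hrs.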


Special Case: a = 1 (The "Weak" Prior)

When the prior is a weak prior with a = 1, there is a very simple way to calculate the required Bayesian test time. First calculate the classical/frequentist test time. Call this Tc. The Bayesian test time is T = Tc - b. If the b parameter was set equal to (ln 2) × MTBF50 (where MTBF50 is the consensus choice for an "even money" MTBF), then T = Tc - (ln 2) × MTBF50.

Because b is positive, the Bayesian test time with a weak prior is always less than the corresponding classical test time. That is why this prior is also known as a friendly prior.

This prior essentially sets the “order of magnitude” for the MTBF.


Remarks

Many variations are possible, based on the above three methods. For example, you might have prior data from sources with various levels of applicability or suitability relative to the system under investigation. Thus, you may decide to "weight" the prior data by 0.5, to "weaken" it. This can be implemented by setting a = 0.5 × the number of failures in the prior data and b = 0.5 × the number of test hours. That spreads out the prior distribution more, and lets it be influenced more quickly by freshly accumulated test data. Most importantly, the prior distribution needs to be technically credible, knowledge-based, and unbiased.
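The effect of weighting can be seen with a small sketch. The helper names and the prior data (4 failures in 2000 hrs) are hypothetical; the point is that scaling both a and b by the same weight leaves the prior mean a/b unchanged while widening the standard deviation √a/b.

```python
import math

def weighted_prior(prior_failures, prior_hours, weight=0.5):
    """Down-weight prior data: a = weight * failures, b = weight * hours."""
    return weight * prior_failures, weight * prior_hours

def gamma_mean_sd(a, b):
    """Mean a/b and standard deviation sqrt(a)/b of the gamma prior."""
    return a / b, math.sqrt(a) / b

# Hypothetical prior data: 4 failures in 2000 test hours.
full_mean, full_sd = gamma_mean_sd(*weighted_prior(4, 2000.0, weight=1.0))
weak_mean, weak_sd = gamma_mean_sd(*weighted_prior(4, 2000.0, weight=0.5))

print("full weight: mean", full_mean, "sd", full_sd)
print("half weight: mean", weak_mean, "sd", weak_sd)
print("sd ratio:", weak_sd / full_sd)
```

A weight of 0.5 multiplies the standard deviation by √2, so the same amount of new test data moves the posterior further.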


WEIBULL EXAMPLE

What if we know the failure rate isn’t constant?

[Figure: joint posterior $g(\lambda, k \mid \{t_i\})$ over the Weibull parameters λ and k, shown with the TTF pdf, the TTF CDF, and censored data.]


Weibull Continued

• If scale θ unknown, shape β known

• If scale θ known, shape β unknown


“Knowledge as an accelerant”


Bayesian Test Planning

Given: gamma prior parameters a and b and a stated MTBF objective M.

Goal: Confirm the system has an MTBF of at least M at the 100×(1−α)% confidence level. Pick a maximum number of failures, r, allowed during the test.

Compute a test time T such that we can endure r failures and still "pass" the test. The posterior gamma distribution will have (worst case, assuming exactly r failures) new parameters of

a' = a + r, and b' = b + T

Passing the test means the failure rate λ_{1−α}, the upper 100×(1−α) percentile of the posterior gamma, has to equal the target failure rate 1/M. By definition, this is the inverse CDF G⁻¹(1−α; a', b'). The required test time is:

$$T = M\, G^{-1}(1-\alpha;\; a+r,\; 1) - b$$

[Figure: posterior gamma pdf; the area below λ_{1−α} = 1/M is 1 − α.]
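The test-time formula can be recovered from the passing condition in three steps. The sketch below uses only the posterior parameters a + r and b + T and the scaling relation G(λ; a, b) = G(bλ; a, 1) noted on the gamma CDF slide:

```latex
\begin{align*}
G\!\left(\tfrac{1}{M};\; a+r,\; b+T\right) &= 1-\alpha
  && \text{(upper percentile equals the target rate } 1/M\text{)} \\
G\!\left(\tfrac{b+T}{M};\; a+r,\; 1\right) &= 1-\alpha
  && \text{(scaling relation)} \\
\frac{b+T}{M} &= G^{-1}(1-\alpha;\; a+r,\; 1)
  \;\Longrightarrow\;
  T = M\, G^{-1}(1-\alpha;\; a+r,\; 1) - b
\end{align*}
```

The prior parameter b enters as a direct credit of test hours, while a raises the effective failure count from r to a + r.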


Example: 50/95 Method Prior

A group of engineers, discussing the reliability of a new piece of equipment, decide to use the 50/95 method to convert their knowledge into a Bayesian gamma prior. Consensus is reached on:

a likely MTBF50 value of 600 hrs, and a low MTBF05 value of 250 hrs

Corresponding parameters solved:

a = 2.863

b = 1522.46 hrs

These prior parameters “pre-load” the failure rate distribution:

50% prob of λ < 1/600 = 1.67e-3 hrs⁻¹

95% prob of λ < 1/250 = 4.00e-3 hrs⁻¹


Example: Bayesian Optimization

System has MTBF requirement of M = 500 hrs at 80% confidence (α = 0.2).

Find the test time needed to demonstrate MTBF ≥ 500 hrs with 80% confidence, provided the system suffers no more than two failures (r = 2).

Obtain T = {500 hrs} × G⁻¹(1 − 0.2; 2.863 + 2, 1) − {1522.46 hrs} = 1756 hrs

If the test then runs for 1756 hrs with no more than two failures, an MTBF of at least 500 hrs has been confirmed at 80% confidence.

The classical (non-Bayesian) test time required would have been 2140 hrs.

The Bayesian test saves about 384 hrs, an 18% time savings.

If, instead, a weak prior had been chosen with the same 600 hr MTBF50, the required test time would have been 1724 hrs, a savings of roughly 416 hrs, or about 19% versus the non-Bayesian test.
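The three test times in this example can be reproduced with a short standard-library sketch. Two assumptions are made here that the slide does not spell out: the inverse gamma CDF is computed with a simple series-plus-bisection routine (a stand-in for Excel's GAMMA.INV), and the classical test time is taken to be Tc = M × G⁻¹(1−α; r+1, 1), the usual chi-square-based length of a time-terminated test. All function names are hypothetical.

```python
import math

def unit_gamma_cdf(x, a):
    """Regularized lower incomplete gamma P(a, x), i.e. G(x; a, 1)."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0
    n = 0
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0)))

def unit_gamma_inv(p, a):
    """G^-1(p; a, 1) by bisection."""
    lo, hi = 0.0, max(1.0, a)
    while unit_gamma_cdf(hi, a) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if unit_gamma_cdf(mid, a) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bayes_test_time(M, alpha, r, a, b):
    """T = M * G^-1(1 - alpha; a + r, 1) - b."""
    return M * unit_gamma_inv(1.0 - alpha, a + r) - b

def classical_test_time(M, alpha, r):
    """Assumed classical length: Tc = M * G^-1(1 - alpha; r + 1, 1)."""
    return M * unit_gamma_inv(1.0 - alpha, r + 1.0)

M, alpha, r = 500.0, 0.2, 2
t_bayes = bayes_test_time(M, alpha, r, a=2.863, b=1522.46)       # 50/95 prior
t_classical = classical_test_time(M, alpha, r)
t_weak = bayes_test_time(M, alpha, r, a=1.0, b=math.log(2.0) * 600.0)

for label, t in [("50/95 prior", t_bayes), ("weak prior", t_weak)]:
    saved = t_classical - t
    print(label, round(t), "hrs; saves", round(saved), "hrs,",
          round(100.0 * saved / t_classical), "%")
print("classical", round(t_classical), "hrs")
```

The printed values should come out close to the 1756, 1724, and 2140 hrs quoted above. With a = 1 the Bayesian result equals Tc − b exactly, matching the weak-prior shortcut.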


Post-Test Analysis


EXCEL SPREADSHEET EXAMPLES


Yes, the Excel spreadsheet will be available along with webinar slides.

“do it live”



Q&A


[email protected]
