Developments in Probability of Detection Modeling and Simulation Studies

by Jeremy S. Knopp, Frank Ciarallo, and Ramana V. Grandhi

From Materials Evaluation, Vol. 73, No. 1, pp. 55–61. Copyright © 2015 The American Society for Nondestructive Testing, Inc.



The purpose of this paper is to review recent developments in probability of detection (POD) modeling and demonstrate how simulation studies can be used to benchmark the effect of sample size on confidence bound calculations by means of evaluating probability of coverage.

Review of Recent Advances

The historical development of POD is well documented in previous papers and will not be discussed in this paper (Berens, 1989; Knopp et al., 2012; Olin and Meeker, 1996). In the last decade, the major development was the use of the likelihood ratio technique for confidence bound calculations pertaining to hit/miss data (Annis and Knopp, 2007; Harding and Hugo, 2003; Spencer, 2007). This was incorporated into the guidance for POD studies provided in MIL-HDBK-1823A (DOD, 2010).

The other areas where significant advances have been made include the following:

• Exposition on the distinction between uncertainty and variability in nondestructive testing (NDT) (Li et al., 2012).

• Use of Markov chain Monte Carlo (MCMC) simulation for confidence bounds (Knopp and Zeng, 2013).


• Three- and four-parameter models (Knopp and Zeng, 2013; Moore and Spencer, 1998; Spencer, 2014).

• Bootstrapping for confidence bound calculations (Knopp et al., 2012).

• Nonparametric techniques for POD modeling (Spencer, 2011).

• Box-Cox transformations to mitigate violations of homoscedasticity (Knopp et al., 2012).

• Sample size determination (Annis, 2014; Safizadeh et al., 2004).

• Bayesian design of experiments (Koh and Meeker, 2014).

Uncertainty and Variability

It may be argued that the first item on the list is not an advance; however, it represents a very clear exposition and reminder of what a POD curve and associated confidence bounds actually provide (Li et al., 2012). The authors point out that the traditional way of performing a POD study determines the mean POD, which averages out variability from the inspection process. A common error in interpreting a POD analysis is to think of the confidence bounds as pertaining to variability in the inspection process. The confidence bounds only pertain to the error due to sampling for a single experimental run. Therefore, it is a serious error to assume that 95% of a90 values in future experiments will be within the confidence bounds computed from a single experiment. If one is interested in where 95% of a90 values in future experiments will lie, the appropriate interval is a tolerance interval, discussed elsewhere (Li et al., 2015). The technical details also appear in a tutorial in the context of linear regression (De Gryze et al., 2007). The concept of probability of coverage discussed in this paper looks at this issue from another perspective.

Three- and Four-parameter Models and Markov Chain Monte Carlo

Three- and four-parameter POD models have been proposed to address limitations of the conventional two-parameter models (Moore and Spencer, 1998; Spencer, 2014). Figure 1 shows the conventional two-parameter model. The quantities of interest are a50, a90, and a90/95. The mean 50% POD is a50, the mean 90% POD is a90, and the upper 95% confidence bound on a90 is known as a90/95. The two-parameter model forces the POD curve to zero as the discontinuity size approaches zero, and to one as the discontinuity size approaches infinity. There are many data sets that do not support POD of one for any discontinuity size, and so a modified model that includes lower and upper asymptotes was developed to address this issue (Moore and Spencer, 1998; Spencer, 2014). The conventional model can be modified by adding additional parameters. Consider Equations 1 and 2, which show a four-parameter model for the logit and probit links, respectively.

(1) \( p_i = \alpha + (\beta - \alpha) \times \dfrac{\exp(b_0 + b_1 \log a_i)}{1 + \exp(b_0 + b_1 \log a_i)} \quad \text{(logit)} \)

(2) \( p_i = \alpha + (\beta - \alpha) \times \Phi(b_0 + b_1 \log a_i) \quad \text{(probit)} \)
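For concreteness, a minimal sketch of Equations 1 and 2 follows; the parameter values and size range are illustrative assumptions, not fitted values from the paper.

```python
# Minimal sketch of the four-parameter POD models in Equations 1 and 2.
# All parameter values here are illustrative only.
import numpy as np
from scipy.stats import norm

def pod_logit(a, b0, b1, alpha, beta):
    # Equation 1: POD rises from alpha toward beta with a logit link in log(a).
    eta = b0 + b1 * np.log(a)
    return alpha + (beta - alpha) * np.exp(eta) / (1.0 + np.exp(eta))

def pod_probit(a, b0, b1, alpha, beta):
    # Equation 2: same asymptotes with a probit (normal CDF) link.
    return alpha + (beta - alpha) * norm.cdf(b0 + b1 * np.log(a))

a = np.linspace(0.1, 16.0, 200)  # discontinuity sizes (mm), roughly Figure 1's range
pod = pod_logit(a, b0=-4.0, b1=3.0, alpha=0.02, beta=0.98)
# alpha acts as a false call floor near a = 0; beta caps POD below one for large a.
```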


Figure 1. Conventional two-parameter probability of detection (POD) model.

(The figure plots POD against a (mm), showing the POD(a) curve and its upper confidence bound, with a50, a90, and a90/95 marked.)



The α parameter represents the fact that there is a finite probability of detecting cracks well below the size intended, and can be thought of as a false call rate. A recent work points out that even though POD when no discontinuity is present might (and probably does) equal something other than zero, it is not POD per se, but leads to a function that models an independent false call process on detections (Spencer, 2014). There is also a finite probability that very large cracks will not be detected, and an upper asymptote can also be estimated for this. The β parameter represents this for large discontinuity sizes. In Figure 2a, the POD curve approaches one for very large discontinuities but has an α parameter that needs to be estimated as the discontinuity size approaches zero. Figure 2b shows the β parameter that needs to be estimated to represent detection capability at very large discontinuity sizes. Figure 2c shows a four-parameter model that requires both α and β to be calculated.

Depending on whether α and β are included in the model, there are four candidate models:

• Two-parameter model that does not include α or β.

• Three-parameter model with a lower asymptote, α.

• Three-parameter model with an upper asymptote, β.

• Four-parameter model that includes both α and β.

Historically, the logit and probit links have been found to fit POD data well, so there are a total of eight possible models. The question of which model is appropriate for a given data set can be answered by using the Bayes factor approach described in prior work (Kass and Raftery, 1995; Knopp and Zeng, 2013). Recently, an alternative technique of computing confidence intervals for POD models that include lower and upper asymptotes via MCMC was introduced (Knopp and Zeng, 2013). MCMC techniques are very similar to Monte Carlo techniques with one important distinction: the samples in Monte Carlo are statistically independent, which means an individual sample does not depend on a previous sample. In MCMC, the samples are correlated with each other. MCMC has proven to be an effective way to compute multidimensional integrals that occur in Bayesian calculations. Since MCMC is the computational engine that enables Bayesian analysis, computing confidence bounds with non-informative priors is a graceful first step to introducing Bayesian techniques, which is necessary for model-assisted POD.
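A minimal sketch of the idea follows: a random-walk Metropolis sampler for a two-parameter logit hit/miss model with flat priors, from which a one-sided 95% bound on a90 is read off the posterior draws. The synthetic data, proposal step sizes, and chain settings are illustrative assumptions, not the implementation of Knopp and Zeng (2013).

```python
# Random-walk Metropolis sampling of a two-parameter logit hit/miss POD model
# with flat (non-informative) priors; a90's upper bound comes from the draws.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic hit/miss data (assumption: logit link in log size).
a = rng.uniform(0.5, 8.0, size=60)
true_eta = -3.0 + 2.5 * np.log(a)
y = rng.random(60) < 1.0 / (1.0 + np.exp(-true_eta))

def log_like(b0, b1):
    eta = b0 + b1 * np.log(a)
    p = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (~y) * np.log(1 - p))

# With flat priors the Metropolis acceptance ratio reduces to a likelihood ratio.
draws = []
b0, b1 = -1.0, 1.0
ll = log_like(b0, b1)
for _ in range(20000):
    c0, c1 = b0 + 0.3 * rng.normal(), b1 + 0.2 * rng.normal()
    ll_c = log_like(c0, c1)
    if np.log(rng.random()) < ll_c - ll:
        b0, b1, ll = c0, c1, ll_c
    draws.append((b0, b1))

b = np.array(draws[5000:])                              # discard burn-in
a90 = np.exp((np.log(9.0) - b[:, 0]) / b[:, 1])         # invert the logit at POD = 0.9
a90_95 = np.quantile(a90, 0.95)                         # one-sided 95% bound on a90
```

Note that the correlation between successive draws is exactly the MCMC property described above; the burn-in discard and the posterior quantile are the usual ways of turning the chain into a point summary and a bound.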

Bootstrapping

Current guidance for POD studies assumes that a linear relationship exists between the explanatory variable, such as discontinuity size, which is commonly designated with an "a" in POD literature, and the measurement response "â". Typically, a logarithmic transform will remedy cases where the linear relationship is not established, but this is not always the case. Models with additional complexity beyond a linear model are sometimes necessary for proper analysis of â data. This was the case in the analysis of data from an inspection of subsurface cracks around fastener sites using eddy current (Knopp et al., 2012). The difficulty in these cases is that the procedures for confidence bound calculations are not developed; however, a flexible approach called bootstrapping was demonstrated. Bootstrapping is essentially sampling with replacement; it is a very easy technique to implement for more complicated models and is very useful for model-assisted POD.
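As a minimal illustration of the idea (not the analysis of Knopp et al., 2012), the sketch below bootstraps an a90/95 estimate for a simple linear â versus a model with a fixed detection threshold; the synthetic data, threshold value, and normal-error assumption are all illustrative.

```python
# Percentile bootstrap of a90/95: resample (a, a-hat) pairs with replacement,
# refit, recompute a90, and take the 95th percentile of the refitted values.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
a = rng.uniform(0.5, 5.0, 50)
ahat = 0.05 + 0.08 * a + rng.normal(0, 0.02, 50)   # synthetic a-hat data
threshold = 0.195                                   # assumed detection threshold

def a90_from_fit(a_s, ahat_s):
    b1, b0 = np.polyfit(a_s, ahat_s, 1)             # slope, intercept
    resid_sd = np.std(ahat_s - (b0 + b1 * a_s), ddof=2)
    # Size at which P(a-hat > threshold) = 0.9 under normal errors.
    return (threshold + norm.ppf(0.9) * resid_sd - b0) / b1

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(a), len(a))           # sample rows with replacement
    boot.append(a90_from_fit(a[idx], ahat[idx]))
a90_95 = np.quantile(boot, 0.95)                    # bootstrap percentile bound
```

The appeal noted in the text is visible here: nothing in the loop depends on the model being linear, so a more complicated fit can be swapped into a90_from_fit without new confidence-bound theory.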

Figure 2. Probability of detection (POD) model options: (a) three-parameter with lower bound; (b) three-parameter with upper bound; and (c) four-parameter with lower and upper bound.

(Each panel plots POD from 0 to 1 against a; α marks the lower asymptote in panels (a) and (c), and β the upper asymptote in panels (b) and (c).)


Nonparametric Models

The POD model described in MIL-HDBK-1823A assumes an S-shaped curve described by two parameters (DOD, 2010). The three-parameter and four-parameter models discussed earlier are modified versions of that model. An entirely different idea is to not assume any particular model form, which is referred to in the literature as a nonparametric model. A nonparametric model was proposed for POD with the only assumption being that the POD function is monotonically increasing with respect to discontinuity size (Spencer, 2011). This model is useful for many reasons. First, it can be used as a screening model by comparing the form of the nonparametric model with the selected parametric model. For example, if the nonparametric model closely follows a three-parameter model with an upper asymptote, chances are that the three-parameter model is the best fit. It is generally useful to see what type of model form the data dictates before forcing a parametric model on it.
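Spencer's estimator is specific to his paper, but the monotonicity-only idea can be illustrated with generic isotonic regression; the sketch below, on synthetic hit/miss data, is a stand-in for, not a reproduction of, the method of Spencer (2011).

```python
# Monotone nonparametric POD estimate via isotonic regression on hit/miss data:
# no model form is assumed beyond POD increasing with discontinuity size.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(3)
a = np.sort(rng.uniform(0.5, 8.0, 80))                    # discontinuity sizes
p_true = 1.0 / (1.0 + np.exp(-(-3.0 + 2.5 * np.log(a))))  # hidden true POD
hits = (rng.random(80) < p_true).astype(float)            # 0/1 hit/miss outcomes

iso = IsotonicRegression(y_min=0.0, y_max=1.0, increasing=True)
pod_hat = iso.fit_transform(a, hits)   # stepwise monotone POD estimate at each a
# Overlaying pod_hat on a candidate parametric fit is the screening use
# described above.
```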

Box-Cox Transformation

It is always advantageous to use the measured response data for POD evaluation, since there is more information contained in that form than in hit/miss; however, real inspection data often violate core assumptions required to use a POD model fit. Another development is the use of the Box-Cox transformation to mitigate violations of homoscedasticity for â data analysis. Homoscedasticity means that the scatter in the observations is constant over the discontinuity size range. For cases where there is a relationship between the mean response and the variance, the Box-Cox transformation is used to stabilize the variance. This technique assumes that the relationship between the error variance, σ²ᵢ, and mean response, μᵢ, can be described with a power transformation on â in the form of Equation 3. The new regression model in Equation 4 includes the additional λ parameter, which also needs to be estimated.

(3) \( \hat{a}' = \hat{a}^{\lambda} \)

(4) \( \hat{a}_i^{\lambda} = \beta_0 + \beta_1 a_i + \varepsilon_i \)

The technique described in an outside work was followed exactly (Kutner et al., 2004). A numerical search procedure was set up to estimate λ. The â observations were first standardized so that the order of magnitude of the error sum of squares was not dependent on the value of λ.

The standardized observations were:

(5) \( g_i = \dfrac{1}{\lambda c^{\lambda - 1}} \left( \hat{a}_i^{\lambda} - 1 \right), \quad \lambda \neq 0 \)

(6) \( g_i = c \ln \hat{a}_i, \quad \lambda = 0 \)

where n is the total number of observations and \( c = \left( \prod_{i=1}^{n} \hat{a}_i \right)^{1/n} \), which happens to be the geometric mean of the observations.

Once these standardized observations are obtained, they are then regressed on a, which in this case is crack length, and then the sum of squares error (SSE) is obtained. The optimization problem is formulated such that the objective is to minimize SSE with λ as the single parameter to be adjusted. An example of how the Box-Cox transform is used was presented in the context of an eddy current inspection of cracks around fastener sites in an aircraft structure (Knopp et al., 2012).
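The following sketch illustrates that numerical search on synthetic data, implementing Equations 5 and 6 and minimizing SSE over λ with a bounded scalar search; the data-generation step and the search bounds are assumptions, not from the paper.

```python
# Numerical search for the Box-Cox lambda: standardize via Equations 5 and 6
# so SSE magnitudes are comparable across lambda, regress on a, minimize SSE.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
a = rng.uniform(0.5, 5.0, 60)
ahat = (0.05 + 0.08 * a) * np.exp(rng.normal(0, 0.15, 60))  # variance grows with mean

c = np.exp(np.mean(np.log(ahat)))            # geometric mean of the a-hat values

def sse(lam):
    if abs(lam) < 1e-8:                       # Equation 6: the lambda -> 0 limit
        g = c * np.log(ahat)
    else:                                     # Equation 5
        g = (ahat**lam - 1.0) / (lam * c**(lam - 1.0))
    b1, b0 = np.polyfit(a, g, 1)              # regress standardized obs on a
    return np.sum((g - b0 - b1 * a) ** 2)     # sum of squares error (SSE)

lam_hat = minimize_scalar(sse, bounds=(-2.0, 2.0), method="bounded").x
```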

Sample Size

One of the more common questions asked about POD studies is how many samples will be required. Both the sample size and the distribution of the samples will affect the POD evaluation. In MIL-HDBK-1823A, it is recommended that there be at least 40 samples when the â signal response data is used and 60 samples for hit/miss (DOD, 2010). The question of the range of discontinuity sizes and the distribution of discontinuity sizes is not discussed in MIL-HDBK-1823A, and has not been examined extensively in the literature except in a few cases (Berens and Hovey, 1985; Safizadeh et al., 2004). Recently, this question was investigated for hit/miss data via simulation, looking at the discontinuity size distribution and the effects of moving the center of the sample distribution relative to the true a50 value (Annis, 2014). The recommendations from this study include using a uniform distribution of discontinuity sizes with a range covering 3 to 97% POD. The number of specimens agrees with MIL-HDBK-1823A in that a minimum of 60 specimens should be used.

Bayesian Design of Experiments

A Bayesian approach to planning hit/miss experiments has also been presented (Koh and Meeker, 2014). This allows engineers to use any prior information that may be known about a POD curve to assist in designing the experiment. The conclusion of this work was that optimal test plans developed purely from Bayesian techniques may not be practical, but they


can be used to develop a compromise between the optimal test plan and a uniform distribution of discontinuity sizes. Another conclusion was that the recommendation of 60 observations for hit/miss analysis performs well for estimating a50 and may be slightly sub-optimal for estimating a90, but the uniform distribution recommendation for hit/miss experiments still performs quite well.

Simulation Studies

Going forward, the authors recommend simulation studies to provide the NDT practitioner with a connection between the intuition gained with inspection experience and the statistical techniques used to quantify the capability of an inspection. For example, simulation studies can be used to benchmark the effect of sample size on confidence bound calculations by means of evaluating probability of coverage. Probability of coverage is defined as the probability that the interval contains the quantity of interest. In this work, covering a90 with a 95% upper confidence bound (that is, a90/95) is of particular interest, so the probability that a90/95 is greater than the true value of a90, as defined in Equation 7, is what is meant by probability of coverage in this paper. The objective of simulation studies is to show how often (in terms of percent) a confidence interval contains the true parameter of interest.

(7) \( P\left\{ a_{90/95} > a_{90}^{\text{true}} \right\} \)

A simple case with a model that includes both additive and multiplicative noise is proposed as an approach for how to conduct a simulation study.

Simulation Study with Additive and Multiplicative Noise

This simulation study uses a noise model that includes both additive and multiplicative noise, as represented in Equation 8, where â is the signal response and a is the indication size. The additive noise component is designated by εadd, and the multiplicative component is designated by εmult. This additive and multiplicative noise model resembles realistic inspection data, and is used to generate a synthetic data set useful for simulation purposes only.

(8) \( \hat{a} = \left( \beta_0 + \beta_1 a \right) \left( 1 + \varepsilon_{\text{mult}} \right) + \varepsilon_{\text{add}} \)

The linear model parameters for data used in prior work were used as the basis for this study (Knopp et al., 2013). This model and its associated parameters were used to create a data set with 100 000 observations to resemble a population from which samples were drawn, as shown in Figure 3. In this case, β0 = 0.13546, β1 = 0.043, εmult = 0.316, and εadd = 0.0316. The proportion of observations above the detection threshold of 0.195 was determined in intervals of 1000 observations, and the "true" POD curve is plotted in Figure 4. If the 100 000 simulated observations are designated the "population" for this inspection, then the interval of observations with 90% proportion above 0.195 can be considered the true a90 value. It was determined based on this technique that the "true" a90 is 2.907 mm (0.114 in.). The coverage probability of this value is investigated via simulation.
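A hedged reconstruction of this population step follows. The paper does not state the noise distributions or the size range, so zero-mean normal noise (with the quoted values taken as standard deviations) and a uniform 0–5 mm size range, matching the axes of Figure 3, are assumptions here.

```python
# Generate a 100 000-observation "population" from Equation 8 and read off
# the empirical "true" a90 from bins of 1000 size-ordered observations.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
b0, b1 = 0.13546, 0.043
sd_mult, sd_add = 0.316, 0.0316          # assumed to be noise standard deviations
threshold = 0.195

a = rng.uniform(0.0, 5.0, n)             # indication sizes, mm (assumed range)
ahat = (b0 + b1 * a) * (1 + rng.normal(0, sd_mult, n)) + rng.normal(0, sd_add, n)

order = np.argsort(a)                    # bin the size-ordered data 1000 at a time
bins_a = a[order].reshape(-1, 1000).mean(axis=1)
bins_pod = (ahat[order] > threshold).reshape(-1, 1000).mean(axis=1)
true_a90 = bins_a[np.argmax(bins_pod >= 0.9)]   # first bin reaching 90% POD
```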


Figure 3. Simulated data with linear model form that includes additive and multiplicative noise.

(The figure plots â against a (mm).)

Figure 4. True probability of detection (POD) curve available from 100 000 simulated observations.

(The figure plots the "true" POD against a (mm).)


The process of the simulation is simply to sample from the "population" of observations, compute a90 and a90/95, and repeat. In this case, this process was done for sample sizes of 30 and 100. Since the assumption of constant variance is violated, the sample data sets were converted to hit/miss for analysis and a two-parameter logit model was used. a90 and a90/95 were computed for each simulated data set, and the coverage probability was computed for both of the sample sizes. Based on experience, it is predicted that 100 observations of hit/miss data randomly sampled should provide appropriate coverage probability for POD (Annis, 2014). Appropriate coverage probability is defined as being close to the theoretical confidence value. For a 95% level of confidence, for example, the coverage probability should be approximately 95%. Hence, the percent of simulated intervals that cover the "true" a90 will be calculated and evaluated against 95%. Figure 5 shows box plots of the a90 and a90/95 values for the two-parameter logit model, which show the variation in the a90 and a90/95 values with respect to the true value of a90. As expected, the variance is higher for the estimated a90 and a90/95 values with only 30 observations. Though there does not appear to be much of a difference in the a90 values on average, the variance for the 30-observation case is approximately double that of the 100-observation case. The variance of the a90/95 values for the 30-observation case is approximately four times that of the 100-observation case. Coverage probability for the 30-observation case is approximately 85%, while coverage probability for the 100-observation case is approximately 94%.

Figure 5. Box plots for a90 and a90/95 values for 30 and 100 observations (obs) for the two-parameter hit/miss model.

Hence, the 30-observation case yields overly optimistic results about the inspection capability (that is, a90/95 < true a90) more often. A sketch of the coverage loop follows.
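The sketch below continues from the population code above (reusing a, ahat, threshold, and true_a90). The paper computes a90/95 with the likelihood ratio method; a percentile bootstrap stands in here, and the logit is fit in the size a itself since the paper's link scale is not spelled out, so the resulting coverage numbers will only roughly track the reported 85% and 94%.

```python
# Coverage simulation: sample from the population, convert to hit/miss,
# fit a two-parameter logit, bound a90, and check coverage of the true a90.
import numpy as np
from scipy.optimize import minimize

def fit_logit(a_s, y_s):
    # Maximum likelihood for p = 1 / (1 + exp(-(t0 + t1 * a))).
    def nll(theta):
        eta = theta[0] + theta[1] * a_s
        return np.sum(np.logaddexp(0.0, eta)) - np.sum(y_s * eta)
    return minimize(nll, x0=[-1.0, 1.0], method="Nelder-Mead").x

def a90_of(theta):
    return (np.log(9.0) - theta[0]) / theta[1]   # invert the logit at POD = 0.9

def trial(n_obs, rng):
    idx = rng.integers(0, len(a), n_obs)
    a_s = a[idx]
    y_s = (ahat[idx] > threshold).astype(float)  # convert to hit/miss
    boot = []
    for _ in range(100):                         # bootstrap the fitted a90
        j = rng.integers(0, n_obs, n_obs)
        boot.append(a90_of(fit_logit(a_s[j], y_s[j])))
    return np.quantile(boot, 0.95) > true_a90    # did the upper bound cover?

rng = np.random.default_rng(6)
coverage_30 = np.mean([trial(30, rng) for _ in range(200)])
coverage_100 = np.mean([trial(100, rng) for _ in range(200)])
```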

These are the types of investigations that can be conducted via simulation studies. It is hoped that they will be used more frequently as different techniques of modeling POD are proposed.

Summary and Conclusion

There have been considerable contributions to POD modeling in the last decade. It is hoped that procedures for three-parameter and four-parameter models, and also the bootstrapping approach for confidence intervals, will be codified for more general use. Simulation studies were introduced as a way to quantify the effect of sample size on confidence bound calculations. In the future, such simulation studies can be used to quantify the effect of changes in model form or of different techniques for calculating confidence bounds. Further simulation studies can investigate the effect of the choice of left censoring of the data and of the detection threshold on the coverage of the true a90. Hence, it is recommended that future POD software projects enable simulation studies to facilitate such investigations.

AUTHORS

Jeremy S. Knopp: Air Force Research Laboratory (AFRL/RXCA), Wright-Patterson AFB, Ohio 45433.

Frank Ciarallo: Wright State University, Dayton, Ohio 45435.

Ramana V. Grandhi: Wright State University, Dayton, Ohio 45435.

REFERENCES

Annis, C., and J.S. Knopp, "Comparing the Effectiveness of a90/95 Calculations," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 26, 2007, pp. 1767–1774.

Annis, C., "Influence of Sample Characteristics on Probability of Detection Curves," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 33, 2014, pp. 2039–2046.

Berens, A.P., and P.W. Hovey, "The Sample Size and Flaw Size Effects in NDI Reliability Experiments," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 4, 1985, pp. 1327–1334.

Berens, A.P., "NDE Reliability Data Analysis," ASM Handbook, Vol. 17, 9th ed.: Nondestructive Evaluation and Quality Control, 1989, pp. 689–701.

De Gryze, S., I. Langhans, and M. Vandebroek, "Using the Correct Intervals for Prediction: A Tutorial on Tolerance Intervals for Ordinary Least-squares Regression," Chemometrics and Intelligent Laboratory Systems, Vol. 87, No. 2, 2007, pp. 147–154.

DOD, MIL-HDBK-1823A, Nondestructive Evaluation System Reliability Assessment, Department of Defense Handbook, U.S. Department of Defense, August 2010.

Harding, C.A., and G.R. Hugo, "Experimental Determination of the Probability of Detection Hit/Miss Data for Small Data Sets," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 22, 2003, pp. 1823–1844.



Kass, R.E., and A.E. Raftery, "Bayes Factors," Journal of the American Statistical Association, Vol. 90, No. 430, 1995, pp. 773–795.

Knopp, J.S., and L. Zeng, "Statistical Analysis of Hit/miss Data," Materials Evaluation, Vol. 71, No. 3, 2013, pp. 323–329.

Knopp, J.S., R.V. Grandhi, J.C. Aldrin, and L. Zeng, "Considerations for Statistical Analysis of Nondestructive Evaluation Data: Hit/miss Analysis," E-Journal of Advanced Maintenance, Vol. 4, No. 3, 2012, pp. 105–115.

Knopp, J.S., R.V. Grandhi, J.C. Aldrin, and I. Park, "Statistical Analysis of Eddy Current Data from Fastener Site Inspections," Journal of Nondestructive Evaluation, Vol. 32, 2013, pp. 44–50.

Koh, Y.M., and W.Q. Meeker, "Bayesian Planning of Hit-Miss Inspection Tests," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 33, 2014, pp. 2047–2054.

Kutner, M.H., C.J. Nachtsheim, J. Neter, and W. Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill/Irwin, New York, New York, 2004.

Li, M., F.W. Spencer, and W.Q. Meeker, "Distinguishing Between Uncertainty and Variability in Nondestructive Evaluation," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 31, 2012, pp. 1725–1732.

Li, M., F.W. Spencer, and W.Q. Meeker, "Quantile Probability of Detection: Distinguishing between Uncertainty and Variability in Nondestructive Testing," Materials Evaluation, Vol. 73, No. 1, 2015, pp. 89–95.

Moore, D.G., and F.W. Spencer, "Detection Reliability Study for Interlayer Cracks," Proceedings of the 1998 SAE Airframe/Engine Maintenance & Repair Conference, Society of Automotive Engineers, Inc., 1998.

Olin, B.D., and W.Q. Meeker, "Applications of Statistical Methods to Nondestructive Evaluation," Technometrics, Vol. 38, 1996, pp. 95–112.

Safizadeh, M.S., D.S. Forsyth, and A. Fahr, "The Effect of Flaw Size Distribution on the Estimation of POD," Insight, Vol. 46, No. 6, June 2004, pp. 1–5.

Spencer, F.W., "The Calculation and Use of Confidence Bounds in POD Models," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 26, 2007, pp. 1791–1798.

Spencer, F.W., "Nonparametric POD Estimation for Hit/miss Data: A Goodness of Fit Comparison for Parametric Models," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 30, 2011, pp. 1557–1564.

Spencer, F.W., "Curve Fitting for Probability of Detection Data: A 4-parameter Generalization," Review of Progress in Quantitative Nondestructive Evaluation, Vol. 33, 2014, pp. 2055–2062.