(nasa_p.reese_j.harben)_implementing_strategies_for_risk_mitigation_(ncsli-2011)



2011 NCSL International Workshop and Symposium

Implementing Strategies for Risk Mitigation

In the Modern Calibration Laboratory

Speaker: Jonathan Harben

The Bionetics Corporation

NASA – Kennedy Space Center

1046 South Patrick Drive; M/S: ISC-6175

Patrick AFB, FL 32925

Phone: (321) 494-7907 Fax: (321) 494-5253 E-mail: [email protected]

Authors: Jonathan Harben, Paul Reese

Abstract

Many strategies for risk mitigation have been employed in calibration laboratories. A modern

look at these concepts is presented in terms of compliance to ANSI/NCSL and ISO standards.

Specifically, the practical application of various techniques to manage false accept risk in a high

production calibration lab is reviewed. Understanding the factors affecting the probability of

incorrect Pass/Fail decisions reveals aspects which can be exploited and leveraged, reducing the

effort required for compliance. Test Uncertainty Ratio (TUR) and End Of Period Reliability

(EOPR) are directly related to risk, but one or both of these parameters may be impractical or

impossible to quantify and manage in daily practice for every measurement process. TUR and

EOPR are explained in terms of Z540.3 and these parameters are combined to form basic

strategies for compliance. These concepts are investigated as well as their mathematical

boundary conditions to gain a comprehensive understanding of practical, efficient risk

mitigation. Bench level techniques alone may be impracticable and/or insufficient to assess the

quality of a calibration program en masse, as it relates to measurement decision risk. Program

level techniques can estimate or limit the false accept risk for future decisions, prior to obtaining

any specific measurement result. Working practices and principles are presented that allow a

modern calibration laboratory to meet the demand of customers and manage risk for

multifunction instrumentation while maintaining compliance to national and international

standards.

1. Background

In the simplest terms, Measurement & Test Equipment (M&TE) owners submit an instrument to

the calibration laboratory and want to know, “Is my equipment good or bad?” During a

compliance test, M&TE is evaluated using laboratory standards to determine if it is performing

as expected. This performance is compared against some pre-established specifications or

tolerance limits requested by the end-user or customer. Often, these specifications are the

manufacturer's published accuracy¹ [1] specifications. As customers, they are asking for an

In-tolerance or Out-of-tolerance decision to be made. On the surface, this might appear to be a

relatively straightforward request. But exactly what level of assurance is the customer receiving

when statements of compliance are issued? How good are those statements of compliance? Is

simply reporting measurement uncertainty enough? What is the risk that a statement of

compliance is wrong? While alluded to in many international standards documents, this concept

is directly addressed in ANSI/NCSL Z540.3-2006 [2].

1 The term accuracy is used throughout this paper to facilitate the classical concept of “uncertainty” for a broad

audience. It is acknowledged that the VIM [1] defines accuracy as a qualitative term, not quantitative, and that

numerical values should not be associated with it.

Since its publication, Z540.3 sub-clause 5.3b has, understandably, received a disproportionate

amount of attention compared with other sections in the standard [3, 4, 5]. This section

represents a significant change compared to its predecessor, Z540-1 [6]. Section 5.3b has come

to be known by many simply as “The 2 % Rule” and addresses calibrations involving

compliance tests, also called conformance tests, tolerance tests, or verification tests. It states:

“Where calibrations provide for verification that measurement quantities are within specified

tolerances, the probability that incorrect acceptance decisions (false accept) will result from

calibration tests shall not exceed 2% and shall be documented. Where it is not practicable to

estimate this probability, the test uncertainty ratio shall be equal to or greater than 4:1”.

Many different inferences may be derived from these two seemingly innocuous statements

above. Familiarity with ISO-17025 [7] shows that, by comparison with Z540.3, subject matter

relating to compliance testing is relatively sparse as that standard is primarily focused on

reporting uncertainties with measurement results, similar to Z540.3 section 5.3a. Perhaps the

most significant reference to compliance testing in ISO-17025 is found in section 5.10.4.2

Calibration Certificates which requires that, “When statements of compliance are made, the

uncertainty of measurement shall be taken into account.” However, practically no guidance on

possible methods which could be implemented to take the measurement uncertainty into account

is provided. The American Association for Laboratory Accreditation (A2LA) further clarifies the

requirements associated with this concept in R205 [8]:

“when parameters are certified to be within specified tolerance, the associated uncertainty of the

measurement result is properly taken into account with respect to the tolerance by a documented

procedure or policy established and implemented by the laboratory that defines the decision

rules used by the laboratory for declaring in or out of tolerance conditions²”.

Simply reporting the uncertainty along with a measurement result may not sufficiently satisfy

customer requirements where compliance tests are desired. Without a quantifiable control limit

such as false accept risk, this type of reporting may impart unknown risks to the customer.

Z540.3 addresses this concept directly.

2. Taking the Uncertainty Into Account

What does it mean to take the uncertainty into account and why is it necessary? For an intuitive

interpretation, refer to Figure 1. During a bench level compliance test, what are the decision rules

if uncertainty is taken into account? How might an item be declared to be In-tolerance when, in

reality, it is actually Out-of-tolerance? During the calibration, the UUT (Unit Under Test) might

legitimately be observed to be In-tolerance. However, the observation could be misleading; in

fact, it could be wrong.

2 The default decision rule is found in ILAC-G8:1996 [9], Guidelines on Assessment and Reporting of Compliance

with Specification, section 2.5. With agreement from the customer, other decision rules may be used as provided for

in this section of the Requirements.

Figure 1. Five Possible Bench Level Calibration Scenarios

In scenario #1, a reading on a laboratory standard voltmeter of 9.98 V can confidently lead to an

In-tolerance decision (Pass) for this 10 V UUT source with negligible risk. This is true due to

the sufficiently small uncertainty in the measurement process and the distance of the measured

value from the tolerance limits. Likewise, a non-compliance decision (Fail) resulting from scenario

#5 can also be made with high confidence, as the measured value of 9.83 V is clearly Out-of-

tolerance with little ambiguity. However, in scenarios #2, #3, and #4, this is not the case; a

Pass/Fail decision involves significant risk of being incorrect.

It is understood that all measurements are only estimates of the true value of the measurand; this

true value is clouded somewhat by the uncertainty of the measurement process. In scenarios #2,

#3, and #4, this uncertainty provides the possibility for the true value of the measurand to lie either

In or Out of the UUT tolerance limits. The truth is not precisely known. Consider scenario #3,

where the UUT was observed at 9.90 V, exactly at the lower allowable tolerance limit. Under

such conditions, there is a 50 % probability that either an In-tolerance or Out-of-tolerance

decision will be incorrect, barring any other information³. In fact, even with the best available

standards with the lowest possible uncertainty, the probability of being incorrect will remain at

50 % in scenario #3 regardless of how small the uncertainty is. It is easy to see that a Pass/Fail

compliance decision carries some risk of being incorrect. This concept of bench level risk is

addressed in several documents [9, 10, 11, 12].
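
The bench level reasoning above can be sketched numerically. This is a minimal illustration, not a method prescribed by any of the cited documents. It assumes a normally distributed measurement error, that the ±0.05 V expanded uncertainty in Figure 1 corresponds to a coverage factor of k = 2, and that the tolerance is symmetric at ±0.10 V about 10.00 V (only the 9.90 V lower limit is stated in the text). No prior (EOPR) information is used, and the function name is hypothetical.

```python
from statistics import NormalDist

def prob_true_value_in_tolerance(measured, lower, upper, expanded_u, k=2.0):
    """Probability that the true value lies within [lower, upper], given one
    measured value and a normally distributed measurement error (no prior)."""
    belief = NormalDist(mu=measured, sigma=expanded_u / k)
    return belief.cdf(upper) - belief.cdf(lower)

# Assumed values: tolerance 10.00 V +/- 0.10 V, expanded uncertainty +/- 0.05 V (k = 2).
LOWER, UPPER, U95 = 9.90, 10.10, 0.05

# Readings for #1, #3 and #5 are quoted in the text; #2 and #4 are assumed
# from the data labels of Figure 1.
for scenario, reading in enumerate([9.98, 9.92, 9.90, 9.87, 9.83], start=1):
    p_in = prob_true_value_in_tolerance(reading, LOWER, UPPER, U95)
    print(f"#{scenario}: measured {reading:.2f} V, P(in tolerance) = {p_in:.3f}")
```

Under these assumptions, scenario #3 evaluates to 0.5 whatever value is passed for `expanded_u`, which mirrors the point made above: a reading exactly on the limit carries a 50 % chance of a wrong decision regardless of how small the uncertainty is.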

3 Bayesian analysis can result in false accept risk other than 50 % in such instances, where the a priori in-tolerance

probability (EOPR) of the UUT is known in addition to the measurement result and uncertainty. Such analysis is

beyond the scope of this paper.

[Figure 1 chart: “Some Possible Measurement Results of a 10 V Source (UUT)”. Vertical axis from 9.75 V to 10.20 V; scenarios #1 to #5 with data labels 9.98 V, 9.92 V, 9.90 V, 9.87 V, and 9.83 V; Upper Tolerance Limit and Lower Tolerance Limit marked; UUT Set Voltage = 10.00 V; Expanded Uncertainty of the Measurement Process: ±0.05 V.]


It may be surprising to find that simple analysis of the individual measurement results presented

in the aforementioned scenarios is not directly consistent with the intent of “The 2 % rule” in

Z540.3, although it still has some application. To this point, the discussion has dealt exclusively

with bench level analysis of measurement decision risk. That is, risk was predicated only on

knowledge of the relationship between the UUT tolerance, the uncertainty of the measurement

process, and the observed measurement result made “on-the-bench”; after the measurement

result had been obtained. But computation of false accept risk (FAR, also known as the

Probability of False Accept, PFA, or consumer risk), for strict compliance with the 2 % rule in

Z540.3, does not depend on any particular measurement result nor its proximity to a given UUT

tolerance limit. The 2 % rule in Z540.3 addresses the risk prior to any particular single

measurement result being obtained, at the program level. To foster an understanding of both

bench level and program level false accept risk among interested parties from a multitude of

backgrounds, this paper provides a more conceptual perspective of the intent underlying the 2 %

rule and its relationship to TUR and EOPR⁴.

3. The Answer to Two Different Questions

False accept risk describes the overall probability of false acceptance when Pass/Fail decisions

are made. False accept risk can be interpreted and analyzed at two different levels: the bench

level and the program level [4]. Both risk levels are described in ASME Technical Report

B89.7.4.1-2005 [13]. This report refers to bench level risk mitigation as “controlling the quality

of individual workpieces”, while program level risk strategies are described as “controlling the

average quality of workpieces”. These two approaches result in two answers to two different

questions, but they are related. In essence, it is not necessarily a matter of right and wrong, but

rather an appropriate answer to an appropriate question to meet a desired quality objective. It is

ambiguity in the question itself that may lead to different assumptions regarding the meaning of

false accept risk. Many international documents speak only to the bench level interpretation of

risk, requiring an actual measurement result to be available [9, 10, 11, 12]. However, Z540.3

was intended to address risk at the program level [14].

When Z540.3 requires that, “…the probability that incorrect acceptance decisions (false accept)

will result from calibration tests shall not exceed 2%...”, it may not be immediately evident which

viewpoint is being addressed, the bench level or the program level. However, the strategies for

implementing such a requirement are quite different depending upon the level at which the

requirement applies. The implications of this were significant enough to prompt NASA to

request interpretive guidance from the NCSLI 174 Standards Writing Committee [15]. It was

affirmed that the 2 % false accept requirement applies to a “population of ‘like calibration

sessions’ or ‘like measurement processes’” [14]. As such, Z540.3 section 5.3b does not directly

address the probability of false accept for any single, discrete measurement result or individual

workpiece and supports the program level view of risk prior to, and independent of, any actual

measurement result.

4 The subject of measurement decision risk includes not only the probability of false-accept (PFA), but the

Probability of Correct Accept (PCA), Probability of False Reject (PFR) and the Probability of Correct Reject (PCR).

While false-rejects can have significant economic impact to the calibration lab, the discussion in this paper is

primarily limited to the direct requirements of Z540.3 (5.3b), i.e. false accept risk.


What exactly does this mean and how might risk be computed prior to any particular single

measurement result being obtained? In statistical terms, the 2 % rule refers to the unconditional

probability of false acceptance. Consider the outcomes of a collection or population of single

measurement results obtained over time. Each individual result does, by some small amount,

comprise the statistics which describe a population of “like calibration sessions or like

measurement processes.” Each measurement result (a Pass or Fail) contributes to the End Of

Period Reliability (EOPR) for a particular test-point, e.g. the 10 V, 1 kHz, AC voltage test-point

on a population of Acme model 123 digital multimeters. This EOPR data can help predict the

probability of making a false accept decision for future tests. Reliability data at the M&TE

model and manufacturer level can be used to conservatively estimate the reliability of the M&TE

test point. This is addressed in compliance Method 2 of the Z540.3 Handbook [16].

In terms of Z540.3, false accept risk describes the overall or average probability of false

acceptance decisions to the calibration program at large, for all of the Pass/Fail decisions that are

made at a particular test point. It does not represent risk associated with any particular individual

measurement result of any unique instrument. The 2 % rule speaks to the following question:

Given a historical collection of Pass/Fail decisions at a particular test-point for a population of

like-instruments (i.e. EOPR is known), what is the probability that an incorrect acceptance

decision will result during an upcoming test? Notice that no information is provided on any

particular measurement result. The question is being asked before the scheduled measurement is

ever made. This can be answered as long as previous EOPR data on the UUT population is

available and the measurement uncertainty (and thus TUR) is known. In certain circumstances,

it is also possible to comply with the 2 % rule by bounding or limiting false accept risk by using

either:

EOPR data without knowledge of the measurement uncertainty.

TUR without knowledge of EOPR data.

To understand how this is possible, a closer look at the relationship between false accept risk,

EOPR, and TUR is helpful.
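
As a rough sketch of how a program level (unconditional) false accept risk can be estimated before any measurement is made, the following integrates over all possible true UUT errors and measurement errors. It is an illustration under stated assumptions rather than the method of Z540.3 or its Handbook: UUT errors and measurement errors are taken as independent, zero-mean, and normally distributed; the tolerance is symmetric and two-sided; acceptance limits equal the tolerance limits (no guardband); the 95 % expanded uncertainty uses k = 2; and EOPR is treated as the true in-tolerance probability. The function name and the simple trapezoidal integration are choices made here for illustration.

```python
from statistics import NormalDist

def false_accept_risk(eopr, tur, steps=20000):
    """Estimate the unconditional probability of false accept (PFA).

    Hypothetical model: zero-mean normal UUT error and measurement error,
    symmetric tolerance of +/-1 (normalized), acceptance limits equal to the
    tolerance limits, and a 95 % expanded uncertainty with k = 2.
    """
    std_normal = NormalDist()
    tol = 1.0
    sigma_uut = tol / std_normal.inv_cdf((1.0 + eopr) / 2.0)  # implied by EOPR
    sigma_meas = tol / (2.0 * tur)                            # from TUR = tol / U95
    uut = NormalDist(0.0, sigma_uut)
    meas = NormalDist(0.0, sigma_meas)

    def integrand(true_error):
        # P(observed value inside tolerance | true error) * density of true error
        p_accept = meas.cdf(tol - true_error) - meas.cdf(-tol - true_error)
        return uut.pdf(true_error) * p_accept

    # Trapezoidal integration over out-of-tolerance true errors above +tol;
    # the region below -tol contributes equally by symmetry.
    upper = tol + 10.0 * max(sigma_uut, sigma_meas)
    h = (upper - tol) / steps
    total = 0.5 * (integrand(tol) + integrand(upper))
    for i in range(1, steps):
        total += integrand(tol + i * h)
    return 2.0 * total * h

for eopr in (0.95, 0.85):
    for tur in (4.0, 2.0):
        pfa = false_accept_risk(eopr, tur)
        print(f"EOPR {eopr:.0%}, TUR {tur:.0f}:1 -> PFA ~ {pfa:.2%}")
```

Note that no measured value appears anywhere in the calculation; only EOPR and TUR are inputs. Within this model, lowering either input from the 95 % and 4:1 case raises the estimated risk.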

4. End of Period Reliability (EOPR)

EOPR is the probability of a UUT test-point being In-tolerance at the end of its normal

calibration interval. It is sometimes known as In-tolerance probability and is derived from

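previous calibration events, based directly on measured values. In its simplest form, EOPR can

be defined as

$$\mathrm{EOPR} = \frac{\text{number of In-tolerance results (as-received)}}{\text{total number of calibrations}}$$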

If it is known that a significant number of previous measurements for a population of UUTs were

very close to their tolerance limits “as-received”, would this affect the false accept risk for an

upcoming measurement? Yes; there would indeed be a higher false acceptance risk for the

planned (upcoming) measurement if it was known ahead of time that the measurement about to

be made was likely to lie near one of the UUTs' tolerance limits. Consider Figure 2: Two

different model UUT voltage sources are scheduled for calibration. One is a model-A and the


other is a model-B. The five previous calibrations on model-A's have shown these units to be

highly reliable; see Group-A. Most often, they are found to be well within their tolerance limits

and easily comply with their specifications. Model-A's are “gems” in the parlance of the cal lab.

In contrast, previous calibrations of model-B's have shown them to be “dogs”. Model-B's hardly

ever meet their specifications; see Group-B. Of the last five calibrations, two model-B's were

recorded as being Out-of-tolerance and one of them was “barely-in”. Making an In or Out of

tolerance decision will be a precarious judgment-call and there is a higher probability of making

a false accept decision compared to the model-A.
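
The simple ratio definition of EOPR can be applied to the two groups just described. The sketch below uses hypothetical pass/fail histories: the text states that two of the last five model-B results were Out-of-tolerance, while the assumption that all five model-A results were In-tolerance is made here only for illustration. The function name is likewise hypothetical.

```python
def observed_eopr(as_received_in_tolerance):
    """Fraction of as-received calibration results that were In-tolerance."""
    results = list(as_received_in_tolerance)
    if not results:
        raise ValueError("no calibration history available")
    return sum(results) / len(results)

# Hypothetical histories matching the narrative: True = In-tolerance as-received.
group_a = [True, True, True, True, True]    # assumed: all five model-A's passed
group_b = [True, False, True, False, True]  # two of five model-B's Out-of-tolerance

print(observed_eopr(group_a))  # 1.0
print(observed_eopr(group_b))  # 0.6
```

With only five samples per group these figures are coarse estimates; the “barely-in” model-B result still counts as In-tolerance in this ratio.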

Figure 2. Previous Historical Measurement Data can Influence Future False Accept Risk.

In Figure 3, imagine the blue dot representing the measurement result is not shown on the chart

yet. If it was known ahead of time that this dot (the upcoming measurement result) was likely to

be near the tolerance limit when the measurement finally does occur, a false accept would indeed

be more likely. An elevated false accept risk would also be present if it was known ahead of

time that the measurement was likely to lie slightly outside the UUTs' tolerance limits.

Again, if there was a high probability that, based on previous measurement data, the

measurement result was likely to be in close proximity to the upper or lower tolerance limit, it

can easily be seen that the upcoming measurement would carry a larger false accept risk than if

the measurement was likely to lie near the nominal value (10 V). The critically important point

is this: If the historical reliability data indicates that In-tolerance probability (EOPR) of the UUT

is poor (up to a point⁵), the false accept risk is elevated.

5 Graphs of EOPR vs. false-accept risk can reveal a perceived decrease in false-accept risk as the EOPR drops below

certain levels. This is due to the large number of out-of-tolerance conditions that lie far outside (away from) the

UUT tolerance limits. This is discussed later in this paper.

[Figure 2 chart: “Previous Measurement Results for Two Groups of Instruments”. Vertical axis from 9.75 V to 10.25 V; results #1 to #5 plotted for Group-A and Group-B; Upper Tolerance Limit and Lower Tolerance Limit marked; UUT Set Voltage = 10.00 V; Expanded Uncertainty of the Measurement Process: ±0.05 V.]


Figure 3. The Possibility of a False Accept for a Measurement Result

The previous scenarios assumed some familiarity with populations of similar instruments which

are periodically resubmitted for calibration. But how can EOPR be reconciled when viewed

from a “new” calibration laboratory's perspective? What if no historical reliability or EOPR data

exists for UUTs? Can a new laboratory even open its doors for business on day-one and still

meet the 2 % false accept requirement of Z540.3 without EOPR data? The answer to this

question is yes. However, the new lab must employ bench level techniques or other appropriate

techniques (boundary condition methods, guardbanding, etc.). Such methods are described later

in this paper. This same logic would apply to a well-established calibration laboratory, in

business for years, which receives a new, totally unique instrument for calibration for the first time.

If no similar item exists, no historical data can be analyzed to determine EOPR and compute

false accept risk; therefore other appropriate techniques and/or bench level methods must be

employed.

If EOPR data or In-tolerance probability is important for compliance with the 2 % rule, how

good does the estimate of EOPR data have to be before program level methods can be used to

address false accept risk for a population of instruments? Undoubtedly, this will be the subject

of many discussions and papers to follow. Sharing or exchanging EOPR data between different

laboratories has even been proposed with varying opinions. Acceptance of this is generally

dependent upon the consistency of the calibration procedure used and the laboratory standards

employed. The rules used to establish EOPR data can be somewhat subjective (e.g. how many

samples are available, are first-time calibrations counted, are broken instruments included, are

late calibrations included, etc.). Other considerations include the validity of aggregating or

pooling like-instruments together by groupings and various classifications (e.g. model number).

[Figure 3 chart: “UUT Voltage Source: Calibration Scenario”. Vertical axis from 9.85 V to 10.05 V; Measured (Observed) Voltage = 9.92 V shown with the Expanded Uncertainty of the Measurement Process; Lower Tolerance Limit marked; UUT Set Voltage = 10.00 V. Annotation: A False Accept Occurs if the True Value of the UUT is outside its tolerance limit.]


5. Test Uncertainty Ratio

It has been illustrated that EOPR can greatly affect the false accept risk of calibration processes.

However, the concept of Test Uncertainty Ratio (TUR) is likely to be more familiar to the

broader audience as a metric of “quality” for calibrations. From the preceding examples, it is

evident that a lower uncertainty generally reduces the likelihood of a false accept decision.

Historically, TUR was thought of by many as a comparison of the quality (accuracy

specification) of a UUT to the quality (uncertainty) of a laboratory's reference standards and

processes. If a low TUR was computed, it often meant that laboratory standards were not good

enough to confidently make acceptance decisions during compliance tests, or that the UUT was

too good (accuracy spec was too tight). TUR has been viewed as the uncertainty or tolerance of

the UUT in the numerator divided by the uncertainties of the laboratory's measurement

standard(s) in the denominator [17]. A high ratio such as >4 to 1 was thought to be a quality

metric indicative of a robust calibration process.

TUR originated in the Navy's Production Quality Division during the 1950s in

an attempt to limit incorrect acceptance decisions. The origins of the ubiquitous 4 to 1 TUR

have been summarized by Dr. Howard Castrup of Integrated Sciences Group [18]. Early work in

measurement decision risk by the Navy's Jerry Hayes and Stan Crandon concluded:

“‘…if the measuring device had three times the accuracy of the UUT, the consumer's risk was

around 1.06 %. This assumes that the only error in the measurement process is the bias in the

measuring device. It also assumes a 95% in-tolerance probability for both the measuring device

and the UUT, i.e., that the specs for both devices are roughly ±2-sigma specs’… They decided to

pad the 3:1 ratio a bit to accommodate additional measurement process errors and to cover

cases where the in-tolerance probabilities would be less than 95%. They felt that a 4:1 ratio

would be sufficient”.

In those early days, the above assumptions were necessary to ease the burden of rigorous

computational requirements of measurement risk analysis. Since that time, manufacturers'

accuracy specifications have often been loosely inferred to represent ±2 sigma or 95 %

confidence for many implementations of TUR, unless otherwise stated. This is synonymous

with assuming that all UUTs will meet their published specifications 95 % of the time (i.e.

EOPR will be 95 %). Even if the calibration personnel did not realize they were making this

assumption when computing TUR, they were directly relying on such an assumption to gain any

utility out of the 4 to 1 TUR. For a simple TUR calculation to provide meaningful risk

mitigation, the assumptions stated above are critically important to the success of this simple

ratio tool. As previously demonstrated, the In-tolerance probability or EOPR does have a direct

effect on the false accept risk for populations of tests. Is the EOPR for all M&TE really 95 %?

That is, are all “specs” provided by manufacturers really representative of two standard

deviations of the true product distribution? If not, the time-honored 4 to 1 TUR will not provide

the expected level of protection for the consumer and false accept risk could be much higher than

expected. Calibration managers and quality personnel are well aware that not all M&TE has

an EOPR of 95 %. It is often much lower.
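
The link between a “±2 sigma” specification and a 95 % EOPR can be made concrete with a short sketch. It assumes UUT errors are normally distributed and centered on nominal, which is an idealization introduced here; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def eopr_from_sigma_multiple(k):
    """In-tolerance probability when the tolerance equals +/- k standard deviations."""
    return 2.0 * STD_NORMAL.cdf(k) - 1.0

def sigma_multiple_from_eopr(eopr):
    """Number of standard deviations spanned by the tolerance for a given EOPR."""
    return STD_NORMAL.inv_cdf((1.0 + eopr) / 2.0)

for eopr in (0.95, 0.85, 0.70):
    k = sigma_multiple_from_eopr(eopr)
    print(f"EOPR {eopr:.0%}: tolerance is about +/-{k:.2f} sigma")
```

Under this assumption, an observed EOPR noticeably below 95 % means the published tolerance covers fewer than two standard deviations of the actual product distribution, which is exactly the case in which the 4 to 1 rule loses its intended protection.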

Neither the 4 to 1 TUR rule nor the 2 % rule requires information regarding individual

measurement results at the bench level or how close these results might be to a particular


tolerance limit. The measurement result of any particular test is not used for the computation of

the false accept risk intended by Z540.3, nor is a measurement result necessary for maintaining

TUR above 4 to 1. Both are program level risk mitigation efforts.

While the spirit of Z540.3 is to move away from the reliance on TUR altogether, its use is still

permitted if adherence to the 2 % rule is deemed “impracticable”. Use of the TUR is

discouraged due to the many assumptions it relies on for controlling risk. However, given that

the computation of false accept risk requires the somewhat laborious collection of EOPR data,

the use of TUR might be perceived as an easy way for labs to circumvent the 2 % rule. The

catch is the definition, or more pointedly the re-definition, of TUR. Section 3.11 in Z540.3

defines Test Uncertainty Ratio as:

“The ratio of the span of the tolerance of a measurement quantity subject to calibration, to twice

the 95% expanded uncertainty of the measurement process used for calibration”.

At first, this may not appear to be significantly different than other definitions currently in place

in many laboratories. One might notice the numerator, associated with the UUT, specifies the

span of the tolerance. If the tolerance specification of the UUT is two-sided and documented as

a plus-or-minus (±) tolerance, the entire span of the tolerance must be included. However, this is

countered by the requirement to multiply the 95 % expanded uncertainty of the measurement

process in the denominator by a factor of 2. The confidence level associated with the UUT

tolerance specifications in the numerator is undefined. This quandary is really nothing new.

Assumptions of the level of confidence associated with the UUT (numerator) have been made for

decades.
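
As a worked illustration of the Z540.3 definition quoted above, the ratio can be written compactly. The symbols $T_U$, $T_L$ (upper and lower tolerance limits), $T$ (half-width of a symmetric tolerance), and $U_{95}$ (95 % expanded uncertainty of the measurement process) are introduced here for convenience and are not the notation of the standard.

```latex
\mathrm{TUR} = \frac{T_U - T_L}{2\,U_{95}}
\qquad\text{and, for a symmetric tolerance } \pm T:\qquad
\mathrm{TUR} = \frac{2T}{2\,U_{95}} = \frac{T}{U_{95}}
```

For example, assuming a 10 V source with a ±0.10 V tolerance and an expanded uncertainty of ±0.05 V, the span is 0.20 V and twice the expanded uncertainty is 0.10 V, giving a TUR of 2 to 1. The factor of 2 in the numerator (full span) and the factor of 2 in the denominator cancel for symmetric tolerances.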

However, there is a very distinct difference between the TUR as defined in Z540.3 and some

widely accepted historical definitions. This difference centers on the components of the

denominator of the TUR. In Z540.3, the uncertainty in the denominator is very specifically

defined as the "uncertainty of the measurement process used in calibration." This definition has

broader implications than historical definitions by including elements of the UUT performance

(e.g., resolution and process repeatability) in the denominator. These components of uncertainty

must be included in the 95 % expanded uncertainty of the measurement process, as used in the

TUR denominator. Many labs have long assumed that the uncertainty of the measurement

process, as it relates to the denominator of TUR, should encompass all aspects of the laboratory

standards, environmental effects, measurement processes, etc., but not aspects of the UUT.

Essentially, the TUR denominator was to reflect the capability of the laboratory to make highly

accurate measurements. The concept of “capability” is important here, as the capability of a

laboratory to make accurate measurements was sometimes viewed in the abstract sense, isolated,

removed, and independent of any aspects of the UUT. However, this is not the case with the

definition of expanded measurement uncertainty in the denominator of the Z540.3 TUR. All

influences, which affect the result of a measurement, should be included in the expanded

uncertainty. This includes the resolution, noise, and other possible contributions of the UUT

which can affect the laboratory's ability to accurately perform a measurement on a particular

device. This was reiterated to NASA in another response from the NCSLI 174 Standards Writing

Committee [19].
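The TUR definition discussed above (tolerance span divided by twice the 95 % expanded uncertainty, with UUT contributions included in the denominator) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the coverage factor k = 2, the root-sum-square combination, and the rectangular treatment of resolution follow common GUM practice and are not taken from this paper; the 10 V example values are hypothetical.

```python
import math


def expanded_uncertainty(std_uncertainties, k=2.0):
    """Combine standard uncertainties by root-sum-square and expand by k (assumed k = 2)."""
    return k * math.sqrt(sum(u * u for u in std_uncertainties))


def resolution_uncertainty(resolution):
    """Standard uncertainty of a display resolution step (assumed rectangular distribution)."""
    return resolution / (2.0 * math.sqrt(3.0))


def tur_ratio(lower_limit, upper_limit, u95):
    """TUR as described in the text: span of the tolerance over twice the 95 % expanded uncertainty."""
    if u95 <= 0:
        raise ValueError("expanded uncertainty must be positive")
    return (upper_limit - lower_limit) / (2.0 * u95)


if __name__ == "__main__":
    # Hypothetical 10 V source with a +/-0.1 V tolerance.
    u_lab_only = expanded_uncertainty([0.005])
    u_with_uut = expanded_uncertainty([0.005, resolution_uncertainty(0.01), 0.004])
    print("TUR, laboratory standards only:", tur_ratio(9.9, 10.1, u_lab_only))
    print("TUR, UUT resolution and repeatability included:", tur_ratio(9.9, 10.1, u_with_uut))
```

Including the UUT resolution and repeatability terms can only increase the denominator, so the resulting TUR is never higher than the "laboratory capability" ratio computed from the standards alone.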


The “new” definition of TUR is meant to serve as a single simplistic metric or barometer to

evaluate the plausibility of a proposed compliance test with regard to mitigating false accept risk.

No distinction is made as to where the risk originates - the UUT or the laboratory standard(s).

A low TUR does not necessarily imply that the laboratory standards are not “good enough”. It

may indicate, however, that the measurement cannot be made without significant false accept

risk due to limitations of the UUT itself (noise, resolution, etc.). Such might be the case where

the accuracy specification of a device is equal to its resolution or noise floor. This can prevent a

reliable Pass/Fail decision from being made.

When computing TUR, where specifications have been published at confidence levels other than

95 %, laboratories have sometimes attempted to convert these specifications into ±2 sigma specs

before dividing by the expanded uncertainty (2σ) of the measurement process. Or, equivalently,

UUT specs were converted to ±1 sigma specs for division by the standard uncertainty (1σ) of

the measurement process. Either way, this was believed (and still is by some) to provide a more

useful “apples-to-apples” ratio for the TUR. Efforts to develop an equivalent or normalized Test

Uncertainty Ratio have been documented by several authors [18, 20, 21, 22]. In many cases, the

EOPR (confidence level or In-tolerance probability) for a UUT is the single most influential

factor affecting the probability of false accept for a population of instruments. Simply making an

assumption (i.e., 95 % confidence level) regarding this important reliability metric devalues the

usefulness of TUR calculations. In these situations, the validity or utility of risk mitigation is

greatly diminished. Its integrity then depends primarily on the level of effort, honesty, and

completeness demonstrated by the manufacturer that assigned the accuracy specifications to the

equipment. Were the accuracy specifications arrived at objectively by proper statistical analysis

of a population of instruments? Is the calibration interval for these specifications provided? Is

the calibration period reasonable? Are the tolerance limits conservative and highly reliable; or

were they arrived at by a marketing department or advertising group driven by other

motivations? Unless the TUR numerator and denominator represent similar confidence levels,

this simplistic ratio alone is of little value.
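The conversion of a published specification to a ±2 sigma equivalent, as some laboratories have attempted, can be sketched as follows. The sketch assumes the specification describes a zero-mean normal distribution and that its stated confidence level is trustworthy, which is exactly the assumption questioned above; the function name is hypothetical.

```python
from statistics import NormalDist


def two_sigma_equivalent(spec_limit, confidence):
    """Rescale a +/- spec quoted at `confidence` (0 < confidence < 1) to a +/-2 sigma spec.

    Assumes a zero-mean normal distribution behind the specification.
    """
    # Coverage factor implied by the stated two-sided confidence level.
    k = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    sigma = spec_limit / k
    return 2.0 * sigma


if __name__ == "__main__":
    # A spec quoted at 99 % confidence shrinks when restated at 2 sigma;
    # one quoted at 90 % confidence grows.
    print(two_sigma_equivalent(1.0, 0.99))
    print(two_sigma_equivalent(1.0, 0.90))
```

The result is only as good as the confidence level supplied; if that value is an assumption rather than a documented reliability figure, the normalized ratio inherits the same weakness.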

6. Understanding False Accept Risk

Both EOPR and TUR have been shown to influence false accept risk. A closer scrutiny of the

relationship between EOPR, TUR, and false accept risk can reveal some intriguing aspects that

are of great value to the modern calibration laboratory. Investigating the dependency of false

accept risk on EOPR and TUR requires some mathematical prowess, but is well worth the effort

involved. Details of the calculus operations are deferred to other papers which give an excellent

treatment of the subject matter. Several papers have already been published which expound upon

the mathematics behind the Z540.3 risk requirement at the program level [3, 4, 23, 24, 25].

These publications and many others build upon the seminal works on measurement decision risk

by Eagle, Grubbs, Coon, & Hayes [26, 27, 28] and should be considered required reading for

prospective organizations considering adoption of Z540.3.

As this discussion is intended to be more conceptual in nature, a brief overview of the

fundamental principles as they relate to conformance decisions is prudent. As stated earlier,

M&TE tolerance limits are often set by the manufacturer's published accuracy specifications,

establishing the maximum permissible error limits that the end-user expects the device to comply

with during the calibration interval. The device may be declared In-tolerance by the calibration


laboratory if the UUT is observed to have a calibration result $e_{obs}$ that is within the tolerance limits ±L. This can be written as $-L \le e_{obs} \le L$. The observed calibration result $e_{obs}$ is related to the actual or true device under test error $e_{uut}$ and the measurement process error $e_{std}$ by the equation $e_{obs} = e_{uut} + e_{std}$.

Errors (such as $e_{uut}$ and $e_{std}$), as well as measurement observations (such as $e_{obs}$), are statistical quantities represented by random variables and characterized by probability density functions. These distributions represent the relative likelihood of any specific error ($e_{uut}$ and $e_{std}$) or measurement observation ($e_{obs}$) actually occurring. The key concept is that the observation $e_{obs}$ is only an approximation of reality $e_{uut}$. The smaller the measurement uncertainty, the more valid the approximation. Most often in metrology, these distributions are of the Gaussian form or normal distribution and are described by two parameters, a mean or average $\mu$, and a standard deviation $\sigma$. The standard deviation is a measure of the variability or spread in the values from the mean. In simplified cases, it is often assumed that the mean $\mu$ of all the possible error values will be zero, unless there is evidence to the contrary (i.e., a bias exists). Real-world measurements are a function of both the UUT performance $e_{uut}$ and the measurement uncertainty $e_{std}$, where $e_{obs} = e_{uut} + e_{std}$. The relative likelihood of all possible measurement results is represented by the two dimensional surface area created by the joint probability distribution of $e_{uut}$ and $e_{std}$. Figure 4 graphically represents the concept of probability density of measurement and assumes that measurement uncertainty and the UUT distribution follow a normal or Gaussian probability density function.

Figure 4. Graphical Representation of the Probability Density of Measurement
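The relationship between the true UUT error, the measurement process error, and the observed result can be explored with a small Monte Carlo sketch. It is not the authors' method; it assumes zero-mean normal errors, a symmetric tolerance ±L, a coverage factor of 2 (so the measurement process standard deviation is L / (2 × TUR)), and a UUT spread derived from the true EOPR. The function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_decisions(tur, true_eopr, tol=1.0, n=200_000, seed=1):
    """Simulate n calibrations; return (false accept rate, false reject rate, observed EOPR)."""
    rng = random.Random(seed)
    # UUT spread such that a fraction `true_eopr` of true errors lies within +/-tol.
    sigma_uut = tol / NormalDist().inv_cdf((1.0 + true_eopr) / 2.0)
    # Measurement process spread, assuming U95 = 2 * sigma_std and TUR = tol / U95.
    sigma_std = tol / (2.0 * tur)
    false_accept = false_reject = observed_in = 0
    for _ in range(n):
        e_uut = rng.gauss(0.0, sigma_uut)   # true UUT error
        e_std = rng.gauss(0.0, sigma_std)   # measurement process error
        e_obs = e_uut + e_std               # what the laboratory observes
        truly_in = abs(e_uut) <= tol
        seen_in = abs(e_obs) <= tol
        observed_in += seen_in
        false_accept += (not truly_in) and seen_in
        false_reject += truly_in and (not seen_in)
    return false_accept / n, false_reject / n, observed_in / n


if __name__ == "__main__":
    pfa, pfr, eopr_obs = simulate_decisions(tur=1.0, true_eopr=0.95)
    print(f"PFA={pfa:.4f}  PFR={pfr:.4f}  observed EOPR={eopr_obs:.4f}")
```

Because the observation adds measurement noise to the true error, the simulated observed EOPR comes out lower than the true EOPR supplied as input.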

The shape and the angle of Figure 4 change as a function of TUR and EOPR. It is worth noting that the quantity $e_{uut}$ is the parameter being sought after when a calibration is performed, but $e_{obs}$ is what is obtained via the measurement process. At any time, the precise value of $e_{uut}$ is always unknown due to the possibility of measurement process errors described by uncertainty $\sigma_{std}$. Therefore it is never possible to determine $e_{uut}$ exactly. Computing an actual numeric value for the probability (PFA or PFR) involves integrating a joint probability density function over the appropriate two dimensional surface area defined by the limits stated below. To compute probability, the input variables can be simplified to TUR and EOPR and the mathematics represented symbolically as in Figure 5.

Figure 5. Conceptual Dependency of PFA on TUR and EOPR

Incorrect (false) acceptance decisions are made where the condition exists that:

$|e_{uut}| > L$ (the UUT is actually OUT of tolerance)

and

$|e_{obs}| \le L$ (the UUT is observed to be IN tolerance)

Likewise, incorrect (false) reject decisions are made where the condition exists that:

$|e_{uut}| \le L$ (the UUT is actually IN tolerance)

and

$|e_{obs}| > L$ (the UUT is observed to be OUT of tolerance)
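A minimal numerical sketch of the PFA integration, reduced to the two inputs TUR and true EOPR, is given below. It assumes zero-mean normal distributions, a symmetric tolerance ±L, no guardband, and a coverage factor of 2 for the expanded uncertainty; the function name, the integration range, and the use of Simpson's rule are choices made for this illustration rather than anything prescribed by Z540.3 or by this paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def false_accept_risk(tur, eopr, tol=1.0, steps=2000):
    """Probability that the UUT is truly out of tolerance AND observed in tolerance.

    tur   : test uncertainty ratio (> 0), taken here as tol / U95 with U95 = 2 * sigma_std
    eopr  : true end-of-period reliability, strictly between 0 and 1
    steps : even number of Simpson intervals
    """
    sigma_uut = tol / STD_NORMAL.inv_cdf((1.0 + eopr) / 2.0)
    sigma_std = tol / (2.0 * tur)
    uut = NormalDist(0.0, sigma_uut)
    std = NormalDist(0.0, sigma_std)

    def integrand(t):
        # Density of a true error t times the chance the observation lands inside +/-tol.
        p_observed_in = std.cdf(tol - t) - std.cdf(-tol - t)
        return uut.pdf(t) * p_observed_in

    # Beyond this point either the UUT density or the acceptance probability is negligible.
    upper = tol + 10.0 * min(sigma_uut, sigma_std)
    h = (upper - tol) / steps
    total = integrand(tol) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(tol + i * h)
    # Factor 2 covers the symmetric region below -tol.
    return 2.0 * total * h / 3.0


if __name__ == "__main__":
    for tur, eopr in [(0.3, 0.41), (1.0, 0.95), (4.6, 0.70)]:
        print(f"TUR={tur}, EOPR={eopr:.0%}: PFA={false_accept_risk(tur, eopr):.4f}")
```

Sweeping `eopr` for a fixed `tur` (or the reverse) with this function gives a rough way to explore how the risk varies with each input without needing any particular commercial tool.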

[Figure 5 labels: false accept = UUT is truly OUT of tolerance and UUT is observed to be IN tolerance; false reject = UUT is truly IN tolerance and UUT is observed to be OUT of tolerance.]

Integration over the entire joint probability region will yield a value of 1 as would be expected.

In the ideal case, if the measurement uncertainty was zero, the probability of measurement errors occurring would be zero. The measurement observations $e_{obs}$ would then reflect the behavior of the UUT perfectly and the distribution of possible measurement results would be limited to the distribution of actual UUT errors. That is, $e_{obs}$ would equal $e_{uut}$ and the graph in Figure 4 would collapse to a straight line at a 45 degree angle. In this case, $e_{std} = 0$ and $e_{obs} = e_{uut}$. However, since real-world measurements are always hindered by the probability of measurement errors, observations do not always reflect reality and false accept risk results. The case of false accept and false reject is represented graphically in Figure 6.

Figure 6. Graphical Representation of False Accept and False Reject Decisions

7. Efficient Risk Mitigation

Early practices in metrology such as 4 to 1 TUR were based on assumptions and were necessary

to ease the burden of rigorous time consuming computational requirements of measurement risk

analysis. The limitations of early computing devices no longer exist and any modern computer is

capable of computing PFA using common mathematical tools such as MathCad®, Excel®, or

commercially available risk analysis software. However, the collection, management, and

logistics of risk analysis can be quite cumbersome and time consuming, especially for

multifunction instrumentation.

To comply with Z540.3 (5.3b), the PFA must not exceed 2 %. However, computing an actual

numeric value for PFA is not necessarily required in order to comply with the 2 % rule. To

understand how this is possible, the boundary conditions of PFA can be investigated by varying

the TUR and EOPR over a wide range of values and observing the resultant PFA. This is best

illustrated by a three dimensional surface plot, where the x and y axes represent TUR and EOPR,

and the height of the surface on the z-axis represents PFA. See Figures 7 & 8.

[Figure 6 plot: "Possible Measurement Results of a 10 V Source (UUT)". Vertical axis: Voltage, 9.80 V to 10.15 V; horizontal axis: Calibration Scenarios #0 to #5. Legend: Measured Value (Observed), UUT Actual (True), Upper Tolerance Limit (+L), Lower Tolerance Limit (-L), UUT Nominal Voltage. Annotations: False Reject, False Accept, $e_{obs} = e_{uut} + e_{std}$.]

Figure 7. Topographical Contour Map of False Accept Risk as a Function of TUR and EOPR

Figure 8. Surface Plot of False Accept Risk as a Function of TUR and EOPR


This surface plot combines both aspects affecting false accept risk into one visual representation

that can further illustrate the relationship between the variables TUR and EOPR. One curious

observation is that the PFA can never be greater than 13.6 % for any combination of TUR and

EOPR. That is, all calibration processes result in a PFA of less than 13.6 %, regardless of how

low the TUR is and how low the EOPR is. No calibration scenario can be proposed which will

result in a PFA greater than 13.6 %. This maximum value of 13.6 % PFA results when the TUR

is approximately 0.3 to 1 and the EOPR is 41 %. Any change, higher or lower, for either the

TUR or EOPR will result in a PFA lower than 13.6 %.

One particularly useful observation is that, for all values of EOPR, the PFA never exceeds 2 %

when the TUR is above 4.6 to 1. That is, regardless of what the actual EOPR might be for a

UUT, the PFA has an upper boundary condition of 2 % as long as the TUR is greater than 4.6 to

1. Notice in Figure 8 that the darkest blue region of the PFA surface is always below 2 %. Even

if the TUR axis in the above graph is extended to infinity, the darkest blue PFA region would

continue to fall below the 2 % threshold. Calibration laboratory managers will find this an

efficient risk mitigation technique for compliance with Z540.3. This fact can eliminate the

burden of collecting, analyzing, and managing EOPR data in circumstances where the TUR has

been evaluated and shown to be greater than 4.6 to 1.

This concept can further be illustrated if the perspective (viewing angle) of the above surface

plot in Figure 8 is rotated. This allows the two dimensional maximum outer-envelope to be easily

viewed. With this perspective, PFA can be plotted only as a function of TUR (Figure 9). In this

instance, the worst-case EOPR is used whereby the maximum PFA is produced for each TUR.

Figure 9. Worst Case False Accept Risk vs. TUR

[Figure 9 plot: "Max Risk vs TUR (Assumes Worst-Case EOPR for a given TUR)". Horizontal axis: Test Uncertainty Ratio (TUR), 0 to 7; vertical axis: Probability of False Accept (Risk), 0 % to 16 %. Annotation: False Accept Risk is always below 2 % for TUR ≥ 4.6 to 1.]

There is no EOPR which would yield a PFA above 2 % for TURs greater than 4.6 to 1.

Therefore, whatever the EOPR is for items exhibiting TUR above 4.6 to 1, it is adequate and

need not be investigated to ensure that PFA is less than 2 %. The left-hand side of the graph in

Figure 9 might not appear intuitive at first. Why would the PFA suddenly decrease as the TUR

drops below 0.3 to 1 and approaches zero? While a full explanation is beyond the scope of this

paper, the answer lies in the number of items rejected (falsely or otherwise) when extremely low

TUR exists.

Another benefit of examining the boundary conditions of the surface plot can be realized by

noting that the PFA is always below 2 % where the true EOPR is greater than 95 %. This is true

regardless of how low the TUR is. Even cases with extremely low TURs (even below 1:1) will

always produce a PFA less than 2 % where the true EOPR exceeds 95 %. Again, if the

perspective of the PFA surface plot in Figure 8 is properly rotated, a two dimensional outer-

envelope is produced whereby PFA can be plotted only as a function of EOPR (Figure 10). In

this case, the worst-case TUR is used, maximizing the PFA. This results in a graph of PFA that is

a function only of the EOPR, with each instantaneous point computed at the TUR which

maximizes the PFA. In other words, a “worst-case” TUR has been assumed at each and every

point on the curve below. This curve represents the absolute worst possible PFA for any given

EOPR and knowledge of the TUR is not required.

Figure 10. Worst Case False Accept Risk vs. EOPR

[Figure 10 plot: "Max Risk vs EOPR (Assumes Worst-Case TUR for a given EOPR)". Horizontal axis: True End Of Period Reliability (EOPR), 0 % to 100 %; vertical axis: Probability of False Accept (Risk), 0 % to 16 %. Annotation: False Accept Risk is always below 2 % for true EOPR ≥ 95 %.]

Figure 11. EOPR Data – Three Cases (Poor, Moderate, Excellent)

As was the case of low TUR, a similar phenomenon is noted on the left-hand side of the graph in

Figure 10; the maximum PFA decreases for true EOPR values below 41 %. As EOPR

approaches zero on the left side, most of the UUT values lie far outside of the tolerance limits.

When the values are not in close proximity to the tolerance limits, the risk of falsely accepting an

item is low. Likewise on the right-hand side of the graph, where the EOPR is very good (near

100 %), the false accept risk is low. Both ends of the graph represent areas of low PFA because

most of the UUT values have historically been found to lie far away from the tolerance limits,

either significantly Out-of-tolerance (left side), or significantly In-tolerance (right side). The

PFA is highest in the middle of the graph, where EOPR is only moderately poor and much of the

data is found near the tolerance limits. Refer to Figure 11.

8. True vs. Observed EOPR

Until now, this discussion has been limited to the concept of “true” EOPR. This caveat deserves

further attention. The idea of a true EOPR implies that an immaculate value for reliability exists,

which has not been influenced by any non-ideal factors. However, as with all empirical data,

this is not the case. In the calibration laboratory, reliability data is collected from a history of

real-world observations or measurements. These observations of UUTs are made using actual

equipment – often expensive reference standards with very low uncertainty under controlled

conditions. But even the best available standards have some finite uncertainty and the UUT

itself often contributes noise and other undesirable effects. All of this uncertainty impinges on

the integrity of compliance decisions, manifesting in observed reliability data that is not a true

reflection of reality. The measurement uncertainty “contaminates” the EOPR data to some

degree, calling into question its validity. The observed EOPR is never a completely accurate

representation of the true EOPR.

[Figure 11 plot: "EOPR Data", UUT As Received Condition. Vertical axis: Voltage, 9.75 V to 10.20 V, with tolerance limits L and -L. Region 1: EOPR = 20 % (Low Risk); Region 2: EOPR = 50 % (High Risk); Region 3: EOPR = 100 % (Low Risk).]

The difference between the observed and true EOPR becomes more pronounced as the measurement uncertainty increases (i.e., TUR drops). A low TUR can result in a significant deviation between what is observed and what is true regarding the reliability data. This concept has been presented in some depth previously [23, 29, 30, 31]. The reported or observed EOPR from calibration history includes all influences from the measurement process. In this case, the standard deviation of the observed distribution is given by

$$\sigma_{obs} = \sqrt{\sigma_{uut}^{2} + \sigma_{std}^{2}}$$

where $\sigma_{uut}$ and $\sigma_{std}$ are derived from statistically independent events. The corrected or “true standard deviation” can be approximated by removing the effect of measurement uncertainty and solving for $\sigma_{uut}$,

$$\sigma_{uut} = \sqrt{\sigma_{obs}^{2} - \sigma_{std}^{2}}$$

where $\sigma_{uut}$ is the “true” distribution width represented by standard deviation.

The above equation shows that the standard deviation of the observed EOPR data is always worse (higher) than that of the true EOPR data. That is, the reliability history maintained by a laboratory will always cause the UUT data to appear further dispersed than what is actually true. This results in an 89 % observed EOPR boundary condition where the PFA is less than 2 % for all possible values of TUR⁶. See Figure 12 below.

Figure 12. PFA Assumes Worst Case TUR for True EOPR and Observed EOPR.

[Figure 12 plot: "Max False Accept Risk vs EOPR (Assumes Worst-Case TUR for a given EOPR)". Horizontal axis: End of Period Reliability, 0 % to 100 %; vertical axis: Probability of False Accept (Risk), 0 % to 16 %. Curves: Observed, True. Annotation: False Accept Risk always below 2 % for observed EOPR ≥ 89 %.]

⁶ When correcting EOPR under certain conditions, low TUR values can result in imaginary values for $\sigma_{uut}$. This can occur where $\sigma_{uut}$ and $\sigma_{std}$ are not statistically independent and/or the levels of confidence associated with $\sigma_{std}$ and/or $\sigma_{uut}$ have been misrepresented.

If measurement uncertainty is thought of as “noise”, and the EOPR is the measurand, then the observed data will have greater variability or scatter than what the true value of the EOPR really is. Measurement uncertainty always hinders the quest for accurate data; it never helps. It should be noted that the true value of a single data point may indeed be higher or lower than the measured value, due to uncertainty. For any single instance, it is never known whether the measurement uncertainty contributed a positive error or a negative error. Therefore, it is not possible to remove the effect of measurement uncertainty from a single measurement result. But EOPR data is a historical collection of many Pass/Fail compliance decisions which can be represented by a normal probability distribution with a standard deviation $\sigma_{obs}$. Sometimes the measurement uncertainty will contribute positive errors and sometimes it will contribute negative errors. If the mean of these errors is assumed to be zero, the effect of measurement uncertainty on a population of EOPR data can be removed as previously shown. The inverse normal function is used to estimate $\sigma_{obs}$ from observed EOPR data [32].

$$\sigma_{obs} = \frac{L}{\Phi^{-1}\left(\frac{1 + EOPR_{obs}}{2}\right)}$$

where $\Phi^{-1}$ represents the inverse normal distribution.
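The correction from observed to true EOPR can be sketched as follows. The sketch assumes zero-mean normal distributions, a symmetric tolerance ±L, statistical independence of the UUT and measurement process errors, and a measurement process standard deviation of L / (2 × TUR), i.e. a coverage factor of 2; the function name is hypothetical. It raises an error in the situation described in footnote 6, where the corrected variance would be negative.

```python
import math
from statistics import NormalDist


def true_eopr_from_observed(observed_eopr, tur, tol=1.0):
    """Estimate true EOPR by removing measurement uncertainty from observed EOPR."""
    normal = NormalDist()
    # Width of the observed distribution from the inverse normal function.
    sigma_obs = tol / normal.inv_cdf((1.0 + observed_eopr) / 2.0)
    # Width of the measurement process distribution (assumes U95 = 2 * sigma_std).
    sigma_std = tol / (2.0 * tur)
    variance_uut = sigma_obs ** 2 - sigma_std ** 2
    if variance_uut <= 0.0:
        # Corresponds to an imaginary sigma_uut: the inputs are mutually inconsistent.
        raise ValueError("observed EOPR and TUR imply a non-positive UUT variance")
    sigma_uut = math.sqrt(variance_uut)
    return 2.0 * normal.cdf(tol / sigma_uut) - 1.0


if __name__ == "__main__":
    for tur in (1.0, 2.0, 4.0):
        print(f"TUR={tur}: true EOPR estimate = {true_eopr_from_observed(0.89, tur):.4f}")
```

The estimate is always at least as high as the observed value, and the gap narrows as the TUR increases.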

EOPR is a numerical quantity arrived at by statistical means applied to empirical data –

analogous to a Type A estimate in the language of the GUM [33]. The data comes from repeated

measurements made over time rather than accepting manufacturers' claims at face value -

analogous to Type B or heuristic estimates. However, the influence of the measurement process

is always present. This method of removing measurement uncertainty from the EOPR data is a

best estimate of the true reality or reliability which is sought through measurement.

There are many other factors that affect EOPR. For items which are routinely re-submitted to the

calibration lab, shortening or lengthening the calibration interval will affect EOPR. Laboratories

which are presently ISO-17025 compliant may not currently have the mechanisms in place to

determine and manage calibration intervals or adjustment policies. Such a policy has a direct

influence on the probability of false acceptance for the population of instruments. A

laboratory's adjustment policy can also directly affect the false accept risk to individual

instruments and to the population at large. Laboratories which require the technician to adjust or

align an instrument exhibiting an error, which exceeds some established percentage of its

allowable tolerance, can control false accept risk at the bench level and the program level. An

adjustment policy affects the false accept risk to the population of instruments by virtue of

affecting the In-tolerance probability or EOPR of all similar instruments.

To summarize, the 2 % PFA maximum boundary condition, formed by either 4.6 to 1 TUR or

89 % observed EOPR, can greatly reduce the effort and labor required for the modern calibration

laboratory in managing false accept risk for a significant portion of the M&TE submitted for

calibration. For some labs, obtaining EOPR data might be the most burdensome task, while

TUR could be the most difficult parameter for other labs to produce. The PFA boundary

condition can be leveraged from either perspective, providing benefit to practically all

laboratories. However, there will still be instances where the TUR is lower than 4.6 to 1 and the

observed EOPR is less than 89 %. In these instances, it is still possible for the PFA to be less

than 2 %. In these cases, a full PFA calculation is required to compute risk to show the 2 %


requirement has not been exceeded. However, other techniques can be employed to ensure that

the PFA is held below 2 %.

9. Guardbanding

During a compliance decision, it is sometimes helpful to establish acceptance limits A at the

time-of-test that are more stringent (tighter) than the manufacturer's tolerance limits L.

Acceptance limits are often called guardband limits or test-limits. These time-of-test constraints

are only imposed during the calibration process when making compliance decisions in order to

reduce the risk of a false acceptance. It is only necessary to implement acceptance limits A,

which differ from the tolerance limits L, when the false accept risk is higher than desired or as

part of a program to keep risk below a specified level. Acceptance limits may be chosen to

mitigate risk at the bench level or program level. PFA calculations may be used to establish

acceptance limits based on the mandated risk requirements. In most instances, where guard

bands are applied, the tolerance limits are temporarily “tightened” or reduced to create

acceptance limits needed to meet a PFA goal. The subject of guardbanding is extensive and

novel approaches exist for computing and establishing acceptance limits to mitigate risk, even

where EOPR data is not available [25]. However, in the simplified case of no guardbanding, the

acceptance limits A are set equal to the tolerance limits L (A = L).

The Z540.3 Handbook references six possible methods to achieve compliance with the 2 % rule

[16]. One particularly useful method employing a guardbanding technique is described in

Method 6 of the Handbook [16, 25]. Strictly speaking, this method does not require EOPR data

to be available because the method relies on using worst-case EOPR, computed for a specified

TUR value. Using this approach, a guardband multiplier is computed as a function of TUR. The acceptance limits are expressed as follows: $A_{2\%} = L - U_{95\%} \times M_{2\%}$, where $A_{2\%}$ is the newly established acceptance limits, $L$ is the original tolerance limits, $U_{95\%}$ is the expanded measurement process uncertainty, and $M_{2\%}$ is the multiplying factor that yields a risk of a specified maximum target. Figure 13 plots guardband multipliers for varying levels of risk. The risk level for Z540.3 is specified at 2 % but could be a different value depending on the agreement with the customer. $M_{2\%}$ was previously calculated by Dobbert [25] by fitting a line through the data points that mitigate risk to a level of 2 % and is given by the following simplified formula.

$$M_{2\%} = 1.04 - e^{\,(0.38 \ln(TUR) - 0.54)}$$

It can be seen in Figure 13 that the line is a good fit over a limited range of TUR values. The intent was to keep the equation simple for ease of use but cover the appropriate range of TUR values that make physical sense. It has been shown in this paper that $PFA \le 2\,\%$ for $TUR \ge 4.6$. To verify this we can set $M_{2\%} = 0$ and solve for TUR and we indeed find that $TUR = 4.6$ is a boundary condition. It is worth noting that, for $TUR > 4.6$, $M_{2\%} < 0$. This

implies that a calibration lab could actually increase the acceptance limits A beyond the UUT

tolerances L and still be compliant with the 2 % rule. While not a normal operating procedure

for most calibration laboratories, setting guard band limits outside the UUT tolerance limits is

conceivably possible while maintaining compliance with the program level risk requirement of

Z540.3. In fact, laboratory policies often require items to be adjusted back to nominal for

observed errors greater than a specified portion of their allowable tolerance limit L.
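The Method 6 guardband can be sketched numerically. The multiplier below uses the commonly quoted form of Dobbert's fit, M = 1.04 - exp(0.38 ln(TUR) - 0.54), with acceptance limits A = L - U95 × M; treat the coefficients as an assumption to be checked against the Z540.3 Handbook before use. The function names are hypothetical, and TUR is taken as L / U95 for a symmetric tolerance ±L.

```python
import math


def guardband_multiplier_2pct(tur):
    """Guardband multiplier for a 2 % PFA target (assumed form of the Method 6 fit)."""
    if tur <= 0:
        raise ValueError("TUR must be positive")
    return 1.04 - math.exp(0.38 * math.log(tur) - 0.54)


def acceptance_limit(tol, u95):
    """Acceptance limit A = L - U95 * M for a symmetric tolerance +/-tol."""
    tur = tol / u95
    return tol - u95 * guardband_multiplier_2pct(tur)


if __name__ == "__main__":
    tol = 1.0  # tolerance limit L in arbitrary units (hypothetical)
    for tur in (1.0, 2.0, 4.6, 10.0):
        u95 = tol / tur
        m = guardband_multiplier_2pct(tur)
        print(f"TUR={tur}: M={m:+.3f}, acceptance limit A={acceptance_limit(tol, u95):.3f}")
```

Setting the multiplier to zero and solving for TUR gives a value close to 4.6, matching the boundary condition discussed above; for larger TUR the multiplier turns negative and the computed acceptance limit exceeds the tolerance limit.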


Figure 13. Guardband Multiplier for Acceptable Risk Limits as a Function of TUR

10. Conclusion and Summary

Organizations must determine if risk is to be controlled for individual workpieces at the bench

level, or mitigated for the population of items at the program level⁷. Where ISO-17025 was not

specific in methodologies to “account for the uncertainty of measurement”, Z540.3 provides for

detailed treatment of the uncertainty during compliance testing and much more. Z540.3 points to

ISO-17025 as suitable for the core competency of calibration labs, but it also levies the

requirements of section 5.3. For the calibration program, it places an upper limit on the

probability associated with the risk of incorrectly accepting measurements as In-tolerance during

compliance decisions.

ISO-17025 contains no specific provision for utilizing a TUR as a method to account for the

uncertainty, while Z540.3 does allow the limited use of a 4 to 1 TUR. However, under Z540.3, a

4 to 1 TUR is provided only as a secondary alternative to the preferred method of limiting PFA

to less than 2 % and, even then, only when calculating PFA is “impracticable”. However, the

indiscriminate blanket use of 4 to 1 TUR, in lieu of calculating and limiting the PFA, is not

acceptable for compliance with Z540.3. It must first be demonstrated that the more rigorous,

primary method of limiting false accept risk to <2 % is not practicable. From this perspective,

laboratories which have operated in compliance with the 4 to 1 Test Accuracy Ratio (TAR)

requirement of ANSI/NCSL Z540-1 [6] cannot simply transition to Z540.3 by defaulting to the 4

to 1 TUR provision in Z540.3. Moreover, the definition differences in the Z540-1 TAR requirement and the Z540.3 TUR requirement would also pose significant challenges to the transition process. Compliance with the 2 % rule in Z540.3 must be accomplished by calculating PFA and/or limiting its probability to less than 2 %.

⁷ Bayesian analysis can be performed to determine the risk to an individual workpiece using both the measured value on the bench and program-level EOPR data to yield the most robust estimate of false accept risk. Such discussions are deferred to other publications [32].

[Figure 13 plot: "Guardband Multiplier vs TUR". Horizontal axis: Test Uncertainty Ratio (TUR), 0 to 14; vertical axis: Guardband Multiplier (M), -175 % to 75 %. Curves: 1 % PFA, 2 % PFA, 3 % PFA, 5 % PFA, Dobbert 2008.]

Computation of PFA requires the use of double integral calculus formulas of joint probability

density functions, the solutions of which are not trivial. The input variables to these formulas

can be reduced to EOPR and TUR. While expanded uncertainties (and possibly TURs) may be

well documented for most labs which are already in compliance with ISO-17025, it is EOPR that

is of pivotal importance to organizations considering the adoption of Z540.3. For the purposes

of complying with the 2 % rule in Z540.3, properly calculating PFA requires the availability of

EOPR data.

While some calibration laboratories currently maintain historical EOPR data, these metrics are

almost universally retained at the equipment-level (i.e., did the M&TE instrument, as a whole,

Pass or Fail?). For multi-parameter, multi-test-point instruments, a single failure of one test-

point represents a “Fail” at the instrument-level. Historical data at many laboratories simply

indicates whether the instrument Passed or Failed; not which specific test-point was Out-of-tolerance. Therefore, EOPR history data or metrics are not readily available at the test-point-level for the vast majority of calibration laboratories. Instrument-level EOPR can, however, be used as a conservative estimate of test-point EOPR for PFA evaluation. Even for laboratories which maintain a

comprehensive database of all measurement data ever acquired for all M&TE calibrated at the

test-point-level, these results are stored in a plethora of different manners and systems including

handwritten, hardcopy, paper datasheets. Obtaining and/or extracting this data for the

computation of PFA represents an enormous logistical challenge with a correspondingly high

cost. Moreover, if laboratories do not already possess this EOPR history, many years of

calibration events may be required to obtain the reliability data necessary for the calculation of

PFA. Therefore, efficient mitigation strategies are required.

There are six methods listed in the Z540.3 Handbook for complying with the 2 % false accept

risk requirement [16]. It should be noted that this paper has specifically focused on some efficient

approaches to this objective. This does not, in any way, negate the use of other methods nor

does it imply that the ones discussed here are necessarily the best methods for any particular

laboratory or program. It presents some efficient methods for “taking the uncertainty into

account” and mitigating risk, even where a numeric value for the measurement uncertainty or

EOPR might be unknown. Basic strategies for handling risk without rigorous computation of

PFA are:

Analyze EOPR data. This will most likely be done at the instrument-level, as opposed to

the test-point level, depending on data collection methods. If the observed EOPR data

meets the required level of 89 %, then the 2 % PFA rule has been satisfied.

If this is not the case, then further analysis is needed. TUR must be determined at each

test point. If the analysis reveals that the TUR is greater than 4.6 to 1, no further action is

necessary and the 2 % PFA rule has been met.

If neither the EOPR nor TUR threshold is met, a Method #6 guardband can be applied.
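
The three steps above can be sketched as a simple decision routine. This is an illustrative outline under stated assumptions, not a definitive implementation: the function names are hypothetical, TUR is taken as the tolerance limit divided by the 95 % expanded uncertainty, and the Method 6 guardband multiplier is written in the commonly quoted form 1.04 - exp(0.38 ln(TUR) - 0.54), which should be verified against the Z540.3 Handbook [16] before use.

```python
import math

EOPR_REQUIRED = 0.89  # observed EOPR level cited in the text
TUR_REQUIRED = 4.6    # TUR level cited in the text


def method6_multiplier(tur):
    """Guardband multiplier applied to U95 (assumed Method 6 form).

    Clamped at zero so that a high TUR never widens the acceptance limits.
    """
    return max(0.0, 1.04 - math.exp(0.38 * math.log(tur) - 0.54))


def choose_strategy(eopr, tolerance, u95):
    """Return (strategy, acceptance_limit) for limits of +/-tolerance.

    eopr      -- observed end-of-period reliability as a fraction
    tolerance -- one-sided tolerance limit of the UUT
    u95       -- 95 % expanded uncertainty of the calibration process
    """
    if eopr >= EOPR_REQUIRED:
        return "eopr", tolerance          # step 1: reliability alone suffices
    tur = tolerance / u95
    if tur > TUR_REQUIRED:
        return "tur", tolerance           # step 2: TUR alone suffices
    # Step 3: tighten the acceptance limit with a guardband.
    return "guardband", tolerance - u95 * method6_multiplier(tur)


if __name__ == "__main__":
    print(choose_strategy(0.92, 1.0, 0.5))  # EOPR threshold met
    print(choose_strategy(0.80, 1.0, 0.2))  # TUR of 5:1
    print(choose_strategy(0.80, 1.0, 0.5))  # TUR of 2:1, guardband applied
```

With this form of the multiplier, the guardband shrinks to essentially zero at a TUR of about 4.6:1, which is consistent with the TUR threshold in the second step.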

Compliance with the 2 % rule can be accomplished either by calculating PFA or by limiting it to

less than 2 % using the methods presented above. If these methods are not sufficient,

other strategies must be used. Beyond guardbanding, alternative methods of mitigating PFA are

available [16].

Determine if the acceptable tolerance can be expanded and still meet customer

requirements. This is sometimes known as a Limited Calibration to de-rated

specifications.

Determine if the measurement uncertainty of the calibration process can be improved,

perhaps by changing the calibration equipment or the calibration method. Shortening the

calibration interval may also improve EOPR and reduce risk over time.

However, a poorly specified UUT, which is being tested for compliance to overly optimistic

specifications, presents a serious problem for the calibration lab and for the end-user of M&TE.

In some circumstances, shortening the calibration interval, implementing conservative

guardbands, and applying rigorous laboratory adjustment policies cannot render a “dog” into a “gem”.

Occasionally, no amount of effort or action on the part of the calibration laboratory can force a

UUT to comply with unrealistic expectations of accuracy and performance. Contacting the

manufacturer with this evidence may result in the issuance of revised/amended specifications of

a more realistic nature.

Assumptions, approximations, estimations, and uncertainty have always been part of metrology

and will continue to be present for all of time; there are few guarantees or absolutes. Practically

no process can guarantee that instruments will provide the desired accuracy, or will function

within their assigned tolerances during any particular application or use. However, through a

well managed calibration process, confidence can be attained that an instrument will perform as

expected and within limits. Quantification of this confidence can be realized via analysis of

uncertainty, EOPR, and false accept risk. Reducing the number of assumptions and improving

the estimations involved during calibration can provide higher levels of confidence, reduced risk,

and improved quality. Historical reliance on several key assumptions has limited the

effectiveness of time-honored practices to manage risk. Z540.3 provides the organizational

framework to advance the state of metrology beyond these conventional limitations, and efficient

methods exist for compliance with this standard.

Acknowledgement

The authors would like to thank the many people who contributed to an understanding of the

subject matter presented here. Specifically, the contributions of Perry King (Bionetics), Scott

Mimbs (NASA), and Jim Wachter (Millennium Engineering and Integration) at Kennedy Space

Center were invaluable. Several graphics were generated using PTC's MathCad® 14. Where

numerical methods were more appropriate, Microsoft Excel® was used incorporating VBA

functions developed by Dr. Dennis Jackson of the Naval Surface Warfare Center in Corona, CA.

References

1. JCGM 200:2008 (VIM). International vocabulary of metrology — Basic and general concepts

and associated terms (VIM). Joint Committee for Guides in Metrology - Working Group 2, 2008.

2. ANSI/NCSL Z540.3:2006. Requirements for the Calibration of Measuring and Test

Equipment. American National Standards Institute / NCSL International, 2006.

3. D. Deaver, J. Somppi. A Study of and Recommendations for Applying the False Acceptance

Risk Specification of Z540.3. Proceedings of the Measurement Science Conference, Anaheim

CA, 2010.

4. H. Castrup, Integrated Sciences Group. Risk Analysis Methods for Complying with Z540.3.

Proceedings of the NCSL International Workshop and Symposium, St. Paul Minnesota, 2007.

5. M. Dobbert. A Pragmatic Method for Pass/Fail Conformance Reporting that Complies with

ANSI Z540.3, ISO 17025, and ILAC-G8. Proceedings of the NCSL International Workshop and

Symposium, Providence RI, 2009.

6. ANSI/NCSL Z540.1-1994 (R2002). Calibration Laboratories and Measuring and Test Equipment - General

Requirements. American National Standards Institute / National Conference of Standards

Laboratories, ISBN: 9781584640035, 2002.

7. ISO/IEC 17025:2005(E). General requirements for the competence of testing and calibration

laboratories. International Organization for Standardization / International Electrotechnical

Commission, 2005.

8. A2LA:R205. Specific Requirements: Calibration Laboratory Accreditation Program.

American Association for Laboratory Accreditation, 10 May 2011.

9. ILAC-G8:1996. Guidelines on Assessment and Reporting of Compliance with Specification

(based on measurements and tests in a laboratory). International Laboratory Accreditation

Cooperation, 1996 (Revised in 2009).

10. UKAS:M3003. The Expression of Uncertainty and Confidence in Measurement (Appendix

M). United Kingdom Accreditation Service, 2nd Edition, Jan 2007.

11. ASME B89.7.3.1-2001. Guidelines for Decision Rules: Considering Measurement

Uncertainty in Determining Conformance to Specifications. American Society of Mechanical

Engineers, ISBN: 0791827429, Jan 2001.

12. ISO-14253-1:1998(E). Geometrical Product Specifications (GPS) - Inspection by

measurement of workpieces and measuring equipment - Part 1: Decision Rules for proving

conformance or non-conformance with specifications. International Organization for

Standardization, First Edition, 15 November 1998.

13. ASME B89.7.4.1-2005 (Technical Report). Measurement Uncertainty Conformance Testing:

Risk Analysis. American Society of Mechanical Engineers, 03 February 2006.

14. NCSLI. Response to NASA Interpretation Request (IR2). NCSL International, 174 Standards

Writing Committee, 06 March 2008.

15. S. Mimbs. Request for Interpretation; Correspondence Letter. NASA - John F. Kennedy

Space Center (SA-G), 30 July 2007.

16. ANSI/NCSL Z540.3 Handbook. Handbook for the Application of ANSI/NCSL Z540.3:2006 -

Requirements for the Calibration of Measuring and Test Equipment. American National

Standards Institute / NCSL International, 2009.

17. J. Bucher (Editor). The Metrology Handbook. American Society for Quality, Measurement

Quality Division, ASQ Quality Press, ISBN 0-87389-620-3, 2004.

18. H. Castrup, Integrated Sciences Group. A Note on the Accuracy Ratio Requirements, 2006.

19. NCSLI. Response to NASA Interpretation Request (IR1). NCSL International, 174 Standards

Writing Committee, 06 March 2008.

20. M. Nicholas, L. Anderson. Guardbanding Using Automated Calibration Software.

Proceedings of the NCSL International Workshop and Symposium, Salt Lake City UT, 2004.

21. Fluke Corporation. Calibration: Philosophy in Practice. ISBN: 978-0963865007, May 1994.

22. T. Skwirczyński. Uncertainty of the calibrating instrument, confidence in the measurement

process and the relation between them. International Organization of Legal Metrology, OIML

Bulletin, Vol XLII, No.3, July 2001.

23. NASA-HDBK-8739.19-4. Estimation and Evaluation of Measurement Decision Risk, NASA

Measurement Quality Assurance Handbook – ANNEX 4. July 2010.

24. M. Dobbert. Understanding Measurement Decision Risk. Proceedings of the NCSL

International Workshop and Symposium, St. Paul Minnesota, 2007.

25. M. Dobbert. A Guard Band Strategy for Managing False Accept Risk. Proceedings of the

NCSL International Workshop and Symposium, Orlando FL, 2008.

26. A. Eagle. A Method for Handling Error in Testing and Measuring. Industrial Quality

Control, March 1954.

27. F. Grubbs, H. Coon. On Setting Test Limits Relative to Specification Limits. Industrial

Quality Control, March 1954.

28. J. Hayes, Technical Memorandum No. 63-106. Factors Affecting Measuring Reliability. U.S.

Naval Ordnance Laboratory, Measurements Reliability Branch, Production Quality Division,

Missile Evaluation Dept, 24 October 1955.

29. S. Mimbs. Measurement Decision Risk - The Importance of Definitions. Proceedings of the

Measurement Science Conference, Anaheim, CA, 2008.

30. J. Ferling. The Role of Accuracy Ratios in Test and Measurement Processes. Proceedings of

the Measurement Science Conference, Long Beach CA, 1984.

31. S. Mimbs. Using Reliability to Meet Z540.3’s 2% Rule. Proceedings of the NCSL

International Workshop and Symposium, National Harbor MD, 2011.

32. H. Castrup, Integrated Sciences Group. Analytical Metrology SPC Methods for ATE

Implementation. Proceedings of the NCSL Workshop and Symposium, Albuquerque NM, 1991.

33. ISO/IEC Guide 98-3:2008. Uncertainty of measurement -- Part 3: Guide to the expression of

uncertainty in measurement (GUM:1995). International Organization for Standardization /

International Electrotechnical Commission, 2008.