
    2015 NCSL International Workshop & Symposium

    Instrument Adjustment Policies

    Speaker/Author: Paul Reese

    Baxter Healthcare Corporation

    25212 West Illinois Route 120

Mail Stop: WG2-2S, Round Lake, IL 60073

    Phone: (224) 270-4547 Fax: (224) 270-2491

E-mail: [email protected]

    Abstract

Instrument adjustment policies play a key role in the reliability of calibrated instruments to maintain their accuracy over a specified time interval. Periodic review and adjustment of assigned calibration intervals is required by national standard ANSI/NCSL Z540.3 and is employed to manage the End of Period Reliability (EOPR) to acceptable levels. Instrument adjustment policies may also be implemented with various guardband strategies to manage false accept risk. However, policies and guidance addressing the routine adjustment of in-tolerance instruments are not so well established. National and international calibration standards ANSI/NCSL Z540.3 and ISO/IEC-17025 do not mandate any particular adjustment policy with regard to in-tolerance equipment. Evidence has been previously presented where routine adjustment of in-tolerance items may even degrade performance. Yet, this important part of the overall calibration process is often left to the discretion of the calibrating technician based on heuristic assessment. Astute adjustment decisions require knowledge of the random vs. systematic nature of instrument error. Instruments dominated by systematic effects, such as drift, benefit from adjustment, while those displaying more random behavior may not. Monte Carlo methods are used here to investigate the effect of various adjustment thresholds on in-tolerance instruments.

    1. Background

Instrument adjustment policies during calibration vary among different organizations. Such policies can generally be classified into one of three categories:

1) Adjust always
2) Adjust only if Out-Of-Tolerance (OOT)
3) Adjust with discretion when In-Tolerance and always when OOT

While the first two policies are essentially self-explanatory, the third category deserves further attention. Herein, a discretionary adjustment is one in which the calibration technician (or software) makes a decision to adjust an instrument, which is observed to be in-tolerance, based on consideration of additional factors. Discretionary adjustment may sometimes be performed in conjunction with guardbanding strategies to mitigate false accept risk. Guardbanding techniques often require discretionary adjustments to be made where low Test Uncertainty Ratio (TUR) and/or End Of Period Reliability (EOPR) are encountered. Significant literature exists on this subject [23-23].


However, this paper endeavors to provide an investigation into discretionary adjustments of in-tolerance instruments which are made, not to mitigate false accept risk, but as a preemptive measure in an attempt to reduce the potential for future out-of-tolerance (OOT) conditions. A reduction in OOT probability can translate into improved EOPR reliability. Such adjustments are often made on the bench at the discretion of the calibration technician when the observed error is deemed "too close" to the tolerance limits. Organizations may sometimes have a blanket policy or threshold in place that defines, in a broad general sense, what "too close" is. This adjustment threshold may be 70 % of specification, 80 % of specification, or any other arbitrary value. The intent of this policy may be to improve accuracy and mitigate future OOT conditions, improving EOPR. The objective of this paper is to investigate whether such adjustments can, in fact, provide an increase in accuracy and a reduction in OOT probability (increased EOPR) and, if so, by how much and under what conditions. The possibility of calibration adjustments unwittingly degrading performance is also investigated.

There are no national or international standards which dictate or require adjustment during calibration, unless an instrument is found OOT or the observed error fails to meet guardband criteria. ANSI/NCSL Z540.3-2006 and ISO/IEC-17025:2005 do not mandate discretionary adjustment of in-tolerance items [1 - 3]. The International Vocabulary of Metrology (VIM) clearly defines calibration, verification, and adjustment as separate actions [4]. Adjustment is not a de facto aspect of calibration. As defined by the VIM:

Calibration: "Operation that, under specified conditions, in a first step, establishes a relation between the quantity values with measurement uncertainties provided by measurement standards and corresponding indications with associated measurement uncertainties and, in a second step, uses this information to establish a relation for obtaining a measurement result from an indication… NOTE 2: Calibration should not be confused with adjustment of a measuring system, often mistakenly called 'self-calibration', nor with verification of calibration."

Adjustment of a measuring system: "Set of operations carried out on a measuring system so that it provides prescribed indications corresponding to given values of a quantity to be measured… NOTE 2: Adjustment of a measuring system should not be confused with calibration, which is a prerequisite for adjustment."

Verification: "Provision of objective evidence that a given item fulfils specified requirements… EXAMPLE 2: Confirmation that performance properties or legal requirements of a measuring system are achieved… NOTE 3: The specified requirements may be, e.g. that a manufacturer's specifications are met… NOTE 5: Verification should not be confused with calibration."

Despite these established definitions, there have been recent accounts where entities regulated by the Food and Drug Administration (FDA) have received Form-483 Investigational Observations and Warning Letters arising from the failure to always adjust in-tolerance instruments (i.e. all instruments) during calibration [5]. These incidents may be attributable to a nebulous distinction between the definitions of calibration, verification, and adjustment. References to similar events in regulated industries have also been published [6 - 8] where calibration requirements have been inferred to mandate adjustment during calibration.


Consistent with the VIM definitions, a calibration, where a pass/fail conformance decision is made, also satisfies the definition of a verification. However, the converse is not true; not all verifications are calibrations. This distinction is important because, for example, not all calibrations result in a pass/fail conformance decision being issued. Such is the case for most calibrations performed by National Metrology Institutes (NMI) and some reference standards laboratories, where calibrations are routinely performed and no pass/fail conformance decision is made. The definition of calibration requires no such conformance decision be rendered. In these cases, calibration consists of the measurement data reported along with the measurement uncertainty. Such operations still adhere to the VIM definition of calibration, but they are not verifications, since no statement of conformance to metrological specifications is given.

However, calibrations which do result in a statement of conformance (i.e. pass/fail) with respect to an established metrological specification are also verifications. In such scenarios, the definitions of calibration and verification are both applicable. However, the absence of adjustment of a measuring system during calibration in no way negates or disqualifies the proper usage of the term calibration. Many instruments do not lend themselves to adjustment and are not designed to be physically or electronically adjusted to periodically nominalize their performance for the purpose of reducing measurement errors; yet, such instruments are still quite capable of being calibrated. The distinction is readily apparent as indicated by ANSI/NCSL Z540.3-2006 sections 5.3a and 5.3b shown below [1, 2].

    5.3 Calibration of Measuring and Test Equipment

a) Where calibrations provide for reporting measured values, the measurement uncertainty shall be acceptable to the customer and shall be documented.

b) Where calibrations provide for verification that measurement quantities are within specified tolerances, the probability that incorrect acceptance decisions (false accept) will result from calibration tests shall not exceed 2 % and shall be documented. Where it is not practicable to estimate this probability, the test uncertainty ratio shall be equal to or greater than 4:1.

    2. NCSLI RP-1: Establishment and Adjustment of Calibration Intervals

As stated, discretionary adjustments of in-tolerance instruments are often left to the judgment of the calibration technician, or governed by organizational policy. When deferred to the discretion of the technician, such adjustments are optimally based on professional evaluation by qualified personnel with experience and training in the metrological disciplines for which they are responsible. Heuristic assessment of instrument adjustment requirements, combined with empirical data and epistemological knowledge gathered over multiple calibration operations, may provide a somewhat intuitive qualitative notion of when adjustment might be beneficial.

However, there is little formal quantitative guidance on this subject. The most authoritative reference on such discretionary adjustments is found in NCSLI Recommended Practice RP-1, Establishment and Adjustment of Calibration Intervals, henceforth referred to as NCSLI RP-1 [9]. Appendix G of NCSLI RP-1 refers to three adjustment policies as

1) Renew-always
2) Renew-if-failed
3) Renew-as-needed


NCSLI RP-1 employs the term "renew" to convey an adjustment action. Herein, the renew-as-needed policy is synonymous with discretionary adjustment. As stated in RP-1 [9],

"At present, no inexpensive systematic tools exist for deciding on the optimal renewal policy for a given MTE. While it can be argued that one policy over another should be implemented on an organizational level, there is a paucity of rigorously demonstrable tests that lead to a clear-cut decision as to what that policy should be. The implementation of reliability models, such as the drift model, that yield information on the relative contributions of random and systematic effects, seems to be a step in the right direction."

The objective of this paper is to provide some additional discourse regarding the random and systematic drift effects associated with some instruments and to provide insight as to the impact of these effects on EOPR reliability under various discretionary adjustment thresholds. As provided in NCSLI RP-1 [9], discretionary adjustments may be influenced by one or more of the following criteria, where this paper focuses specifically on questions #4, #5, #6, & #7:

1) Does parameter adjustment disturb the equilibrium of a parameter, thereby hastening the occurrence of an out-of-tolerance condition?

2) Do parameter adjustments stress functioning components, thereby shortening the life of the MTE?

3) During calibration, the mechanism is established to optimize or "center-spec" parameters. The technician is there, the equipment is set up, the references are in place. If it is desired to have parameters performing at their nominal values, is this not the best time to adjust?

4) By placing parameter values as far from the tolerance limits as possible, does adjustment to nominal extend the time required for re-calibration?

5) Do random effects dominate parameter value changes to the extent that adjustment is merely a futile attempt to control random fluctuations?

6) Do systematic effects dominate parameter value changes to the extent that adjustment is beneficial?

7) Is parameter drift information available that would lead us to believe that not adjusting to nominal would, in certain instances, actually extend the time required for re-calibration?

8) Is parameter adjustment prohibitively expensive?

9) If adjustment to nominal is not done at every calibration, are equipment users being short-changed?

10) What renewal practice is likely to be followed by calibrating personnel, irrespective of policy?

11) Which renewal policy is most consistent with a cost-effective interval analysis methodology?


Weiss [10] addressed the issue of calibration adjustment in some detail in 1991 in a paper entitled "Does Calibration Adjustment Optimize Measurement Integrity?". Weiss showed that in the presence of purely random errors associated with a normal probability density function, where no statistical difference in the mean value of the distributions exists from one calibration to the next, calibration adjustment can degrade instrument performance. Weiss and several other authors [10-14, 56-60] have drawn upon the popular Deming funnel experiment to illustrate how tampering with or adjusting a calibrated system in a state of statistical control can introduce additional unwanted variation into a process rather than reduce existing variation¹.

As Weiss demonstrates, if the process exhibits purely random error represented by a normal probability density function, the effect of this "tampering" is to increase the variance σ² by a factor of 2. This is equivalent to increasing the standard deviation to √2·σ, or ~1.414σ. If the specification limits were originally set to achieve 95 % confidence (±1.96σ), then this increased variation from tampering results in an in-tolerance probability (EOPR) of only 83.4 %. This value becomes important for the interpretations of the results later in this paper in Section 6.
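A minimal numerical check of this result can be sketched in Python, assuming a purely random normal error, specification limits placed at ±1.96σ, and the √2 inflation of σ described above:

    import math

    def norm_cdf(z):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    spec_limit = 1.96                    # tolerance limits in units of the original sigma
    sigma_tampered = math.sqrt(2.0)      # tampering inflates sigma by ~1.414

    eopr_untouched = 2.0 * norm_cdf(spec_limit) - 1.0                  # ~95.0 %
    eopr_tampered = 2.0 * norm_cdf(spec_limit / sigma_tampered) - 1.0  # ~83.4 %

    print(f"EOPR without tampering: {eopr_untouched:.1%}")
    print(f"EOPR with tampering:    {eopr_tampered:.1%}")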

Shah [11] likewise comments in 2007, stating, "Calibration has nothing to do with adjustment. When a measurement system is adjusted to measure the nominal value whether it is within tolerance or not... Is this advisable or is it causing more harm than good?... Some adjustments are justified. Others are not. A calibration technician has to make an instant decision on a measurement taken... Making a bad decision can lead to quality problems… It is shown that a stable process with its inherent natural (random) variation should be left on its own."

Abell [13] also touched on this issue in 2003, noting that "one might be inclined to readjust points to the center of the specification. The temptation to optimize all points by adjusting to the exact center between the specifications causes two problems. The first is that it might not be possible to adjust the instrument on a re-calibration to an optimal center value, even with an expensive repair. Second, a stable instrument that is unlikely to drift will be made worse by attempts to optimize its performance."

Payne [14] in 2005 makes similar comments: "There are two reasons adjustment is not part of the formal definition of calibration: (1) The historical calibration data on an instrument can be useful when describing the normal variation of the instrument or a population of substantially identical instruments... (2) a single measurement from that process is a random sample from the probability density function that describes it. Without other knowledge, there is no way to know if the sample is within the normal variation limits. The history gives us that information. If the measurement is within the normal variation and not outside the specification limits, there is no reason to adjust it. In fact, making an adjustment could just as likely make it worse as it could make it better. W. Edwards Deming discusses the problem of overadjustment in chapter 11 of Out of the Crisis."

¹ In the Deming experiment, a stationary funnel is fixed a short distance directly above the center of a target and marbles are dropped through the funnel onto the target; the resting spot of each marble is marked. Repeated cycles of this will display resting spots in a random pattern with a natural fixed common-cause variation (σ) around the target's center, following so-called rule #1 of never adjusting the position of the funnel. Alternatively, if the operator follows rule #2 and futilely attempts to adjust the position of the funnel after each drop (equal and opposite to the last observed error), the variation of the resting spots increases.
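The funnel behavior described in the footnote above is straightforward to reproduce numerically. The short Python sketch below is purely illustrative: rule #1 leaves the funnel alone, while rule #2 moves it equal and opposite to each observed error, roughly doubling the variance (σ grows by ~1.414×):

    import random, statistics

    random.seed(1)
    sigma, n_drops = 1.0, 100_000

    # Rule #1: never move the funnel.
    rule1 = [random.gauss(0.0, sigma) for _ in range(n_drops)]

    # Rule #2: after each drop, move the funnel opposite to the observed error.
    rule2, position = [], 0.0
    for _ in range(n_drops):
        landing = position + random.gauss(0.0, sigma)
        rule2.append(landing)
        position -= landing        # "compensate" for the last observed error

    print(f"Rule #1 std dev: {statistics.stdev(rule1):.3f}")   # ~1.000
    print(f"Rule #2 std dev: {statistics.stdev(rule2):.3f}")   # ~1.414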


ISO/TS 16949:2009 [57], which supersedes the QS-9000 quality management requirements for the automotive industry, also refers to the phenomenon of over-adjustment in Section 8.1.2 by requiring, "Basic statistical concepts, such as variation, control (stability), process capability and over-adjustment, shall be understood throughout the organization."

The MSA Reference Manual [56] also describes over-adjustment, stating: "…the decision to adjust a manufacturing process is now commonly based on measurement data. The data, or some statistic calculated from them, are compared with statistical control limits for the process, and if the comparison indicates that the process is out of statistical control, then an adjustment of some kind is made. Otherwise, the process is allowed to run without adjustment… [However] …Often manufacturing operations use a single part at the beginning of the day to verify that the process is targeted. If the part measured is off target, the process is then adjusted. Later, in some cases another part is measured and again the process may be adjusted. Dr. Deming referred to this type of measurement and decision-making as tampering… Over-adjustment of the process has added variation and will continue to do so... The measurement error just compounds the problem... Other examples of the funnel experiment are: (1) Recalibration of gages based on arbitrary limits, i.e., limits not reflecting the measurement system's variability (Rule 3). (2) Autocompensation adjusts the process based on the last part produced (Rule 2)."

Nolan and Provost [58] in 1990 also provide the following: "Decisions are made to adjust equipment, to calibrate a measurement device, etc. All these decisions must consider the variation in the appropriate measurements or quality characteristics of the process… The aim of the adjustment is to bring the quality characteristic closer to the target in the future. ...there are circumstances in which the adjustments will improve the performance of the process, and there are circumstances in which the adjustment will result in worse performance than if no adjustment is made... Continual adjustment of a stable process, that is, one whose output is dominated by common causes, will increase variation and usually make the performance of the process worse."

Bucher, in The Quality Calibration Handbook [59] and The Metrology Handbook [60], states: "With regard to adjusting IM&TE, there are several schools of thought on the issue. On one end of the spectrum, some (particularly government regulatory agencies) require that an instrument be adjusted at every calibration, whether or not it is actually required. At the other end of the spectrum, some hold that any adjustment is tampering with the natural system (from Deming) and what should be done is simply to record the values and make corrections to measurements. An intermediate position is to adjust the instrument only if (a) the measurement is outside the specification limits, (b) the measurement is inside but near the specification limits, where 'near' is defined by the uncertainty of the calibration standards, or (c) a documented history of the values of the measured parameter shows that the measurement trend is likely to take it out of specification before the next calibration due date."

The Weiss and Deming models [10] assume purely random variation, for which adjustment is not only futile but actually detrimental. In such cases, adjustment or tampering results in an increase to the standard deviation (σ) of the process by a factor of 1.414, or about 41 %. However, if the behavior is not purely random, the results can differ. As noted in NCSL RP-1 Appendix G [9],


"However, if a systematic mean value change mechanism, such as monotonic drift, is introduced into the model, the result can be quite different. For discussion purposes, modifications of the model that provide for systematic change mechanisms will be referred to as Weiss-Castrup models (unpublished)… By experimenting with different combinations of values for drift rate and extent of attribute fluctuations in a Weiss-Castrup model, it becomes apparent that the decision to adjust or not adjust depends on whether changes in attribute values are predominately random or systematic."

Appendix D of NCSL RP-1 describes ten Measurement Reliability Models, with #9 being "systematic attribute drift superimposed over random fluctuations (drift model)" [9]:

1) Constant out-of-tolerance rate (exponential model).

2) Constant-operating-period out-of-tolerance rate with a superimposed burn-in or wear-out period (Weibull model).

3) System out-of-tolerances resulting from the failure of one or more components, each characterized by a constant failure rate (mixed exponential model).

4) Out-of-tolerances due to random fluctuations in the MTE attribute (random walk model).

5) Out-of-tolerances due to random attribute fluctuations confined to a restricted domain around the nominal or design value of the attribute (restricted random-walk model).

6) Out-of-tolerances resulting from an accumulation of stresses occurring at a constant average rate (modified gamma model).

7) Monotonically increasing or decreasing out-of-tolerance rate (mortality drift model).

8) Out-of-tolerances occurring after a specific interval (warranty model).

9) Systematic attribute drift superimposed over random fluctuations (drift model).

10) Out-of-tolerances occurring on a logarithmic time scale (lognormal model).

This paper investigates behavioral characteristics of instruments that are described by the #9 reliability model above, systematic attribute drift superimposed over random fluctuations (drift model).

    Background information provided in Appendix D of NCSLI RP-1 is highly enlightening with

    respect to the Weiss-Castrup Drift model and the decision to adjust or not. Additional

    information is also provided by Castrup [54].

A section from Appendix D of NCSLI RP-1 is provided here to facilitate an understanding of the relationship between systematic and random components of behavior and their influence on both interval and instrument adjustment decisions, where Φ denotes the normal distribution function:

Φ(x) = (1/(σ√(2π))) ∫₋∞ˣ exp[−(ζ − μ)²/(2σ²)] dζ

where x ≡ random variable, σ ≡ standard deviation, and μ ≡ mean.

Appendix D of NCSL RP-1 [9]

Drift Model: R(t) = Φ(a₁ − a₃t) + Φ(a₂ + a₃t) − 1, where a₁ and a₂ express the distances from the attribute value to the upper and lower tolerance limits in units of σ, and a₃ is the drift coefficient defined in the excerpt below.

Figure D-11. Drift Measurement Reliability Model

Renewal Policy and the Drift Model:

"In the drift model, if the conditions |a₃t| ≪ a₁ and |a₃t| ≪ a₂ hold, then the measurement reliability of the attribute of interest is not sensitive to time elapsed since calibration. This is equivalent to saying that, if the coefficient a₃ is small enough, the attribute can essentially be left alone, i.e., not periodically adjusted.

Interestingly, the coefficient a₃ is the rate of attribute value drift divided by the attribute value standard deviation: a₃ = drift rate / σ, where σ ≡ attribute standard deviation. From this expression, we see that the coefficient a₃ is the ratio of the systematic and random components of the mechanism by which attribute values vary with time. If the systematic component dominates, then a₃ will be large. If, on the other hand, the random component dominates, then a₃ will be small. Putting this observation together with the foregoing remarks concerning attribute adjustment leads to the following axiom:

If random fluctuation is the dominating mechanism for attribute value changes over time, then the benefit of periodic adjustment is minimal.

As a corollary, it might also be stated that

If drift or other systematic change is the dominating mechanism for attribute value changes over time, then the benefit of periodic adjustment is high.

Obviously, use of the drift model can assist in determining which adjustment practice to employ for a given attribute. By fitting the drift model to an observed out-of-tolerance time series and evaluating the coefficient a₃, it can be determined whether the dominant mechanism for attribute value change is systematic or random. If a₃ is small, then random changes dominate and a renew-if-failed-only practice should be considered. If a₃ is large, then a renew-always practice should perhaps be implemented."

Copyright 2010 NCSLI. All Rights Reserved. NCSLI Information Manual. Reprinted here under the provisions of the Permission to Reproduce clause of NCSLI RP-1.
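To make the preceding idea concrete, the illustrative Python sketch below estimates a drift rate and residual scatter from a hypothetical as-found error history and forms the ratio a₃. The decision threshold of 1.0 is an arbitrary assumption for illustration, not a value taken from RP-1:

    import statistics

    def a3_from_history(times, errors):
        # Least-squares drift rate (error change per interval) divided by residual sigma.
        slope, intercept = statistics.linear_regression(times, errors)
        residuals = [e - (intercept + slope * t) for t, e in zip(times, errors)]
        return slope / statistics.stdev(residuals)

    times  = [0, 1, 2, 3, 4, 5]               # calibration events (interval units)
    errors = [0.1, 0.9, 2.2, 2.8, 4.1, 4.9]   # hypothetical as-found errors, % of spec

    a3 = a3_from_history(times, errors)
    policy = "renew-always" if abs(a3) > 1.0 else "renew-if-failed"
    print(f"a3 = {a3:.2f}  ->  consider a {policy} practice")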


The Weiss-Castrup Drift model described in NCSL RP-1 was primarily intended for the determination, adjustment, and optimization of calibration intervals in association with Methods S2 & S3, also called the Binomial Method and the Renewal Time Method, respectively [9]. The Weiss-Castrup drift model is investigated here with a focus on instrument adjustment thresholds, rather than interval adjustment actions. That is, for a given fixed calibration interval, how do various discretionary adjustment thresholds (0 % to 100 % of specification), in the presence of both drift and random variation, affect EOPR reliability? Clearly, if the behavior is purely random, as in the Weiss and Deming models, an adjust-always policy (0 % adjust threshold) is detrimental to instrument performance, resulting in decreased EOPR.

However, if the behavior has any element of monotonic drift, as in the Weiss-Castrup Drift model, an adjustment will be necessary at some point to prevent an eventual OOT condition resulting from a true attribute bias due to drift. The difficulty manifests during calibration when attempting to discriminate between attribute bias and a random error. Thus, investigating optimal adjustment thresholds to maximize EOPR in the presence of random and systematic errors seems a worthy endeavor. It is also prudent to consider that, even if an optimum adjustment threshold is determined, there may be other administrative and managerial factors, as described in NCSL RP-1 Appendix G [9], that should be considered when formulating adjustment policies.

The policy of some U.S. Department of Defense military programs and third-party OEM accredited calibration laboratories has been to not routinely, by default, adjust most equipment unless found out-of-tolerance. For example, "The U.S. Navy has the policy of not adjusting test equipment that are in tolerance." [15].

However, even under some programs which typically employ an adjust-only-if-OOT policy, discretionary adjustments are still performed for select equipment types. For example, it is not uncommon to always assign new calibration factors to microwave power sensors, or sensitivity values to accelerometers, or coefficients to temperature sensors (e.g. RTDs, PRTs, etc.), regardless of the as-found condition of the device. In these cases, rather than judge in-tolerance or out-of-tolerance based on published specifications, these decisions are often rendered based on the previously assigned uncertainty, applicable to the assigned value. In these applications, uncertainties must include a reproducibility component in the uncertainty budget that is applicable over the calibration interval for stated conditions. Such estimates can be attained by evaluation of historical performance.

    3. Empirical Examples: Systematic Drift Superimposed Over Random Fluctuations

The idea that attribute bias can grow or drift over time is ubiquitous; indeed, much of the history of metrology and the impetus for calibration are predicated on this possibility. Examples of such behavior are often encountered. The distinction between attribute bias arising from drift (or otherwise) and a random error is sometimes only discernible from the analysis of historical data. Monotonic drift can be estimated using linear regression models. Such is the case with 10 V DC zener voltage references. Calibration of these devices must be performed via comparison to other characterized zeners, standard cells, or, in the most accurate cases, Josephson voltage measurement systems. Due to the inherently low drift characteristics of commercial zener references, it would not be possible to adequately detect or resolve drift without a measurement system exhibiting high resolution, low noise, and zero (or well-characterized/compensated) drift.


The data represented in Figure 1 was acquired with a Josephson voltage measurement system. The "noise" or variation observed in the data is primarily due to the zener under test and not to the measurement standard, while all of the observed drift is attributable to the zener and none to the measurement standard [16, 17]. It may be noted that the fluctuations about the predicted drift line are not purely random in nature; they are pseudo-random.

    Figure 1. Zener drift and pseudo-random variation

Short-term variation is also significantly lower than long-term variation about the predicted line (better repeatability than reproducibility). Long-term variation is attributable to uncorrected seasonal pressure and/or humidity dependencies, 1/f noise, white noise, etc. In the presence of this long-term variation, significant calibration history is necessary in order to confidently characterize the drift of such instruments. Moreover, some of the apparently random common-cause variation might indeed be correctable. One example is by application of pressure coefficients to correct for ambient changes in barometric pressure. In many applications, with enough effort and the availability of measurement systems with ultra-high resolution and accuracy, some apparently common-cause variation can be revealed as special-cause. All metrology systems, to include the UUT, will ultimately contain a finite amount of common-cause variation or uncertainty, even after all corrections have been applied.

The R² value (or coefficient of determination) from the regression is a figure of merit for the linear drift model and other models, as it compares the amount of variation around the prediction to the variation resulting from a constant (no-drift) model. R² is an indicator of the amount of variation that is explained by linear monotonic drift. Normality tests and visual analysis of the regression residuals are also beneficial and can reveal secondary non-linear effects.


It is interesting to visually ponder an attempt to characterize the zener drift in Figure 1 over a relatively short period of time. Certain instances of data analysis over such time periods might produce significantly different predictions of drift. This is evident, even via visual examination, by observing only data from Jan 2001 to Jan 2002, which would result in a positive drift slope. This illustrates the benefit of long calibration histories when attempting to predict drift in the presence of random or pseudo-random variation, especially where the periodicity of these variations is long.

However, a subjective decision must be made when determining how much historical data to include in the regression. At some point, it may be reasonable to conclude that future behavior, especially in the short term, is not significantly dependent on data from 10+ years ago. In general, short-term predictions are better made by assessment of more recent history only, while long-term predictions might be more accurate using the full comprehensive history. Special-cause variation, such as the loss of power, can justify excluding data previous to the event. This is a subjective process, and heuristic judgment based on experience and knowledge of zener behavior is helpful in determining how much data to include in the regression.

Zener references are not typically declared in-tolerance or out-of-tolerance by assessment against a published accuracy specification, but rather against their predicted value and its assigned uncertainty at a given time during the calibration interval. Zener references are also not typically adjusted, although provision for electrical adjustment does exist. In lieu of physical/electrical adjustment, the assigned/predicted value is mathematically adjusted or "reassigned" over time during calibration. Algebraic corrections never interfere with the stability of a device, nor are they limited by the resolution of the adjusting mechanism. They do require the manual use of charted values and uncertainties via reference to a Report of Test or calibration certificate.

By sheer numbers, the majority of items calibrated throughout the world are not predominately high-level reference standards, but are of the more general variety of Test, Measurement, and Diagnostic Equipment (TM&DE). Oftentimes, the calibration history of such TM&DE contains adjustment actions of both in-tolerance and out-of-tolerance instruments. The data shown in Figure 2 represent actual data from the 50 V DC test point of a 4-digit handheld multimeter (UUT). On the third calibration event, the UUT was found out-of-tolerance and was adjusted back to nominal, resulting in zero observed error.

It is visually intuitive that this particular test point displays a high degree of monotonic drift with very little random variation. In order to perform regression analysis, the magnitude and direction of the adjustment action must be mathematically removed from the raw calibration data. The resulting regression analysis is shown in Figure 3.


    Figure 2. Calibration history representing as-found and as-left data

Figure 3. Regression of calibration data with adjustments mathematically removed (R² = 0.96)
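As a hedged illustration of what "mathematically removed" can mean in practice, the Python fragment below backs the cumulative adjustment offsets out of a hypothetical as-found/as-left history so that the corrected series reflects drift alone (the numbers are invented, not the Figure 2 data):

    def remove_adjustments(as_found, as_left):
        # Add back the cumulative size of all prior adjustments to each as-found error.
        corrected, cumulative_shift = [], 0.0
        for found, left in zip(as_found, as_left):
            corrected.append(round(found + cumulative_shift, 6))
            cumulative_shift += found - left     # adjustment applied at this event
        return corrected

    as_found = [0.2, 0.6, 1.1, 0.5, 0.9]   # hypothetical errors (% of spec); 3rd event adjusted
    as_left  = [0.2, 0.6, 0.0, 0.5, 0.9]   # as-left errors after any adjustment
    print(remove_adjustments(as_found, as_left))   # -> [0.2, 0.6, 1.1, 1.6, 2.0]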

However, in many cases of general purpose TM&DE, detection of monotonic drift may be more difficult due to domination by more random behavior, or even special-cause variation where instruments apparently "step" out-of-tolerance rather than drift in a predictable manner. Such an example, along with the regression analysis, is shown in Figures 4 and 5. In such cases, a model with random fluctuation superimposed on monotonic drift may not be the best model. One of the other models proposed in NCSLI RP-1 may be more appropriate for such an instrument.


    Figure 4. Calibration history representing predominately non-monotonic drift behavior

Figure 5. Regression with relatively low R², indicating significant behavior not explained by drift


    4. Assumptions of the Drift Model

    The Weiss-Castrup Drift model is investigated in this paper with the following assumptions:

1) The only two change mechanisms for instrument error are (a) linear monotonic drift and (b) normally distributed random errors. No spontaneous special-cause step transitions or other variation/behavior is accommodated.

2) The periodicity and magnitude of the random fluctuations during measurement (repeatability) are negligibly small compared to the periodicity and magnitude of the random fluctuations over the calibration interval (reproducibility). Here, a single measurement is simulated.

3) Tolerance specifications for the UUT are intended to represent approximately 95 % containment probability. Drift from 0 % to 100 % of specification is modeled as attribute bias. The normally distributed random component is selected as the remainder of the specification, less the allotted drift bias, such that ±1.96σ yields 95 % containment probability for the remaining random error. The higher the allotted drift (bias), the lower the random variation (σ).

4) The drift is constrained between 0 % and ~100 % of the stated specification.

5) The measurement uncertainty at the time of calibration is negligibly small, i.e., high Test Uncertainty Ratio (TUR) or, equivalently, Measurement Capability Index (Cm). Laboratory standards do not contribute significantly to the measurement uncertainty.

6) High-precision physical and/or electrical adjustment provisions for the UUT are provided which are capable of rendering an observed error of zero (eOBS = 0) after adjustment. This may be a poor assumption for multi-range, multifunction instruments with many test points. Algebraic (manually applied mathematical) corrections are equivalent.

7) Physical or electrical adjustments do not induce any secondary instabilities or otherwise disturb the equilibrium or stress components of the instrument. No interaction between adjustment controls for various test points, ranges, or functions is assumed.

8) Observed Out-Of-Tolerance conditions (>100 % of specification) require mandatory adjustment. The adjustment threshold is constrained between 0 % and 100 % of specification. However, adjustment thresholds >100 % are briefly investigated.

9) An adjustment action will always negate any previous attribute bias present at the end of the previous period, but will also (insidiously) result in a present attribute bias equal to the negative of the previous random error. No quantitative a priori drift information is assumed at the time of adjustment. Adjustment will overcompensate by the amount of previous random error, as in Deming funnel rule #2.

10) The adjustment threshold is always adhered to. If eOBS > adjustment threshold, an adjustment will always be performed. If eOBS < adjustment threshold, no adjustment is performed. Human behavioral/procedural error in adhering to the adjustment threshold is not accommodated.

11) Symmetry is assumed and only positive drift is simulated, with equal implications and conclusions applicable to negative drift.

    Assumptions #2, #3, and #9 above require further comment.


    4.1 Assumption (#2): Periodicity and Magnitude of Variation

The Weiss paper and the Deming funnel addressed the periodicity by restricting adjustment decisions to a single reading or observation. In the Weiss example, a single meter reading and adjustment was performed every hour. Rather than decrease any attribute bias, the adjustments resulted in increased random variation. Weiss concludes, "The presence and size of the bias cannot be determined by a single reading; multiple data points are required… One must observe enough data to characterize the variability of the meter readings to know which is the correct strategy [adjust or not]."

Likewise, the model herein assumes that a single measurement is made during calibration or, if repeated measurements are made and averaged, that the variability during calibration is negligible with respect to the larger variation that occurs over the calibration interval. That is, the random fluctuations occurring during the relatively short observation period of calibration (repeatability) are not representative of, or do not capture, the full extent of the variation exhibited over the longer calibration interval (reproducibility). This is somewhat akin to the long-term dependency of 1/f noise. On the contrary, if the periodicity and magnitude of fluctuations are similar, then random fluctuations during the calibration interval are represented by those encountered during the shorter measurement process. Such variations can then be largely negated with averaging techniques during the measurement process, which should then be capable of discerning actual attribute bias in the presence of random fluctuations. Under these circumstances, adjustment could be warranted, resulting in a genuine improvement in accuracy.

    Figure 6. Measurement variation during calibration, compared with variation over cal interval.


Like the Weiss example and the Deming funnel (rule #2), the model presented in this paper will incorrectly assume the observed UUT error of +60 % shown in Figure 6 is attribute bias, even under purely random behavior. Such an erroneous assumption will result in a calibration adjustment magnitude of -60 % in a futile effort to correct for the observed random error. Like Weiss and Deming, the correct assumption under purely random behavior is that the +60 % error is common-cause and, if left undisturbed, will soon fluctuate and take on some other random error represented by the UUT distribution. If this assumption is valid, the correct action would be to do nothing and not adjust. The model presented here attempts to replicate the actions of the calibrating technician, who does not have knowledge of the magnitudes of the individual systematic attribute bias vs. random behavior; adjustments are made only on the observed error at the time of calibration, which is comprised of both bias from drift and random error.

But this decision can only confidently be made if a priori knowledge of the UUT error distribution over the course of the calibration interval is available. In many cases, this distribution is not readily available, and discretionary calibration adjustments are made with the assumption that all of the observed error is an actual attribute bias which will remain (or possibly grow) unless an adjustment is performed. In an ideal case, the calibration technician would be able to discern a short-term random error from an actual long-term attribute bias through examination of historical data. At the time of calibration, however, the two types of errors are often inextricably combined into the observed error, whether obtained from a single reading or several averaged measurements over a short period of time. The attribute bias is somewhat hidden in the presence of random error. This is the behavior that is modeled herein.

    4.2 Assumption (#3): 95 % Containment Specifications; Selection of Drift vs. Random

This is perhaps the most significant and sweeping assumption used in the model presented here. The rationale used herein assumes that specifications are generally intended to adequately accommodate or contain the majority of errors that an instrument might exhibit, with relatively high confidence (e.g. 95 %). As such, the magnitudes of drift and random variability are selected as complementary to one another and modeled under this assumption. This greatly restricts the domain of possible instrument behavior investigated here. Instruments with drift and random variation which are both far better (lower) than their specifications might imply are not modeled here. Rationale for the assumption and selection of the particular domain of instrument behavior investigated in this paper is provided here.

As stated in Section 5.4 of NASA HDBK-8739.19-2 [18], "In general, manufacturer specifications are intended to convey tolerance limits that are expected to contain a given performance parameter or attribute with some level of confidence under baseline conditions… Performance parameters and attributes such as nonlinearity, repeatability, hysteresis, resolution, noise, thermal stability and zero shift are considered to be random variables that follow probability distributions that relate the frequency of occurrence of values to the values themselves. Therefore, the establishment of tolerance limits should be tied directly to the probability that a performance parameter or attribute will lie within these limits…

The selection of applicable probability distributions depends on the individual performance parameter or attribute and are often determined from test data obtained for a sample of articles


or items selected from the production population. The sample statistics are used to infer information about the underlying parameter population distribution for the produced items. This population distribution represents the item-to-item variation of the given parameter. The performance parameter or attribute of an individual item may vary from the population mean. However, the majority of the produced items should have parameter mean values that are very close to the population mean. Accordingly, a central tendency exists that can be described by the normal distribution…

Baseline performance specifications are often established from data obtained from the testing of a sample of items selected from the production population. Since the test results are applied to the entire population of produced items, the tolerance limits should be established to ensure that a large percentage of the items within the population will perform as specified… performance parameter distributions are established by testing a selected sample of the production population. Since the test results are applied to the entire population of a given parameter, limits are developed to ensure that a large percentage of the population will perform as specified. Consequently, the parameter specifications are confidence limits with associated confidence levels."

Accuracy specifications are of little benefit if they cannot be relied upon with reasonably high confidence. Manufacturers sometimes publish specifications at both 95 % and 99 % confidence levels [19]. After many calibration cycles, EOPR is then an empirical estimate of that confidence; i.e., EOPR provides a measure or assessment of the probability for an instrument to comply with its specifications at the end of its calibration interval.

However, the intent and conditions of specifications and any assumed confidence are subject to a certain amount of interpretation and inference. Is the confidence level specifically stated or is it implied? Does the confidence level of the specification apply to a single test point, or to a single instrument, or to a population of similar instruments?

For example, the published absolute uncertainty specification at a 95 % confidence level for a Fluke 8508A DMM, at 20 VDC, is 3.2 ppm [19]. The same 20 VDC point has a published uncertainty of 4.25 ppm expressed at a 99 % confidence level. As manufactured, and if properly used, it might be reasonable for the end-user of this DMM to apply the stated specification at this particular 20 VDC test point and assume the stated confidence level applies.

However, it can be argued that for multifunction instruments with multiple test points, the actual confidence level of any individual test point must be much greater than 95 % or even 99 % confidence if the instrument as-a-whole is expected to meet its specifications with the stated confidence.

As Deaver has noted [20], "…each Fluke Model 5520A Multiproduct Calibrator is tested at 552 points on the production line prior to shipment. If each of the points has a 95 % probability of being found in tolerance, there would only be a 0.95^552 = 0.000000000[0]51 % chance of finding all the points within the specification limits if the points are independent! Even if we estimate 100 independent points (about 2 per range for each function), we would still have only a 0.95^100 = 0.6 % chance of being able to ship the product."
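These figures, along with the three-parameter example quoted from Dobbert below, are simple products of independent probabilities, as the short illustrative check below shows:

    p_point = 0.95                 # assumed in-tolerance probability per test point
    for n_points in (552, 100, 3):
        print(f"{n_points:3d} independent points: P(all in tolerance) = {p_point ** n_points:.2e}")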


Similar statements have been published by Dobbert [21, 21a]: "A common assumption is that product specifications describe 95 % of the population of product items [emphasis added]. From the mean, μ, and standard deviation, σ, an interval of [μ - 2σ, μ + 2σ] contains approximately 95 % of the population. However, when manufacturers set product specifications, the test line limit is often set wider than 2σ from the population mean...

For choosing the tolerance interval probability, a generally accepted minimum value is 95 %. However, manufacturers may choose a probability other than 95 % for different reasons. Consider again a multi-parameter product. Manufacturers wish to have high yields for the entire product so that the yield considering all parameters meets the respective test line limits. If the product parameters are statistically independent, the overall yield, in this case, is the product of the probability for each parameter. For a product with just three independent parameters, each with a test limit intended to give 95 % probability, the product would only have a (0.95)³ or 85.7 % chance of meeting all test line limits, which is perhaps unacceptable to the manufacturer. For this reason, manufacturers select tolerance interval probabilities greater than 95 % so that the overall probability is acceptable."

When discussing drift, Dobbert also notes, "Stress due to environmental change, as well as everyday use, transport, aging and other factors may induce small changes in performance that accumulate over time. In other words, products drift. The effect of drift is that from the time of manufacture to the end of the initial calibration interval, it is likely that performance has shifted. …a population of product items also experiences a shift in the mean, a change in the standard deviation, or both, due to the mechanisms associated with drift…

To ensure products meet specification over the initial calibration interval, manufacturers may include an additional guard band between the test line limit and the specification… In the simplest case, the total guard band between the test line limit and the specifications is the sum of the individual guard band components for environmental factors, drift, measurement uncertainty and any other required component. For example, [adding this total guard band to the test line limit] gives what is often the initial specification for a product. For the final specification, manufacturers must consider manufacturing costs, market demands and competing product performance."

When discussing manufacturers' specifications propagated into uncertainty analyses, Dobbert additionally notes, "The GUM provides guidance for evaluation of standard uncertainty and specifically includes manufacturers' specifications as a source of information for Type-B estimates… To evaluate a Type-B uncertainty, the GUM gives specific advice when an uncertainty is quoted at a given level of confidence. In this instance, an assumption can be made that a Gaussian distribution was used to determine the quoted uncertainty. The standard uncertainty can then be determined by dividing by the appropriate factor given the stated level of confidence. Various manufacturers state a level of confidence for product specifications, and applying this GUM advice to product specifications quoted at a level of confidence is common and accepted by various accreditation bodies."
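As a small illustrative sketch of that Type-B treatment (using the 20 V DC figures for the Fluke 8508A quoted earlier and assuming a normal distribution, per the GUM advice described above), the specification is simply divided by the coverage factor for the stated confidence level:

    from statistics import NormalDist

    def standard_uncertainty(spec, confidence):
        # Two-sided normal coverage factor for the stated confidence (1.960 at 95 %, 2.576 at 99 %).
        k = NormalDist().inv_cdf(0.5 + confidence / 2.0)
        return spec / k

    print(f"95 % spec of 3.2 ppm  -> u = {standard_uncertainty(3.2, 0.95):.2f} ppm")
    print(f"99 % spec of 4.25 ppm -> u = {standard_uncertainty(4.25, 0.99):.2f} ppm")

The two stated confidence levels reduce to nearly the same standard uncertainty (~1.6 ppm), as would be expected if both specifications were derived from the same underlying normal distribution.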


The assumption used in the model investigated by this paper is that a specification represents 95 % containment probability of errors for a given test point; thus, the magnitude and proportion of drift and random components are modeled accordingly (see Section 5). This may be a significant assumption and highly conservative, especially where actual instrument performance at a given test point exhibits systematic drift (bias) and random error components much lower than represented by the specifications. For example, the domain of performance for instruments displaying drift of only 10 % of specification per interval and, at the same time, a random component (σ) of only 20 % of specification is not modeled here.

However, a great many instruments may be well capable of performing at such levels, i.e., considerably better than their specifications would imply. This is especially true if one assumes that the manufacturer has built significant margins or guardbands into the specifications and/or that the confidence level of specifications is intended to represent an entire population of instruments, or one instrument as-a-whole, rather than a single test point. Investigations of such domains of behavior, and the effect on EOPR of various adjustment thresholds under such improved instrument performance, may be highly insightful and are deferred to future explorations². Moreover, models where random variation (σ) itself increases with time (such as random-walk models) would be useful, with or without a drift component. Such a model, even in the absence of monotonic drift, exhibits a time-dependent mechanism for transitioning to OOT³.

    4.3 Assumption (#9): Mandatory Adjustment of OOT Conditions is Required

In practice, calibration laboratories, which are charged with verification as part of the calibration process, are required to perform an adjustment if the UUT exceeds the allowable tolerance(s) (>100 %) defined by the agreed-upon specifications. It is not generally acceptable to return an item to the end-user as calibrated while exhibiting an observed OOT condition.

However, in a Weiss or Deming model where fluctuations are purely random, this would appear the correct course of action. The OOT condition, like the in-tolerance condition, should not be adjusted; it should be allowed to remain, with the assumption that it will soon decrease and take on some other random value which will likely be contained within the specification limits. In this regard, there is nothing special about the OOT condition. It is simply part of the normal common-cause random variation that will inevitably, albeit rather infrequently (e.g. 5 %), fall outside of specification limits which are intended to represent 95 % confidence or other containment probability. Appendix G of NCSLI RP-1 perhaps best describes this as a "logical predicament" when discussing non-adjustment of items as follows:

"If we can convince ourselves that adjustment of in-tolerance attributes should not be made, how then to convince ourselves that adjustment of out-of-tolerance attributes is somehow beneficial? For instance, if we conclude that attribute fluctuations are random, what is the point of adjusting attributes at all? What is special about attribute values that cross over a completely arbitrary line called a tolerance limit? Does traversing this line transform them into variables that can be controlled systematically? Obviously not."

    More on the topic of non-adjustment of OOT conditions is presented later in Section 6.

² The author thanks Jonathan Harben of Keysight Technologies for these astute suggestions.
³ The author thanks Dr. Howard Castrup of Integrated Sciences Group for this valuable observation.


The model presented herein concedes to the conventional industry practice which mandates adjustment of items which are observed to be out-of-tolerance. Where the observed error is predominately a long-term attribute bias, resulting from systematic monotonic drift or otherwise, adjustment is a beneficial action. Such attribute bias is likely to remain or possibly grow larger if left unadjusted. However, where the observed error resulted predominately from a short-term random event, adjustment will be the incorrect decision. Like the calibration technician, this model assumes (correctly or incorrectly) that all observed as-received errors represent systematic attribute bias; adjustment actions will be implemented according to the adjustment threshold parameter set for the model (0 % to 100 % of specification). In this sense, the model feigns ignorance of the constituent proportion of random error to attribute bias during adjustment actions but, in actuality, is privy to the amount of attribute bias at all times in the simulation. For investigational purposes, adjustment thresholds >100 % of specification are briefly discussed, although they are believed unlikely to find application in most calibration laboratories.

5. Modeling and Selection of Magnitude for Drift and Random Variation

The illustration in Figure 7 represents the general concept of monotonic drift superimposed on constant random variation.

Figure 7. Monotonic drift superimposed on constant random variation

LEFT: The random variation has been superimposed on no drift at all (0 %) and the specification adequately contains 95 % of the random errors.

CENTER: The random variation has been superimposed on drift in the amount of 50 % of specification. The mean of this distribution at the end of the calibration interval is not zero, but is equal to the amount of drift accumulated over the calibration interval (50 %). Thus, a significant portion (16.4 %) of errors will exceed the upper specification limit and only 0.2 % will exceed the lower specification limit. Only 83.5 % will be in-tolerance (EOPR).

RIGHT: The random variation is superimposed on drift in the amount of 100 % of specification. The mean of the distribution is shifted to 100 % of specification, resulting in 50 % of the errors exceeding the upper specification limit when received for calibration. This is generally an unacceptable situation, as an End Of Period Reliability of 50 % is below most industry-accepted reliability targets. See Section 7 for examples of EOPR objectives.


Figure 7 represented random variation as a normal probability distribution with constant width (σ = constant). However, if the specification limits are intended to provide a containment probability of 95 % as discussed in Section 4.2, then any allowable drift must result in a commensurate reduction in the amount of allowable random variation in order to still provide 95 % confidence. In the model used herein, the amount of drift is first selected as a percentage (0 % to 100 %) of the allowable specification over one interval. This will result in a systematic drift-induced attribute bias at the end of one interval equal to the amount of specified drift. OOT incidents will tend towards the direction of drift; e.g. for a positive drift allowance, OOT conditions will predominately be found exceeding the upper specification limit in only one tail of the distribution. The resulting drift, after one interval, forms the mean (μ) of the normally distributed random component.

Since the intent of the accuracy specification is assumed to represent a 95 % containment probability for the error, the remaining portion of the specification is then modeled as a normally distributed random component with a standard deviation (σ) selected to still provide 95 % containment (see Table 1 and Figure 8). This complementary aspect of these two components is necessary to provide the desired containment probability. As discussed in Section 4.2, specifications are often, directly or implied, provided by the OEM with an allowance for drift designed into them and provided at a relatively high confidence level. This is the basis for the choice of magnitudes for the model used here. As the drift component dominates and approaches the 100 % specification limit, the random component approaches zero. That is, as the systematic drift (δ) increases, the random variation (σ) decreases, as shown in Figure 8.

    Figure 8. Positive drift superimposed on complementary random variation

To maintain 95 % EOPR, a perfect adjustment would need to be made at the end of each calibration interval (in-tolerance or not). This is necessary to reduce the attribute bias (due to drift or otherwise) to zero. Only if this ideal adjustment always occurs at the end of each calibration interval would 95 % EOPR be achievable in this model. However, such adjustment will not be possible in this model, due to the nature of the random variation precluding an ideal adjustment. Thus, EOPR will be less than 95 % for adjustment thresholds between 0 % and 100 % of specification.


Table 1. Magnitude of drift (δ) and random (σ) components, modeled to maintain 95 % in-tolerance confidence. (δ) is given as a percentage of the specification per interval; (σ) is given as a percentage of the specification.

| δ (Drift) | σ (Random) | δ/σ | Left OOT | Right OOT | δ (Drift) | σ (Random) | δ/σ | Left OOT | Right OOT | δ (Drift) | σ (Random) | δ/σ | δ (Drift) | σ (Random) | δ/σ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 % | 51.021 % | 0.000 | 2.500 % | 2.500 % | 26 % | 44.386 % | 0.586 | 0.226 % | 4.774 % | 51 % | 29.790 % | 1.71 | 76 % | 14.591 % | 5.21 |
| 1 % | 51.011 % | 0.020 | 2.385 % | 2.614 % | 27 % | 43.881 % | 0.615 | 0.190 % | 4.810 % | 52 % | 29.181 % | 1.78 | 77 % | 13.983 % | 5.51 |
| 2 % | 50.981 % | 0.039 | 2.271 % | 2.729 % | 28 % | 43.363 % | 0.646 | 0.158 % | 4.842 % | 53 % | 28.573 % | 1.85 | 78 % | 13.375 % | 5.83 |
| 3 % | 50.933 % | 0.059 | 2.158 % | 2.843 % | 29 % | 42.834 % | 0.677 | 0.130 % | 4.870 % | 54 % | 27.966 % | 1.93 | 79 % | 12.767 % | 6.19 |
| 4 % | 50.864 % | 0.079 | 2.044 % | 2.956 % | 30 % | 42.291 % | 0.709 | 0.106 % | 4.894 % | 55 % | 27.358 % | 2.01 | 80 % | 12.159 % | 6.58 |
| 5 % | 50.776 % | 0.098 | 1.933 % | 3.068 % | 31 % | 41.739 % | 0.743 | 0.085 % | 4.915 % | 56 % | 26.749 % | 2.09 | 81 % | 11.551 % | 7.01 |
| 6 % | 50.668 % | 0.118 | 1.822 % | 3.178 % | 32 % | 41.177 % | 0.777 | 0.067 % | 4.933 % | 57 % | 26.142 % | 2.18 | 82 % | 10.943 % | 7.49 |
| 7 % | 50.539 % | 0.139 | 1.712 % | 3.287 % | 33 % | 40.606 % | 0.813 | 0.053 % | 4.947 % | 58 % | 25.534 % | 2.27 | 83 % | 10.335 % | 8.03 |
| 8 % | 50.392 % | 0.159 | 1.605 % | 3.395 % | 34 % | 40.029 % | 0.849 | 0.041 % | 4.960 % | 59 % | 24.926 % | 2.37 | 84 % | 9.727 % | 8.64 |
| 9 % | 50.224 % | 0.179 | 1.499 % | 3.500 % | 35 % | 39.444 % | 0.887 | 0.031 % | 4.969 % | 60 % | 24.318 % | 2.47 | 85 % | 9.119 % | 9.32 |
| 10 % | 50.038 % | 0.200 | 1.396 % | 3.604 % | 36 % | 38.856 % | 0.926 | 0.023 % | 4.977 % | 61 % | 23.710 % | 2.57 | 86 % | 8.511 % | 10.1 |
| 11 % | 49.831 % | 0.221 | 1.296 % | 3.705 % | 37 % | 38.263 % | 0.967 | 0.017 % | 4.983 % | 62 % | 23.102 % | 2.68 | 87 % | 7.904 % | 11.0 |
| 12 % | 49.603 % | 0.242 | 1.198 % | 3.803 % | 38 % | 37.666 % | 1.01 | 0.012 % | 4.988 % | 63 % | 22.494 % | 2.80 | 88 % | 7.296 % | 12.1 |
| 13 % | 49.356 % | 0.263 | 1.103 % | 3.898 % | 39 % | 37.066 % | 1.05 | 0.009 % | 4.991 % | 64 % | 21.886 % | 2.92 | 89 % | 6.687 % | 13.3 |
| 14 % | 49.088 % | 0.285 | 1.011 % | 3.989 % | 40 % | 36.464 % | 1.10 | 0.006 % | 4.994 % | 65 % | 21.278 % | 3.05 | 90 % | 6.080 % | 14.8 |
| 15 % | 48.801 % | 0.307 | 0.922 % | 4.078 % | 41 % | 35.861 % | 1.14 | 0.004 % | 4.996 % | 66 % | 20.670 % | 3.19 | 91 % | 5.472 % | 16.6 |
| 16 % | 48.495 % | 0.330 | 0.838 % | 4.163 % | 42 % | 35.256 % | 1.19 | 0.003 % | 4.998 % | 67 % | 20.062 % | 3.34 | 92 % | 4.864 % | 18.9 |
| 17 % | 48.168 % | 0.353 | 0.757 % | 4.243 % | 43 % | 34.650 % | 1.24 | 0.002 % | 4.998 % | 68 % | 19.454 % | 3.50 | 93 % | 4.256 % | 21.9 |
| 18 % | 47.821 % | 0.376 | 0.680 % | 4.320 % | 44 % | 34.043 % | 1.29 | 0.001 % | 4.999 % | 69 % | 18.846 % | 3.66 | 94 % | 3.648 % | 25.8 |
| 19 % | 47.456 % | 0.400 | 0.608 % | 4.393 % | 45 % | 33.435 % | 1.35 | 0.001 % | 4.999 % | 70 % | 18.238 % | 3.84 | 95 % | 3.040 % | 31.3 |
| 20 % | 47.071 % | 0.425 | 0.540 % | 4.461 % | 46 % | 32.828 % | 1.40 | 0.000 % | 4.999 % | 71 % | 17.630 % | 4.03 | 96 % | 2.432 % | 39.5 |
| 21 % | 46.666 % | 0.450 | 0.476 % | 4.524 % | 47 % | 32.220 % | 1.46 | 0.000 % | 4.999 % | 72 % | 17.022 % | 4.23 | 97 % | 1.824 % | 53.2 |
| 22 % | 46.244 % | 0.476 | 0.417 % | 4.583 % | 48 % | 31.613 % | 1.52 | 0.000 % | 5.000 % | 73 % | 16.415 % | 4.45 | 98 % | 1.216 % | 80.6 |
| 23 % | 45.805 % | 0.502 | 0.362 % | 4.638 % | 49 % | 31.005 % | 1.58 | 0.000 % | 5.000 % | 74 % | 15.807 % | 4.68 | 99 % | 0.608 % | 163 |
| 24 % | 45.347 % | 0.529 | 0.312 % | 4.687 % | 50 % | 30.398 % | 1.64 | 0.000 % | 5.000 % | 75 % | 15.199 % | 4.93 | 100 % | 0.000 % | N/A |
| 25 % | 44.874 % | 0.557 | 0.267 % | 4.733 % | | | | | | | | | | | |
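The σ values in Table 1 follow from requiring 95 % containment for each drift value. A minimal numerical sketch (Python with SciPy; the helper name sigma_for_drift is illustrative, and all quantities are in percent of specification) solves for the complementary σ:

```python
from scipy.optimize import brentq
from scipy.stats import norm

def sigma_for_drift(drift_pct, containment=0.95, spec=100.0):
    """Random-component standard deviation (in % of spec) such that a normal
    distribution with mean = drift_pct still has `containment` probability
    inside +/- spec (the construction behind Table 1)."""
    if drift_pct >= spec:
        return 0.0   # 100 % drift leaves no room for a random component
    def excess(sigma):
        inside = (norm.cdf(spec, loc=drift_pct, scale=sigma)
                  - norm.cdf(-spec, loc=drift_pct, scale=sigma))
        return inside - containment
    # Containment falls monotonically as sigma grows, so bracket and root-find.
    return brentq(excess, 1e-9, 10.0 * spec)

# Spot checks against Table 1:
# sigma_for_drift(0)  -> ~51.02 %    sigma_for_drift(26) -> ~44.39 %
# sigma_for_drift(50) -> ~30.40 %    sigma_for_drift(99) -> ~0.61 %
```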


Figure 9. Monte Carlo simulation model for one calibration interval (flowchart columns: AS-RECEIVED, AS-LEFT, DURING CAL INTERVAL, END OF PERIOD)

eOBS = Error observed for the UUT, as-received. It is equal to the End-Of-Period error for the previous calibration interval (eEOP(i-1)). Only a portion of eOBS is due to systematic error (eBIAS(i-1) + eDRIFT(i-1)). However, any adjustments are performed equal-and-opposite to the whole of eOBS, which includes random error (eRAND(i-1)) in addition to systematic error (eBIAS(i-1) + eDRIFT(i-1)).

eBIAS = UUT attribute bias, as-left. If no adjustment has been made, eBIAS remains the same as the sum of the systematic errors at the end of the previous calibration interval (eBIAS(i-1) + eDRIFT(i-1)). If an adjustment is made, eBIAS is equal to the negative of the previous random error (-eRAND(i-1)). After adjustment, eBIAS is zero only if the random error during the previous cal interval (eRAND(i-1)) was zero (unlikely). Adjustment actions will always negate previously accumulated attribute bias, but will also result in attribute bias of their own, due to an overcompensated adjustment.

eDRIFT = Error of the UUT attributable to monotonic drift. If no adjustment is made, this systematic drift error carries over or accumulates from one calibration interval to the next. For the model, eDRIFT is specified as a percentage of the allowable tolerance or accuracy specification. The remainder of the specification is then allocated to eRAND as (100 % - Drift %).

eRAND = Error of the UUT attributable to random behavior. A random number generator is used to select eRAND from a normal Gaussian distribution. Ideally, no adjustment should be made to compensate for this component. This is common-cause variation with an assumed period significantly longer than the observation period during calibration. If all variation is random, adjusting is equivalent to tampering with a system which may otherwise be in a state of statistical control. It is analogous to moving the funnel in the Deming experiment.

eEOP = Error of the UUT at End of Period (includes attribute bias, plus drift, plus random error).

[Figure 9 depicts the per-interval flow: the observed as-received error eOBS is compared against the adjustment threshold; if it exceeds the threshold, an adjustment equal to -eOBS is performed (result: eOBS = 0), leaving -eRAND(i-1) as the residual bias; otherwise the previous cumulative systematic bias is carried forward. Systematic monotonic drift eDRIFT and a new normally distributed random component eRAND are then added during the interval, giving eEOP = eBIAS + eDRIFT + eRAND.]
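A minimal sketch of this per-interval logic follows (Python/NumPy rather than the spreadsheet implementation described in Section 6; simulate_eopr and the variable names are illustrative, and all error quantities are in percent of specification):

```python
import numpy as np

def simulate_eopr(drift_pct, threshold_pct, sigma_pct,
                  n_intervals=100_000, seed=0):
    """Sketch of the Figure 9 model: one test point followed over many
    calibration intervals. All quantities are in % of specification."""
    rng = np.random.default_rng(seed)
    spec = 100.0
    e_bias = 0.0          # as-left attribute bias at the start of an interval
    oot = 0
    for _ in range(n_intervals):
        e_rand = rng.normal(0.0, sigma_pct)      # eRAND for this interval
        e_obs = e_bias + drift_pct + e_rand      # eEOP, observed as-received
        if abs(e_obs) > spec:
            oot += 1
        # Adjustment decision (OOT is always adjusted when threshold <= 100 %)
        if abs(e_obs) > threshold_pct:
            e_bias = -e_rand                     # over-correction leaves -eRAND as new bias
        else:
            e_bias += drift_pct                  # unadjusted drift accumulates
    return 1.0 - oot / n_intervals

# e.g. simulate_eopr(drift_pct=0,  threshold_pct=0,  sigma_pct=51.02) -> ~0.834
#      simulate_eopr(drift_pct=50, threshold_pct=60, sigma_pct=30.40)
```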


    6. Results

The results in Figures 10A and 10B were rendered via the Monte Carlo method to visually investigate aspects of the Weiss-Castrup drift model with regard to adjustment thresholds. The model in Figure 9 is repeated for 100 000 iterations and the number of Out-Of-Tolerance instances for eOBS is tallied over the 10⁵ cycles. The End-of-Period Reliability is then computed as EOPR = (10⁵ - OOTs)/10⁵. This process is repeated ten times, with the average taken to arrive at a final simulated EOPR output applicable to a specifically chosen pair of values in the model, i.e. (1) the amount of monotonic drift and (2) the adjustment threshold. A 101 x 101 matrix of EOPR values is then generated by looping the process in +1 % increments from 0 % to 100 % for both the monotonic drift variable and the adjustment threshold variable. In total, ~10¹⁰ Monte Carlo iterations are used in the generation of the matrix. This requires considerable computational brute force and consumed approximately 43 hours of CPU time running under MS Windows 7 in Excel 2010 using an Intel Core i5-4300 CPU clocked at 2.6 GHz. See Appendix B for a discussion of using Excel for Monte Carlo methods.
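A brute-force sweep of this kind might be sketched as follows, assuming the illustrative simulate_eopr and sigma_for_drift helpers shown earlier are available; the nested loop mirrors the 101 x 101 grid with ten averaged runs per cell described above (and is similarly slow in pure Python):

```python
import numpy as np

# Sweep drift and adjustment threshold from 0 % to 100 % of spec in 1 % steps,
# averaging ten Monte Carlo runs per cell (~10^10 iterations in total).
# Assumes sigma_for_drift() and simulate_eopr() from the earlier sketches.
drifts = np.arange(0, 101)        # % of spec per interval
thresholds = np.arange(0, 101)    # adjustment threshold, % of spec
eopr_grid = np.zeros((len(thresholds), len(drifts)))

for j, d in enumerate(drifts):
    sigma = sigma_for_drift(d)                    # complementary sigma (Table 1)
    for i, t in enumerate(thresholds):
        eopr_grid[i, j] = np.mean([simulate_eopr(d, t, sigma, seed=s)
                                   for s in range(10)])
# eopr_grid is the 101 x 101 matrix behind the surface plots of Figures 10A/10B.
```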

The resulting multivariate matrix can then be plotted as a three-dimensional surface plot (Figures 10A & 10B), with the EOPR values displayed on the vertical z-axis. The x-axis represents the monotonic drift rate and the y-axis represents the adjustment threshold, from 0 % to ~100 % each. This provides insight into the effects that these variables impart to EOPR, which is arguably the most important quality metric for many calibration and metrology organizations. Other important quality metrics, such as Test Uncertainty Ratio (TUR) and the Probability of False Accept (PFA), are inextricably interrelated to the observed EOPR [22, 23].

    Figure 10A. 3D surface plot of EOPR as a function of adjustment threshold and drift


    Figure 10B. 3D surface plot of EOPR as a function of adjustment threshold and drift

It is important to bear in mind the nature of the x-axis, representing drift in Figures 10A and 10B. As the amount of drift increases, the random behavior decreases, as assumed by this particular model (see Table 1). Other modeling can be performed with different parametric assumptions, e.g. where the random variation is held constant (or grows larger) in the presence of increasing drift. Still other assumptions, such as zero drift and increasing random variation, e.g. random-walk models, could be modeled. Such investigations would provide additional insight.

It should also be noted that here, the x-axis merely approaches 100 % drift (zero random error). When drift is exactly 100 % of specification with zero random error, all adjustment thresholds ≤ 100 % result in 100 % EOPR. In that case, adjustments are always performed and they are always perfect due to the absence of random error (assuming infinite TUR; see assumption #5 in Section 4).

Many implications exist from the resulting model in Figures 10A and 10B for the stated assumptions. Perhaps the most significant commonality in all instances is that, as the calibration adjustment threshold increases from 0 % to 100 % of specification, the EOPR remains constant or decreases in all cases; it never increases. This is further illustrated in Figure 11.


    Figure 11. EOPR as a function of adjustment threshold for various levels of drift

In Figure 11, note that for the case of purely random variation with zero drift (green line), the EOPR is constant at 83.4 %, just as the Weiss and Deming models would predict when adjustments are always made (i.e. adjustment threshold of 0 % of specification). However, it is interesting to note that this 83.4 % EOPR does not improve as the adjustment threshold is increased from 0 % (always adjust) towards 100 % of specification (adjust less frequently).

Why does an increase in EOPR (reduction in variability) not result, in this purely random case, as the adjustment threshold increases from 0 % to 100 % (i.e. less frequent adjustments)? The answer can be elucidated if the scale of the adjustment threshold and y-axis are extended beyond the 100 % of specification limit (OOT point). With the model constrained to a maximum 100 % adjustment threshold in the purely random case, adjustments will still be made for all observed OOT conditions. Even though these adjustments occur less frequently than in the always-adjust scenario (0 % adjustment threshold), the magnitude of these less frequent adjustments, or tampering, is always quite large. For purely random systems, these large but less frequent adjustments for observed OOT conditions ultimately result in the same outcome as the Weiss and Deming models predict; i.e. they lead to the same increased variability (from σ to √2·σ, i.e. variance 2σ²) and resulting lower EOPR (83.4 %), just as if adjustment or tampering was performed every time.
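These two limiting cases can also be checked analytically. A brief sketch (Python with SciPy), assuming purely random behavior with σ = specification/1.96:

```python
from math import sqrt
from scipy.stats import norm

spec = 100.0
sigma = spec / norm.ppf(0.975)                       # ~51.02 % of spec

# Never adjust (Deming funnel rule #1): e_EOP = e_RAND, std = sigma
eopr_never = 2.0 * norm.cdf(spec / sigma) - 1.0                  # ~0.95

# Adjust every time (tampering): e_EOP = e_RAND - e_RAND_prev, std = sqrt(2)*sigma
eopr_always = 2.0 * norm.cdf(spec / (sqrt(2.0) * sigma)) - 1.0   # ~0.834
```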

If the adjustment threshold is increased to 500 % of specification (or more), and the simulation is run again, a decrease in variability (from √2·σ back to σ) and a resulting increase in EOPR (from 83.4 % to 95 %) is indeed observed. However, the transition region where this phenomenon occurs is not well-behaved (see Figure 12). That is, as the adjustment threshold is raised above 100 % of specification, fewer and fewer adjustments are ever made, and the probability of adjustment becomes exceedingly low.


However, when one of these very rare events does occur, triggering an adjustment (after many thousands of iterations of the Monte Carlo simulation), the effect is quite significant. Since it was presumed to be a random event, no adjustment should have been made (even at 150 %, 200 %, 300 % of specification, or more). Adjusting such a large random error imparts an equally large attribute bias, opposite in sign.

    Figure 12. Monte Carlo modeled behavior for random errors w/ adjust thresholds >100 % of spec

If the Monte Carlo simulations are extended to include adjustment thresholds far above 100 % of specification (>OOT), the EOPR behavior becomes somewhat erratic between 150 % and 270 % of specification. It ultimately settles at the 95 % EOPR, just as if no adjustments were ever made, because essentially no adjustments are ever made when the adjustment threshold is so large. The repeatability of the Monte Carlo process is also poor in this transition region (even with 10⁶ iterations) because the results of the simulation are highly sensitive to very improbable events. After the adjustment threshold extends beyond ~270 % of specification (~5.5σ), adjustment actions become so rare as to approach the never-adjust scenario of the Deming funnel (rule #1), where the variation is lowest. Under these circumstances, the EOPR settles at the original 95 % containment probability of the purely random variation with respect to the ±1.96σ specification limits.

This scenario will likely find little application in calibration laboratories. One would have to be willing to not adjust instruments with observed errors >>100 % of specification (highly OOT). The rationale for such a decision would be to attribute all errors (regardless of how large) to purely random events that would not remain if simply left alone and not adjusted. In reality, such large errors may be much more likely to be true attribute bias resulting from special-cause variation such as misuse, over-ranging, rough handling, etc. Analysis of historical data is of great benefit when attempting to characterize such errors.


    7. EOPR Reliability Targets

The use of EOPR as a quality metric for calibrated equipment is of great importance. EOPR targets are analogous to an Acceptable Quality Level (AQL) in manufacturing environments. Both metrics speak to the percentage of items that comply with their stated specifications, although AQLs are expressed as the complement of this (i.e. tolerable percent defective, not to be confused with LTPD). Calibration intervals are often adjusted in an effort to achieve these goals. Target EOPR levels are often proprietary for commercial and private industry. However, it is insightful to review some EOPR objectives for calibrated equipment in military and aerospace organizations. A summary of such targets is provided here.

    TARGET EOPR LEVELS

NASA Kennedy Space Center (KNPR 8730.1, Rev. Basic-1; 2003 to 2009, Obsolete): “At KSC, calibration intervals are adjusted to achieve an EOPR range of 0.85 to 0.95.” [24]

U.S. Navy (OPNAV 3960.16A; 2005): “CNO policy requires USN/USMC to: (o) Establish an objective end of period reliability goal for TMDE equal to or greater than 85 percent, with the threshold reliability in no case to be lower than 72 percent.” [25]

U.S. Navy (Albright, J., Thesis; 1997): “…intervals are based on End-Of-Period (EOP) operational reliability targets of 72 % for non-critical General Purpose Test Equipment (GPTE) and 85 % for critical Special Purpose Test Equipment (SPTE).” [26]

U.S. Air Force (TO 00-20-14; 2011): “The Air Force calibration interval is the period of time over which the equipment shall perform its mission or function with a statistically derived end-of-period reliability (shall be within tolerance) of 85 % or better.” [27]

U.S. Army (GAO B-160682, LCD-77-427; 1977, Obsolete): “…the Army decided to follow the Air Force's and Navy's lead in establishing an 85-percent end-of-period reliability requirement. However, the Army has adopted a new statistical model and changed its policy to require 75-percent end-of-period reliability.” [28]

U.S. Army (AR 750-43; 2014, Current): “On average, 90 percent of items will be in tolerance over the calibration interval, and 81 percent will be in tolerance at the end of the interval.” [29]

The NCSL International Benchmarking Survey (LM-5) provides additional information on EOPR targets, termed “Average % In-Tolerance Target” [55]. In the survey, statistics were aggregated from 357 national and international respondents polled in 2007. Demographics included aerospace, military & defense, automotive, biomedical/pharmaceutical, chemical/process, electronics, government, healthcare, M&TE manufacturers, medical equipment, military, nuclear/energy, service industry, universities and R&D, and other. This NCSLI survey found:

    4 % of respondents employ EOPR targets 95 %


    8. Non-Adjustable Instruments

It should be noted that, in the presence of any amount of monotonic drift regardless of how small, an adjustment will eventually have to be made or the attribute bias will ultimately exceed the allowable specification. Indeed, the very practice of shortening an interval to increase EOPR is somewhat predicated on some form of time-dependent mechanism increasing the magnitude of possible errors, along with the ability to adjust (reduce) the attribute bias to or near zero.

For non-adjustable instruments, EOPR cannot generally be increased by shortening a calibration interval via the same mechanism applicable to adjustable instruments. However, shortening the calibration interval for non-adjustable instruments can still be beneficial in two ways.

1) An increase in EOPR can still result from shortening the calibration interval for non-adjustable instruments which exhibit a relatively small time-dependent mechanism for transitioning to an OOT condition (e.g. low drift). This is true because more in-tolerance calibrations will be performed prior to the occurrence of an OOT condition. Once a non-adjustable instrument incurs its first OOT condition, it cannot be adjusted back into tolerance and has effectively reached the end of its service life, at which point EOPR = (#Calibrations - 1) / (#Calibrations). The shorter the interval, the more in-tolerance calibrations will have been performed and the higher the EOPR will be. After the first OOT event, the instrument must then be retired from service, or the allowable tolerance must be increased with consent from the end-user, or charted values must be manually employed via a Report of Test or Calibration Certificate. Such action should only be taken if no impact will result to the application or process for which the instrument is employed.

2) Organizational benefits, other than increased EOPR, can also be realized through shortening of calibration intervals for non-adjustable instruments. These benefits do not manifest as an increase in EOPR, but rather as a reduction of the exposure to possible consequences associated with an out-of-tolerance condition. For example, a working-standard resistor (calibrated to a tolerance) may not be adjustable. An out-of-tolerance condition may eventually arise from drift or even special-cause variation (over-power/voltage, mechanical shock/damage, etc.). Shortening the calibration interval will provide no direct benefit to EOPR via a reduction in errors through adjustment. However, since any OOT condition will result in an impact assessment (reverse traceability) for all instruments calibrated by this OOT resistor, a shorter calibration interval will reduce the number of possible impact assessments and the risk exposure to product or process, providing benefits of a different nature.

    9. Conclusions

Discretionary adjustment during calibration of in-tolerance equipment is not mandated by national and international calibration standards ANSI/Z540.3 and ISO-17025, nor is adjustment contained within the VIM definition of calibration. A model has been used here in an attempt to describe the effect of various discretionary adjustment thresholds on in-tolerance instruments, assuming a specific behavioral mode called the Weiss-Castrup drift model and under very specific assumptions. These assumptions may not hold for many items of TM&DE. Other alternative assumptions, where the domain of drift and random behavior simultaneously comprise only a small percentage of the associated specification, may yield significantly different results and are worthy of further investigation.


Using Monte Carlo methods, the effect of various discretionary adjustment thresholds on End Of Period Reliability (EOPR) has been investigated for in-tolerance instruments under these specific conditions. For the model and assumptions stated, it is shown that discretionary adjustments of in-tolerance instruments can be beneficial in the presence of monotonic drift superimposed on random variation. Under these conditions, the non-adjustment benefits of reduced variation (increased EOPR), posed by the Weiss model and Deming funnel model, do not appear to manifest between the 0 % and 100 % of specification adjustment thresholds. As the calibration adjustment threshold increases from 0 % to 100 % of specification, the EOPR remains constant or decreases in all cases; it never increases. Only after the adjustment threshold far exceeds 100 % of specification and effectively approaches the never-adjust scenario are these benefits realized for purely random behavior. Never adjusting items with any significant amount of monotonic drift is not a viable option, as these instruments will rather quickly transition to an OOT condition resulting from a true attribute bias due to drift.

The assumptions of the model may be idealized and unrealistic in the empirical world. Moreover, it may be unlikely that the behavior of any instrument would be entirely restricted to only the two change mechanisms accommodated by this model, or that the domain of magnitudes and/or proportions of drift and random behavior would be restricted to the values modeled here. Many general purpose TM&DE instruments may perform considerably better than their specifications would imply. They may also be impacted by other behavioral characteristics and special-cause events, hindering the use of this model and of linear regression as a prediction technique. Random-walk behavior, where the magnitude of the random variation (σ) itself increases with time, may be more realistic in many cases. Under such random-walk models, the probability of OOT events increases with time, even in the absence of monotonic drift. Much opportunity for continued investigations and research exists in this regard. However, the assumptions stated herein, when combined with the Weiss-Castrup drift model, provide a rudimentary working construct with which to glean useful insight into the effect of various adjustment thresholds for in-tolerance instruments under a variety of systematic and random errors.

    Many programmatic factors must be considered when implementing instrument adjustment

    policies or thresholds, above and beyond the exclusive consideration of maximizing EOPR.