
Graduate Lectures and Problems in Quality Control and Engineering Statistics: Theory and Methods

To Accompany

Statistical Quality Assurance Methods for Engineers

by Vardeman and Jobe

Stephen B. Vardeman

V2.0: January 2001

© Stephen Vardeman 2001. Permission to copy for educational purposes granted by the author, subject to the requirement that this title page be affixed to each copy (full or partial) produced.


Contents

1 Measurement and Statistics
  1.1 Theory for Range-Based Estimation of Variances
  1.2 Theory for Sample-Variance-Based Estimation of Variances
  1.3 Sample Variances and Gage R&R
  1.4 ANOVA and Gage R&R
  1.5 Confidence Intervals for Gage R&R Studies
  1.6 Calibration and Regression Analysis
  1.7 Crude Gaging and Statistics
      1.7.1 Distributions of Sample Means and Ranges from Integer Observations
      1.7.2 Estimation Based on Integer-Rounded Normal Data

2 Process Monitoring
  2.1 Some Theory for Stationary Discrete Time Finite State Markov Chains With a Single Absorbing State
  2.2 Some Applications of Markov Chains to the Analysis of Process Monitoring Schemes
  2.3 Integral Equations and Run Length Properties of Process Monitoring Schemes

3 An Introduction to Discrete Stochastic Control Theory/Minimum Variance Control
  3.1 General Exposition
  3.2 An Example

4 Process Characterization and Capability Analysis
  4.1 General Comments on Assessing and Dissecting "Overall Variation"
  4.2 More on Analysis Under the Hierarchical Random Effects Model
  4.3 Finite Population Sampling and Balanced Hierarchical Structures

5 Sampling Inspection
  5.1 More on Fraction Nonconforming Acceptance Sampling
  5.2 Imperfect Inspection and Acceptance Sampling
  5.3 Some Details Concerning the Economic Analysis of Sampling Inspection

6 Problems
  1 Measurement and Statistics
  2 Process Monitoring
  3 Engineering Control and Stochastic Control Theory
  4 Process Characterization
  5 Sampling Inspection

A Useful Probabilistic Approximation


    Chapter 1

    Measurement and Statistics

V&J §2.2 presents an introduction to the topic of measurement and the relevance of the subject of statistics to the measurement enterprise. This chapter expands somewhat on the topics presented in V&J and raises some additional issues.

Note that V&J equation (2.1) and the discussion on page 19 of V&J are central to the role of statistics in describing measurements in engineering and quality assurance. Much of Stat 531 concerns "process variation." The discussion on and around page 19 points out that variation in measurements from a process will include both "real" process variation and measurement variation.

1.1 Theory for Range-Based Estimation of Variances

Suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$ random variables and let
$$R = \max X_i - \min X_i = \max(X_i - \mu) - \min(X_i - \mu) = \sigma\left(\max\left(\frac{X_i - \mu}{\sigma}\right) - \min\left(\frac{X_i - \mu}{\sigma}\right)\right) = \sigma(\max Z_i - \min Z_i)$$
where $Z_i = (X_i - \mu)/\sigma$. Then $Z_1, Z_2, \ldots, Z_n$ are iid standard normal random variables. So for purposes of studying the distribution of the range of iid normal variables, it suffices to study the standard normal case. (One can derive "general $\sigma$" facts from the "$\sigma = 1$" facts by multiplying by $\sigma$.)

Consider first the matter of finding the mean of the range of $n$ iid standard normal variables, $Z_1, \ldots, Z_n$. Let
$$U = \min Z_i, \quad V = \max Z_i \quad\text{and}\quad W = V - U.$$


Then
$$EW = EV - EU$$
and
$$-EU = -E\min Z_i = E(-\min Z_i) = E\max(-Z_i),$$
where the $n$ variables $-Z_1, -Z_2, \ldots, -Z_n$ are iid standard normal. Thus
$$EW = EV - EU = 2EV.$$

Then (as is standard in the theory of order statistics) note that
$$V \le t \iff \text{all } n \text{ values } Z_i \text{ are } \le t.$$
So with $\Phi$ the standard normal cdf,
$$P[V \le t] = \Phi^n(t)$$
and thus a pdf for $V$ is
$$f(v) = n\phi(v)\Phi^{n-1}(v).$$
So
$$EV = \int_{-\infty}^{\infty} v \cdot n\phi(v)\Phi^{n-1}(v)\,dv,$$
and the evaluation of this integral becomes a (very small) problem in numerical analysis. The value of this integral clearly depends upon $n$. It is standard to invent a constant (whose dependence upon $n$ we will display explicitly)
$$d_2(n) := EW = 2EV$$
that is tabled in Table A.1 of V&J. With this notation, clearly
$$ER = \sigma d_2(n)$$
(and the range-based formulas in Section 2.2 of V&J are based on this simple fact).
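The evaluation of $d_2(n)$ really is a small numerical analysis problem. The following is a minimal sketch (assuming a Python environment with numpy and scipy available) of the computation; the printed values can be checked against Table A.1 of V&J.

    import numpy as np
    from scipy import integrate
    from scipy.stats import norm

    def d2(n):
        # EV = integral over the real line of v * n * phi(v) * Phi(v)^(n-1),
        # and d2(n) = EW = 2 * EV by the symmetry argument above
        integrand = lambda v: v * n * norm.pdf(v) * norm.cdf(v) ** (n - 1)
        ev, _ = integrate.quad(integrand, -np.inf, np.inf)
        return 2.0 * ev

    for n in (2, 5, 10):
        print(n, round(d2(n), 3))   # compare with Table A.1 of V&J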

To find more properties of $W$ (and hence $R$) requires appeal to a well-known order statistics result giving the joint density of two order statistics. The joint density of $U$ and $V$ is
$$f(u,v) = \begin{cases} n(n-1)\phi(u)\phi(v)\left(\Phi(v) - \Phi(u)\right)^{n-2} & \text{for } v > u \\ 0 & \text{otherwise.} \end{cases}$$
A transformation then easily shows that the joint density of $U$ and $W = V - U$ is
$$g(u,w) = \begin{cases} n(n-1)\phi(u)\phi(u+w)\left(\Phi(u+w) - \Phi(u)\right)^{n-2} & \text{for } w > 0 \\ 0 & \text{otherwise.} \end{cases}$$


Then, for example, the cdf of $W$ is
$$P[W \le t] = \int_0^t \int_{-\infty}^{\infty} g(u,w)\,du\,dw,$$
and the mean of $W^2$ is
$$EW^2 = \int_0^{\infty} \int_{-\infty}^{\infty} w^2 g(u,w)\,du\,dw.$$
Note that upon computing $EW$ and $EW^2$, one can compute both the variance of $W$,
$$\operatorname{Var} W = EW^2 - (EW)^2,$$
and the standard deviation of $W$, $\sqrt{\operatorname{Var} W}$. It is common to give this standard deviation the name $d_3(n)$ (where we continue to make the dependence on $n$ explicit and again this constant is tabled in Table A.1 of V&J). Clearly, having computed $d_3(n) := \sqrt{\operatorname{Var} W}$, one then has
$$\sqrt{\operatorname{Var} R} = \sigma d_3(n).$$
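The double integral for $EW^2$ is equally routine to evaluate numerically. A minimal sketch (again assuming scipy; note that dblquad integrates the inner variable, $u$ here, first):

    import numpy as np
    from scipy import integrate
    from scipy.stats import norm

    def d3(n):
        # joint density g(u, w) of the minimum U and the range W = V - U
        g = lambda u, w: n * (n - 1) * norm.pdf(u) * norm.pdf(u + w) * \
            (norm.cdf(u + w) - norm.cdf(u)) ** (n - 2)
        # EW = 2 * EV as derived in Section 1.1
        ew = 2.0 * integrate.quad(
            lambda v: v * n * norm.pdf(v) * norm.cdf(v) ** (n - 1),
            -np.inf, np.inf)[0]
        # EW^2 by integrating w^2 * g(u, w) over u, then over w > 0
        ew2 = integrate.dblquad(lambda u, w: w ** 2 * g(u, w),
                                0, np.inf, -np.inf, np.inf)[0]
        return np.sqrt(ew2 - ew ** 2)

    print(round(d3(5), 3))   # compare with the n = 5 entry of Table A.1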

1.2 Theory for Sample-Variance-Based Estimation of Variances

Continue to suppose that $X_1, X_2, \ldots, X_n$ are iid Normal$(\mu, \sigma^2)$ random variables and take
$$s^2 := \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2.$$
Standard probability theory says that
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Now if $U \sim \chi^2_\nu$ it is the case that $EU = \nu$ and $\operatorname{Var} U = 2\nu$. It is thus immediate that
$$Es^2 = E\left(\frac{\sigma^2}{n-1}\right)\left(\frac{(n-1)s^2}{\sigma^2}\right) = \left(\frac{\sigma^2}{n-1}\right)E\left(\frac{(n-1)s^2}{\sigma^2}\right) = \sigma^2$$
and
$$\operatorname{Var} s^2 = \operatorname{Var}\left(\left(\frac{\sigma^2}{n-1}\right)\left(\frac{(n-1)s^2}{\sigma^2}\right)\right) = \left(\frac{\sigma^2}{n-1}\right)^2 \operatorname{Var}\left(\frac{(n-1)s^2}{\sigma^2}\right) = \frac{2\sigma^4}{n-1},$$
so that
$$\sqrt{\operatorname{Var} s^2} = \sigma^2\sqrt{\frac{2}{n-1}}.$$


Knowing that $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$ also makes it easy enough to develop properties of $s = \sqrt{s^2}$. For example, if
$$f(x) = \begin{cases} \dfrac{1}{\Gamma\left(\frac{n-1}{2}\right)2^{(n-1)/2}}\, x^{\frac{n-1}{2}-1}e^{-x/2} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
is the $\chi^2_{n-1}$ probability density, then
$$Es = E\sqrt{\frac{\sigma^2}{n-1}}\sqrt{\frac{(n-1)s^2}{\sigma^2}} = \frac{\sigma}{\sqrt{n-1}}\int_0^{\infty}\sqrt{x}\,f(x)\,dx = \sigma c_4(n),$$
for
$$c_4(n) := \frac{\int_0^{\infty}\sqrt{x}\,f(x)\,dx}{\sqrt{n-1}},$$
another constant (depending upon $n$) tabled in Table A.1 of V&J. Further, the standard deviation of $s$ is
$$\sqrt{\operatorname{Var} s} = \sqrt{Es^2 - (Es)^2} = \sqrt{\sigma^2 - (\sigma c_4(n))^2} = \sigma\sqrt{1 - c_4^2(n)} = \sigma c_5(n)$$
for
$$c_5(n) := \sqrt{1 - c_4^2(n)},$$
yet another constant tabled in Table A.1.
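Both $c_4(n)$ and $c_5(n)$ are again one-line numerical integrations. A minimal sketch (assuming scipy):

    import numpy as np
    from scipy import integrate
    from scipy.stats import chi2

    def c4(n):
        # c4(n) = (integral of sqrt(x) times the chi-square_(n-1) density) / sqrt(n-1)
        val, _ = integrate.quad(lambda x: np.sqrt(x) * chi2.pdf(x, n - 1),
                                0, np.inf)
        return val / np.sqrt(n - 1)

    def c5(n):
        # c5(n) = sqrt(1 - c4(n)^2)
        return np.sqrt(1.0 - c4(n) ** 2)

    print(round(c4(5), 4), round(c5(5), 4))   # compare with Table A.1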

The fact that sums of independent $\chi^2$ random variables are again $\chi^2$ (with degrees of freedom equal to the sum of the component degrees of freedom) and the kinds of relationships in this section provide means of combining various kinds of sample variances to get "pooled" estimators of variances (and variance components) and finding the means and variances of these estimators. For example, if one pools in the usual way the sample variances from $r$ normal samples of size $m$ to get a single pooled sample variance, $s^2_{\text{pooled}}$, then $r(m-1)s^2_{\text{pooled}}/\sigma^2$ is $\chi^2$ with degrees of freedom $\nu = r(m-1)$. That is, all of the above can be applied by thinking of $s^2_{\text{pooled}}$ as a sample variance based on a sample of size "$n$" $= r(m-1)+1$.

1.3 Sample Variances and Gage R&R

The methods of gage R&R analysis presented in V&J §2.2.2 are based on ranges (and the facts in §1.1 above). They are presented in V&J not because of their efficiency, but because of their computational simplicity. Better (and analogous) methods can be based on the facts in §1.2 above. For example, under the two-way random effects model (2.4) of V&J, if one pools the $I \times J$ "cell" sample variances $s^2_{ij}$ to get $s^2_{\text{pooled}}$, all of the previous paragraph applies and gives methods of estimating the repeatability variance component $\sigma^2$ (or the repeatability standard deviation $\sigma$) and calculating means and variances of estimators based on $s^2_{\text{pooled}}$.


Or, consider the problem of estimating $\sigma_{\text{reproducibility}}$ defined in display (2.5) of V&J. With $\bar{y}_{ij}$ as defined on page 24 of V&J, note that for fixed $i$, the $J$ random variables $\bar{y}_{ij} - \alpha_i$ have the same sample variance as the $J$ random variables $\bar{y}_{ij}$, namely
$$s_i^2 := \frac{1}{J-1}\sum_j (\bar{y}_{ij} - \bar{y}_{i.})^2.$$
But for fixed $i$ the $J$ random variables $\bar{y}_{ij} - \alpha_i$ are iid normal with mean $\mu$ and variance $\sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma^2/m$, so that
$$Es_i^2 = \sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma^2/m.$$
So
$$\frac{1}{I}\sum_i s_i^2$$
is a plausible estimator of $\sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma^2/m$. Hence
$$\frac{1}{I}\sum_i s_i^2 - \frac{s^2_{\text{pooled}}}{m},$$
or better yet
$$\max\left(0, \frac{1}{I}\sum_i s_i^2 - \frac{s^2_{\text{pooled}}}{m}\right) \qquad (1.1)$$
is a plausible estimator of $\sigma^2_{\text{reproducibility}}$.
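As a concrete illustration, here is a minimal sketch (plain numpy, with simulated data standing in for a real gage R&R study, so the printed value is purely hypothetical) of the estimator in display (1.1):

    import numpy as np

    # hypothetical balanced gage study: I parts, J operators, m measurements per cell
    rng = np.random.default_rng(0)
    I, J, m = 5, 3, 2
    y = rng.normal(size=(I, J, m))

    cell_vars = y.var(axis=2, ddof=1)     # the I x J cell sample variances s_ij^2
    s2_pooled = cell_vars.mean()          # equal cell sizes, so pooling is just averaging
    ybar = y.mean(axis=2)                 # cell means ybar_ij
    s2_i = ybar.var(axis=1, ddof=1)       # s_i^2 = sample variance of the J means for part i

    # display (1.1): max(0, (1/I) sum_i s_i^2 - s2_pooled / m)
    print(max(0.0, s2_i.mean() - s2_pooled / m))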

1.4 ANOVA and Gage R&R

Under the two-way random effects model (2.4) of V&J, with balanced data, it is well known that the ANOVA mean squares
$$MSE = \frac{1}{IJ(m-1)}\sum_{i,j,k}(y_{ijk} - \bar{y}_{ij})^2,$$
$$MSAB = \frac{m}{(I-1)(J-1)}\sum_{i,j}(\bar{y}_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2,$$
$$MSA = \frac{mJ}{I-1}\sum_i (\bar{y}_{i.} - \bar{y}_{..})^2, \quad\text{and}$$
$$MSB = \frac{mI}{J-1}\sum_j (\bar{y}_{.j} - \bar{y}_{..})^2$$
are independent random variables, that
$$EMSE = \sigma^2,$$
$$EMSAB = \sigma^2 + m\sigma_{\alpha\beta}^2,$$
$$EMSA = \sigma^2 + m\sigma_{\alpha\beta}^2 + mJ\sigma_\alpha^2, \quad\text{and}$$
$$EMSB = \sigma^2 + m\sigma_{\alpha\beta}^2 + mI\sigma_\beta^2,$$


Table 1.1: Two-Way Balanced Data Random Effects Analysis ANOVA Table

Source                  SS      df              MS      EMS
Parts                   SSA     $I-1$           MSA     $\sigma^2 + m\sigma_{\alpha\beta}^2 + mJ\sigma_\alpha^2$
Operators               SSB     $J-1$           MSB     $\sigma^2 + m\sigma_{\alpha\beta}^2 + mI\sigma_\beta^2$
Parts$\times$Operators  SSAB    $(I-1)(J-1)$    MSAB    $\sigma^2 + m\sigma_{\alpha\beta}^2$
Error                   SSE     $(m-1)IJ$       MSE     $\sigma^2$
Total                   SSTot   $mIJ-1$

and that the quantities
$$\frac{(m-1)IJ\,MSE}{EMSE}, \quad \frac{(I-1)(J-1)MSAB}{EMSAB}, \quad \frac{(I-1)MSA}{EMSA} \quad\text{and}\quad \frac{(J-1)MSB}{EMSB}$$
are $\chi^2$ random variables with respective degrees of freedom
$$(m-1)IJ, \quad (I-1)(J-1), \quad (I-1) \quad\text{and}\quad (J-1).$$
These facts about sums of squares and mean squares for the two-way random effects model are often summarized in the usual (two-way random effects model) ANOVA table, Table 1.1. (The sums of squares are simply the mean squares multiplied by the degrees of freedom. More on the interpretation of such tables can be found in places like §8-4 of V.)

As a matter of fact, the ANOVA error mean square is exactly $s^2_{\text{pooled}}$ from §1.3 above. Further, the expected mean squares suggest ways of producing sensible estimators of other parametric functions of interest in gage R&R contexts (see V&J page 27 in this regard). For example, note that
$$\sigma^2_{\text{reproducibility}} = \frac{1}{mI}EMSB + \frac{1}{m}\left(1 - \frac{1}{I}\right)EMSAB - \frac{1}{m}EMSE,$$
which suggests the ANOVA-based estimator
$$\hat{\sigma}^2_{\text{reproducibility}} = \max\left(0, \frac{1}{mI}MSB + \frac{1}{m}\left(1 - \frac{1}{I}\right)MSAB - \frac{1}{m}MSE\right). \qquad (1.2)$$
What may or may not be well known is that this estimator (1.2) is exactly the estimator of $\sigma^2_{\text{reproducibility}}$ in display (1.1).

Since many common estimators of quantities of interest in gage R&R studies are functions of mean squares, it is useful to have at least some crude standard errors for them. These can be derived from the "delta method"/"propagation of error"/Taylor series argument provided in the appendix to these notes. For example, if $MS_i$, $i = 1, \ldots, k$, are independent random variables with $\nu_i MS_i/EMS_i$ having a $\chi^2_{\nu_i}$ distribution, consider a function of $k$ real variables $f(x_1, \ldots, x_k)$ and the random variable
$$U = f(MS_1, MS_2, \ldots, MS_k).$$


Propagation of error arguments produce the approximation
$$\operatorname{Var} U \approx \sum_{i=1}^k \left(\left.\frac{\partial f}{\partial x_i}\right|_{EMS_1, EMS_2, \ldots, EMS_k}\right)^2 \operatorname{Var} MS_i = \sum_{i=1}^k \left(\left.\frac{\partial f}{\partial x_i}\right|_{EMS_1, EMS_2, \ldots, EMS_k}\right)^2 \frac{2(EMS_i)^2}{\nu_i},$$
and upon substituting mean squares for their expected values, one has a standard error for $U$, namely
$$\sqrt{\widehat{\operatorname{Var}}\,U} = \sqrt{2\sum_{i=1}^k \left(\left.\frac{\partial f}{\partial x_i}\right|_{MS_1, MS_2, \ldots, MS_k}\right)^2 \frac{(MS_i)^2}{\nu_i}}. \qquad (1.3)$$
In the special case where the function of the mean squares of interest is linear in them, say
$$U = \sum_{i=1}^k c_i MS_i,$$
the standard error specializes to
$$\sqrt{\widehat{\operatorname{Var}}\,U} = \sqrt{2\sum_{i=1}^k c_i^2 \frac{(MS_i)^2}{\nu_i}},$$
which provides at least a crude method of producing standard errors for $\hat{\sigma}^2_{\text{reproducibility}}$ and $\hat{\sigma}^2_{\text{overall}}$. Such standard errors are useful in giving some indication of the precision with which the quantities of interest in a gage R&R study have been estimated.
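The linear case is easy to code. A minimal sketch (plain numpy; the mean squares below are hypothetical numbers, with the $\nu_i$'s coming from a hypothetical $I = 5$, $J = 3$, $m = 2$ study) applied to the coefficients of the estimator (1.2):

    import numpy as np

    def linear_ms_se(c, ms, nu):
        # sqrt(2 * sum of c_i^2 * MS_i^2 / nu_i), the linear special case of (1.3)
        c, ms, nu = map(np.asarray, (c, ms, nu))
        return float(np.sqrt(2.0 * np.sum(c**2 * ms**2 / nu)))

    I, J, m = 5, 3, 2
    ms = [1.7, 0.9, 0.4]                    # hypothetical MSB, MSAB, MSE
    nu = [J - 1, (I - 1) * (J - 1), I * J * (m - 1)]
    c = [1.0/(m*I), (1 - 1.0/I)/m, -1.0/m]  # coefficients from estimator (1.2)

    print(float(np.dot(c, ms)), linear_ms_se(c, ms, nu))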

1.5 Confidence Intervals for Gage R&R Studies

The parametric functions of interest in gage R&R studies (indeed in all random effects analyses) are functions of variance components, or equivalently, functions of expected mean squares. It is thus possible to apply theory for estimating such quantities to the problem of assessing precision of estimation in a gage study. As a first (and very crude) example of this, note that taking the point of view of §1.4 above, where $U = f(MS_1, MS_2, \ldots, MS_k)$ is a sensible point estimator of an interesting function of the variance components and $\sqrt{\widehat{\operatorname{Var}}\,U}$ is the standard error (1.3), simple approximate two-sided 95% confidence limits can be made as
$$U \pm 1.96\sqrt{\widehat{\operatorname{Var}}\,U}.$$
These limits have the virtue of being amenable to "hand" calculation from the ANOVA sums of squares, but they are not likely to be reliable (in terms of holding their nominal/asymptotic coverage probability) for $I$, $J$ or $m$ small.

Linear models experts have done substantial research aimed at finding reliable confidence interval formulas for important functions of expected mean squares. For example, the book Confidence Intervals on Variance Components by Burdick and Graybill gives results (on the so-called "modified large sample method") that can be used to make confidence intervals on various important functions of variance components. The following is some material taken from Sections 3.2 and 3.3 of the Burdick and Graybill book.

Suppose that $MS_1, MS_2, \ldots, MS_k$ are $k$ independent mean squares. (The $MS_i$ are of the form $SS_i/\nu_i$, where $SS_i/EMS_i = \nu_i MS_i/EMS_i$ has a $\chi^2_{\nu_i}$ distribution.) For $1 \le p < k$ and positive constants $c_1, c_2, \ldots, c_k$ suppose that the quantity
$$\theta = c_1 EMS_1 + \cdots + c_p EMS_p - c_{p+1}EMS_{p+1} - \cdots - c_k EMS_k \qquad (1.4)$$
is of interest. Let
$$\hat{\theta} = c_1 MS_1 + \cdots + c_p MS_p - c_{p+1}MS_{p+1} - \cdots - c_k MS_k.$$
Approximate confidence limits on $\theta$ in display (1.4) are of the form
$$L = \hat{\theta} - \sqrt{V_L} \quad\text{and/or}\quad U = \hat{\theta} + \sqrt{V_U},$$
for $V_L$ and $V_U$ defined below.

Let $F_{\alpha:df_1,df_2}$ be the upper $\alpha$ point of the $F$ distribution with $df_1$ and $df_2$ degrees of freedom. (It is then the case that $F_{\alpha:df_1,df_2} = (F_{1-\alpha:df_2,df_1})^{-1}$.) Also, let $\chi^2_{\alpha:df}$ be the upper $\alpha$ point of the $\chi^2_{df}$ distribution. With this notation
$$V_L = \sum_{i=1}^p c_i^2 MS_i^2 G_i^2 + \sum_{i=p+1}^k c_i^2 MS_i^2 H_i^2 + \sum_{i=1}^p \sum_{j=p+1}^k c_i c_j MS_i MS_j G_{ij} + \sum_{i=1}^{p-1}\sum_{j>i}^p c_i c_j MS_i MS_j G^*_{ij},$$
for
$$G_i = 1 - \frac{\nu_i}{\chi^2_{\alpha:\nu_i}},$$
$$H_i = \frac{\nu_i}{\chi^2_{1-\alpha:\nu_i}} - 1,$$
$$G_{ij} = \frac{(F_{\alpha:\nu_i,\nu_j} - 1)^2 - G_i^2 F^2_{\alpha:\nu_i,\nu_j} - H_j^2}{F_{\alpha:\nu_i,\nu_j}},$$
and
$$G^*_{ij} = \begin{cases} 0 & \text{if } p = 1 \\ \dfrac{1}{p-1}\left(\left(1 - \dfrac{\nu_i + \nu_j}{\chi^2_{\alpha:\nu_i+\nu_j}}\right)^2 \dfrac{(\nu_i + \nu_j)^2}{\nu_i\nu_j} - \dfrac{G_i^2\nu_i}{\nu_j} - \dfrac{G_j^2\nu_j}{\nu_i}\right) & \text{otherwise.} \end{cases}$$
On the other hand,
$$V_U = \sum_{i=1}^p c_i^2 MS_i^2 H_i^2 + \sum_{i=p+1}^k c_i^2 MS_i^2 G_i^2 + \sum_{i=1}^p \sum_{j=p+1}^k c_i c_j MS_i MS_j H_{ij} + \sum_{i=p+1}^{k-1}\sum_{j>i}^k c_i c_j MS_i MS_j H^*_{ij}$$


for $G_i$ and $H_i$ as defined above, and
$$H_{ij} = \frac{(1 - F_{1-\alpha:\nu_i,\nu_j})^2 - H_i^2 F^2_{1-\alpha:\nu_i,\nu_j} - G_j^2}{F_{1-\alpha:\nu_i,\nu_j}},$$
and
$$H^*_{ij} = \begin{cases} 0 & \text{if } k = p+1 \\ \dfrac{1}{k-p-1}\left(\left(1 - \dfrac{\nu_i + \nu_j}{\chi^2_{\alpha:\nu_i+\nu_j}}\right)^2 \dfrac{(\nu_i + \nu_j)^2}{\nu_i\nu_j} - \dfrac{G_i^2\nu_i}{\nu_j} - \dfrac{G_j^2\nu_j}{\nu_i}\right) & \text{otherwise.} \end{cases}$$
One uses $(L, \infty)$ or $(-\infty, U)$ for confidence level $(1-\alpha)$ and the interval $(L, U)$ for confidence level $(1-2\alpha)$. (Using these formulas for "hand" calculation is (obviously) no picnic. The C program written by Brandon Paris (available off the Stat 531 Web page) makes these calculations painless.)
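The formulas do, however, transcribe directly into code. Here is a minimal sketch (assuming scipy; the inputs in the final line are hypothetical) of $V_L$, $V_U$ and the resulting limits for $\theta$ of display (1.4):

    import numpy as np
    from scipy.stats import chi2, f

    def mls_limits(c, ms, nu, p, alpha):
        # upper alpha points of the chi-square and F distributions
        chi2_u = lambda a, df: chi2.ppf(1.0 - a, df)
        f_u = lambda a, d1, d2: f.ppf(1.0 - a, d1, d2)
        k = len(c)
        G = [1.0 - nu[i] / chi2_u(alpha, nu[i]) for i in range(k)]
        H = [nu[i] / chi2_u(1.0 - alpha, nu[i]) - 1.0 for i in range(k)]
        theta_hat = sum(c[i] * ms[i] for i in range(p)) - \
                    sum(c[i] * ms[i] for i in range(p, k))
        VL = sum((c[i] * ms[i] * G[i]) ** 2 for i in range(p)) + \
             sum((c[i] * ms[i] * H[i]) ** 2 for i in range(p, k))
        VU = sum((c[i] * ms[i] * H[i]) ** 2 for i in range(p)) + \
             sum((c[i] * ms[i] * G[i]) ** 2 for i in range(p, k))
        for i in range(p):                      # G_ij and H_ij cross terms
            for j in range(p, k):
                F = f_u(alpha, nu[i], nu[j])
                VL += c[i] * c[j] * ms[i] * ms[j] * \
                      ((F - 1.0) ** 2 - G[i] ** 2 * F ** 2 - H[j] ** 2) / F
                F1 = f_u(1.0 - alpha, nu[i], nu[j])
                VU += c[i] * c[j] * ms[i] * ms[j] * \
                      ((1.0 - F1) ** 2 - H[i] ** 2 * F1 ** 2 - G[j] ** 2) / F1

        def star(i, j, denom):                  # common core of G*_ij and H*_ij
            t = (1.0 - (nu[i] + nu[j]) / chi2_u(alpha, nu[i] + nu[j])) ** 2 * \
                (nu[i] + nu[j]) ** 2 / (nu[i] * nu[j])
            return (t - G[i] ** 2 * nu[i] / nu[j] - G[j] ** 2 * nu[j] / nu[i]) / denom

        for i in range(p - 1):                  # G*_ij terms (absent when p = 1)
            for j in range(i + 1, p):
                VL += c[i] * c[j] * ms[i] * ms[j] * star(i, j, p - 1)
        for i in range(p, k - 1):               # H*_ij terms (absent when k = p + 1)
            for j in range(i + 1, k):
                VU += c[i] * c[j] * ms[i] * ms[j] * star(i, j, k - p - 1)
        return theta_hat - np.sqrt(VL), theta_hat + np.sqrt(VU)

    # hypothetical example: theta = .5 EMS_1 - .5 EMS_2 with nu = (2, 15)
    print(mls_limits([0.5, 0.5], [1.7, 0.4], [2, 15], p=1, alpha=0.05))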

A problem similar to the estimation of the quantity (1.4) is that of estimating
$$\theta = c_1 EMS_1 + \cdots + c_p EMS_p \qquad (1.5)$$
for $p \ge 1$ and positive constants $c_1, c_2, \ldots, c_p$. In this case let
$$\hat{\theta} = c_1 MS_1 + \cdots + c_p MS_p,$$
and continue the $G_i$ and $H_i$ notation from above. Then approximate confidence limits on $\theta$ given in display (1.5) are of the form
$$L = \hat{\theta} - \sqrt{\sum_{i=1}^p c_i^2 MS_i^2 G_i^2} \quad\text{and/or}\quad U = \hat{\theta} + \sqrt{\sum_{i=1}^p c_i^2 MS_i^2 H_i^2}.$$
One uses $(L, \infty)$ or $(-\infty, U)$ for confidence level $(1-\alpha)$ and the interval $(L, U)$ for confidence level $(1-2\alpha)$.

The Fortran program written by Andy Chiang (available off the Stat 531 Web page) applies Burdick and Graybill-like material and the standard errors (1.3) to the estimation of many parametric functions of relevance in gage R&R studies.

Chiang's 2000 Ph.D. dissertation work (to appear in Technometrics in August 2001) has provided an entirely different method of interval estimation of functions of variance components that is a uniform improvement over the "modified large sample" methods presented by Burdick and Graybill. His approach is related to "improper Bayes" methods with so-called "Jeffreys priors." Andy has provided software for implementing his methods that, as time permits, will be posted on the Stat 531 Web page. He can be contacted (for preprints of his work) at [email protected] at the National University of Singapore.


1.6 Calibration and Regression Analysis

The estimation of standard deviations and variance components is a contribution of the subject of statistics to the quantification of measurement system precision. The subject also has contributions to make in the matter of improving measurement accuracy. Calibration is the business of bringing a local measurement system in line with a standard measurement system. One takes measurements $y$ with a gage or system of interest on test items with "known" values $x$ (available because they were previously measured using a "gold standard" measurement device). The data collected are then used to create a conversion scheme for translating local measurements to approximate gold standard measurements, thereby hopefully improving local accuracy. In this short section we note that the usual regression methodology has implications in this kind of enterprise.

The usual polynomial regression model says that $n$ observed random values $y_i$ are related to fixed values $x_i$ via
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i \qquad (1.6)$$
for iid Normal$(0, \sigma^2)$ random variables $\varepsilon_i$. The parameters $\beta$ and $\sigma$ are the usual objects of inference in this model. In the calibration context with $x$ a gold standard value, $\sigma$ quantifies precision for the local measurement system. Often (at least over a limited range of $x$) 1) a low order polynomial does a good job of describing the observed $x$-$y$ relationship between local and gold standard measurements and 2) the usual (least squares) fitted relationship
$$\hat{y} = g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k$$
has an inverse $g^{-1}(y)$. When such is the case, given a measurement $y_{n+1}$ from the local measurement system, it is plausible to estimate that a corresponding measurement from the gold standard system would be $\hat{x}_{n+1} = g^{-1}(y_{n+1})$. A reasonable question is then "How good is this estimate?" That is, the matter of confidence interval estimation of $x_{n+1}$ is important.

One general method for producing such confidence sets for $x_{n+1}$ is based on the usual "prediction interval" methodology associated with the model (1.6). That is, for a given $x$, it is standard (see, e.g., §9-2 of V or §9.2.4 of V&J#2) to produce a prediction interval of the form
$$\hat{y} \pm t\sqrt{s^2 + (\text{std error}(\hat{y}))^2}$$
for an additional corresponding $y$. And those intervals have the property that for all choices of $x, \sigma, \beta_0, \beta_1, \beta_2, \ldots, \beta_k$
$$P_{x,\sigma,\beta_0,\beta_1,\beta_2,\ldots,\beta_k}[y \text{ is in the prediction interval at } x] = \text{desired confidence level} = 1 - P[\text{a } t_{n-k-1} \text{ random variable exceeds } |t|].$$


But rewording only slightly, the event
"$y$ is in the prediction interval at $x$"
is the same as the event
"$x$ produces a prediction interval including $y$."
So a confidence set for $x_{n+1}$ based on the observed value $y_{n+1}$ is
$$\{x \mid \text{the prediction interval corresponding to } x \text{ includes } y_{n+1}\}. \qquad (1.7)$$
Conceptually, one simply makes prediction limits around the fitted relationship $\hat{y} = g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_k x^k$ and then upon observing a new $y$ sees what $x$'s are consistent with that observation. This produces a confidence set with the desired confidence level.

The only real difficulties with the above general prescription are 1) the lack of simple explicit formulas and 2) the fact that when $\sigma$ is large (so that the regression $\sqrt{MSE}$ tends to be large) or the fitted relationship is very nonlinear, the method can produce (completely rational but) unpleasant-looking confidence sets. The first "problem" is really of limited consequence in a time when standard statistical software will automatically produce plots of prediction limits associated with low order regressions. And the second matter is really inherent in the problem.

For the (simplest) linear version of this "inverse prediction" problem, there is an approximate confidence method in common use that doesn't have the deficiencies of the method (1.7). It is derived from a Taylor series argument and has its own problems, but is nevertheless worth recording here for completeness' sake. That is, under the $k = 1$ version of the model (1.6), commonly used approximate confidence limits for $x_{n+1}$ are (for $\hat{x}_{n+1} = (y_{n+1} - b_0)/b_1$ and $\bar{x}$ the sample mean of the gold standard measurements from the calibration experiment)
$$\hat{x}_{n+1} \pm t\,\frac{\sqrt{MSE}}{|b_1|}\sqrt{1 + \frac{1}{n} + \frac{(\hat{x}_{n+1} - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}}.$$
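A minimal sketch (assuming scipy, with small hypothetical calibration data) of these $k = 1$ limits:

    import numpy as np
    from scipy.stats import t as t_dist

    # hypothetical calibration data: x gold standard, y local measurements
    x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
    y = np.array([2.1, 4.4, 5.9, 8.3, 10.1])
    n = len(x)

    b1, b0 = np.polyfit(x, y, 1)                    # least squares slope and intercept
    mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

    y_new = 7.0                                     # a new local measurement
    x_hat = (y_new - b0) / b1                       # inverse prediction
    tval = t_dist.ppf(0.975, n - 2)                 # for two-sided 95% limits
    half = tval * np.sqrt(mse) / abs(b1) * np.sqrt(
        1.0 + 1.0 / n + (x_hat - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
    print(x_hat - half, x_hat + half)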

1.7 Crude Gaging and Statistics

All real-world measurement is "to the nearest something." Often one may ignore this fact, treat measured values as if they were "exact" and experience no real difficulty when using standard statistical methods (that are really based on an assumption that data are exact). However, sometimes in industrial applications gaging is "crude" enough that standard (e.g. "normal theory") formulas give nonsensical results. This section briefly considers what can be done to appropriately model and draw inferences from crudely gaged data. The assumption throughout is that what are available are integer data, obtained by coding raw observations via
$$\text{integer observation} = \frac{\text{raw observation} - \text{some reference value}}{\text{smallest unit of measurement}}$$
(the "smallest unit of measurement" is "the nearest something" above).

1.7.1 Distributions of Sample Means and Ranges from Integer Observations

To begin with something simple, note first that in situations where only a few different coded values are ever observed, rather than trying to model observations with some continuous distribution (like a normal one) it may well make sense to simply employ a discrete pmf, say $f$, to describe any single measurement. In fact, suppose that a single (crudely gaged) observation $Y$ has a pmf $f(y)$ such that
$$f(y) = 0 \quad\text{unless}\quad y = 1, 2, \ldots, M.$$
Then if $Y_1, Y_2, \ldots, Y_n$ are iid with this marginal discrete distribution, one can easily approximate the distribution of a function of these variables via simulation (using common statistical packages). And for two of the most common statistics used in QC settings (the sample mean and range) one can even work out exact probability distributions using computationally feasible and very elementary methods.

To find the probability distribution of $\bar{Y}$ in this context, one can build up the probability distributions of sums of iid $Y_i$'s recursively by "adding probabilities on diagonals in two-way joint probability tables." For example, the $n = 2$ distribution of $\bar{Y}$ can be obtained by making out a two-way table of joint probabilities for $Y_1$ and $Y_2$ and adding on diagonals to get probabilities for $Y_1 + Y_2$. Then making a two-way table of joint probabilities for $(Y_1 + Y_2)$ and $Y_3$ one can add on diagonals and find a joint distribution for $Y_1 + Y_2 + Y_3$. Or noting that the distribution of $Y_3 + Y_4$ is the same as that for $Y_1 + Y_2$, it is possible to make a two-way table of joint probabilities for $(Y_1 + Y_2)$ and $(Y_3 + Y_4)$, add on diagonals and find the distribution of $Y_1 + Y_2 + Y_3 + Y_4$. And so on. (Clearly, after finding the distribution for a sum, one simply divides possible values by $n$ to get the corresponding distribution of $\bar{Y}$.)
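The "adding on diagonals" recursion is just repeated discrete convolution, and is a few lines of code. A minimal sketch (plain numpy; the pmf is hypothetical):

    import numpy as np

    def mean_distribution(f, n):
        # f[y-1] = P[Y = y] for y = 1, ..., M
        p = np.array(f, dtype=float)
        for _ in range(n - 1):
            p = np.convolve(p, f)            # "adding probabilities on diagonals"
        sums = np.arange(n, n * len(f) + 1)  # possible values of Y_1 + ... + Y_n
        return sums / n, p                   # values of Ybar and their probabilities

    f = [0.2, 0.5, 0.3]                      # hypothetical pmf on y = 1, 2, 3
    vals, probs = mean_distribution(f, 4)
    for v, pr in zip(vals, probs):
        print(round(v, 2), round(pr, 4))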

To find the probability distribution of $R = \max Y_i - \min Y_i$ (for $Y_i$'s as above) a feasible computational scheme is as follows. Let
$$S_{kj} = \begin{cases} \sum_{y=k}^{j} f(y) = P[k \le Y \le j] & \text{if } k \le j \\ 0 & \text{otherwise} \end{cases}$$
and compute and store these for $1 \le k, j \le M$. Then define
$$M_{kj} = P[\min Y_i = k \text{ and } \max Y_i = j].$$
Now the event $\{\min Y_i = k \text{ and } \max Y_i = j\}$ is the event $\{\text{all observations are between } k \text{ and } j \text{ inclusive}\}$ less the event $\{\text{the minimum is greater than } k \text{ or the maximum is less than } j\}$. Thus, it is straightforward to see that
$$M_{kj} = (S_{kj})^n - (S_{k+1,j})^n - (S_{k,j-1})^n + (S_{k+1,j-1})^n$$
and one may compute and store these values. Finally, note that
$$P[R = r] = \sum_{k=1}^{M-r} M_{k,k+r}.$$
These "algorithms" are good for any distribution $f$ on the integers $1, 2, \ldots, M$. Karen (Jensen) Hulting's "DIST" program (available off the Stat 531 Web page) automates the calculations of the distributions of $\bar{Y}$ and $R$ for certain $f$'s related to "integer rounding of normal observations." (More on this rounding idea directly.)

1.7.2 Estimation Based on Integer-Rounded Normal Data

The problem of drawing inferences from crudely gaged data is one that has a history of at least 100 years (if one takes the view that crude gaging essentially "rounds" "exact" values). Sheppard in the late 1800s noted that if one rounds a continuous variable to integers, the variability in the distribution is typically increased. He thus suggested not using the sample standard deviation ($s$) of rounded values but instead employing what is known as Sheppard's correction to arrive at
$$\sqrt{\frac{(n-1)s^2}{n} - \frac{1}{12}} \qquad (1.8)$$
as a suitable estimate of "standard deviation" for integer-rounded data.

The notion of "interval-censoring" of fundamentally continuous observations provides a natural framework for the application of modern statistical theory to the analysis of crudely gaged data. For univariate $X$ with continuous cdf $F(x|\theta)$ depending upon some (possibly vector) parameter $\theta$, consider $X^*$ derived from $X$ by rounding to the nearest integer. Then the pmf of $X^*$ is, say,
$$g(x^*|\theta) := \begin{cases} F(x^* + .5|\theta) - F(x^* - .5|\theta) & \text{for } x^* \text{ an integer} \\ 0 & \text{otherwise.} \end{cases}$$
Rather than doing inference based on the unobservable variables $X_1, X_2, \ldots, X_n$ that are iid $F(x|\theta)$, one might consider inference based on $X_1^*, X_2^*, \ldots, X_n^*$ that are iid with pmf $g(x^*|\theta)$.

The normal version of this scenario (the integer-rounded normal data model) makes use of
$$g(x^*|\mu,\sigma) := \begin{cases} \Phi\left(\dfrac{x^* + .5 - \mu}{\sigma}\right) - \Phi\left(\dfrac{x^* - .5 - \mu}{\sigma}\right) & \text{for } x^* \text{ an integer} \\ 0 & \text{otherwise} \end{cases}$$
and a "likelihood function" that treats the (joint) probability assigned to the vector $(x_1^*, x_2^*, \ldots, x_n^*)$ as a function of the parameters,
$$L(\mu,\sigma) := \prod_i g(x_i^*|\mu,\sigma) = \prod_i \left(\Phi\left(\frac{x_i^* + .5 - \mu}{\sigma}\right) - \Phi\left(\frac{x_i^* - .5 - \mu}{\sigma}\right)\right).$$

The log of this function of $\mu$ and $\sigma$ is (naturally enough) called the loglikelihood and will be denoted as
$$\mathcal{L}(\mu,\sigma) := \ln L(\mu,\sigma).$$
A sensible estimator of the parameter vector $(\mu,\sigma)$ is "the point $(\hat{\mu},\hat{\sigma})$ maximizing the loglikelihood." This prescription for estimation is only partially complete, depending upon the nature of the sample $x_1^*, x_2^*, \ldots, x_n^*$. There are three cases to consider, namely:

1. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is at least 2, $\mathcal{L}(\mu,\sigma)$ is well-behaved (nice and "mound-shaped") and numerical maximization or just looking at contour plots will quickly allow one to maximize the loglikelihood. (It is worth noting that in this circumstance, usually $\hat{\sigma}$ is close to the "Sheppard corrected" value in display (1.8).)

2. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is 1, strictly speaking $\mathcal{L}(\mu,\sigma)$ fails to achieve a maximum. However, with
$$m := \#[x_i^* = \min x_i^*],$$
$(\mu,\sigma)$ pairs with $\sigma$ small and
$$\mu \approx \min x_i^* + .5 - \sigma\Phi^{-1}\left(\frac{m}{n}\right)$$
will have
$$\mathcal{L}(\mu,\sigma) \approx \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) = m\ln m + (n-m)\ln(n-m) - n\ln n.$$
That is, in this case one ought to "estimate" that $\sigma$ is small and the relationship between $\mu$ and $\sigma$ is such that a fraction $m/n$ of the underlying normal distribution is to the left of $\min x_i^* + .5$, while a fraction $1 - m/n$ is to the right.

3. When the sample range of $x_1^*, x_2^*, \ldots, x_n^*$ is 0, strictly speaking $\mathcal{L}(\mu,\sigma)$ fails to achieve a maximum. However,
$$\sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) = 0$$
and for any $\mu \in (x_1^* - .5, x_1^* + .5)$, $\mathcal{L}(\mu,\sigma) \to 0$ as $\sigma \to 0$. That is, in this case one ought to "estimate" that $\sigma$ is small and $\mu \in (x_1^* - .5, x_1^* + .5)$.
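For the well-behaved case 1 above, the numerical maximization is routine. A minimal sketch (assuming scipy; the rounded data are hypothetical, with sample range 3):

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize

    xstar = np.array([3, 4, 4, 5, 6, 4])    # hypothetical integer-rounded data

    def loglik(params):
        mu, sigma = params
        # log of the product of the cell probabilities g(x*_i | mu, sigma)
        cells = norm.cdf((xstar + .5 - mu) / sigma) - \
                norm.cdf((xstar - .5 - mu) / sigma)
        return np.sum(np.log(cells))

    res = minimize(lambda p: -loglik(p),
                   x0=[xstar.mean(), xstar.std(ddof=1)],
                   bounds=[(None, None), (1e-6, None)])
    print(res.x)   # (mu_hat, sigma_hat); sigma_hat is typically near Sheppard's (1.8)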


Beyond the making of point estimates, the loglikelihood function can provide approximate confidence sets for the parameters $\mu$ and/or $\sigma$. Standard "large sample" statistical theory says that (for large $n$ and $\chi^2_{\alpha:\nu}$ the upper $\alpha$ point of the $\chi^2_\nu$ distribution):

1. An approximate $(1-\alpha)$ level confidence set for the parameter vector $(\mu,\sigma)$ is
$$\left\{(\mu,\sigma) \,\middle|\, \mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \tfrac{1}{2}\chi^2_{\alpha:2}\right\}. \qquad (1.9)$$

2. An approximate $(1-\alpha)$ level confidence set for the parameter $\mu$ is
$$\left\{\mu \,\middle|\, \sup_\sigma\mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \tfrac{1}{2}\chi^2_{\alpha:1}\right\}. \qquad (1.10)$$

3. An approximate $(1-\alpha)$ level confidence set for the parameter $\sigma$ is
$$\left\{\sigma \,\middle|\, \sup_\mu\mathcal{L}(\mu,\sigma) > \sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma) - \tfrac{1}{2}\chi^2_{\alpha:1}\right\}. \qquad (1.11)$$

Several comments and a fuller discussion are in order regarding these confidence sets. In the first place, Karen (Jensen) Hulting's CONEST program (available off the Stat 531 Web page) is useful in finding $\sup_{\mu,\sigma}\mathcal{L}(\mu,\sigma)$ and producing rough contour plots of the (joint) sets for $(\mu,\sigma)$ in display (1.9). Second, it is common to call the function of $\mu$ defined by
$$\mathcal{L}^*(\mu) = \sup_\sigma\mathcal{L}(\mu,\sigma)$$
the "profile loglikelihood" function for $\mu$ and the function of $\sigma$
$$\mathcal{L}^{**}(\sigma) = \sup_\mu\mathcal{L}(\mu,\sigma)$$
the "profile loglikelihood" function for $\sigma$. Note that display (1.10) then says that the confidence set should consist of those $\mu$'s for which the profile loglikelihood is not too much smaller than the maximum achievable. And something entirely analogous holds for the sets in (1.11). Johnson Lee (in 2001 Ph.D. dissertation work) has carefully studied these confidence interval estimation problems and determined that some modification of methods (1.10) and (1.11) is necessary in order to provide guaranteed coverage probabilities for small sample sizes. (It is also very important to realize that, contrary to naive expectations, not even a large sample size will make the usual $t$-intervals for $\mu$ and $\chi^2$-intervals for $\sigma$ hold their nominal confidence levels in the event that $\sigma$ is small, i.e. when the rounding or crudeness of the gaging is important. Ignoring the rounding when it is important can produce actual confidence levels near 0 for methods with large nominal confidence levels.)


Table 1.2: $\Delta$ for 0-Range Samples Based on Very Small $n$

                 α
   n      .05      .10      .20
   2     3.084    1.547     .785
   3      .776     .562
   4      .517

Intervals for a Normal Mean Based on Integer-Rounded Data

Specifically regarding the sets for $\mu$ in display (1.10), Lee (in work to appear in the Journal of Quality Technology) has shown that one must replace the value $\chi^2_{\alpha:1}$ with something larger in order to get small $n$ actual confidence levels not too far from nominal for "most" $(\mu,\sigma)$. In fact, the choice
$$c(n,\alpha) = n\ln\left(\frac{t^2_{\frac{\alpha}{2}:(n-1)}}{n-1} + 1\right)$$
(for $t_{\frac{\alpha}{2}:(n-1)}$ the upper $\frac{\alpha}{2}$ point of the $t$ distribution with $\nu = n-1$ degrees of freedom) is appropriate.

After replacing $\chi^2_{\alpha:1}$ with $c(n,\alpha)$ in display (1.10) there remains the numerical analysis problem of actually finding the interval prescribed by the display. The nature of the numerical analysis required depends upon the sample range encountered in the crudely gaged data. Provided the range is at least 2, $\mathcal{L}^*(\mu)$ is well-behaved (continuous and "mound-shaped") and even simple trial and error with Karen (Jensen) Hulting's CONEST program will quickly produce the necessary interval. When the range is 0 or 1, $\mathcal{L}^*(\mu)$ has respectively 2 or 1 discontinuities and the numerical analysis is a bit trickier. Lee has recorded the results of the numerical analysis for small sample sizes and $\alpha = .05, .10$ and $.20$ (confidence levels respectively 95%, 90% and 80%).

When a sample of size $n$ produces range 0 with, say, all observations equal to $x^*$, the intuition that one ought to estimate $\mu \in (x^* - .5, x^* + .5)$ is sound unless $n$ is very small. If $n$ and $\alpha$ are as recorded in Table 1.2 then display (1.10) (modified by the use of $c(n,\alpha)$ in place of $\chi^2_{\alpha:1}$) leads to the interval $(x^* - \Delta, x^* + \Delta)$. (Otherwise it leads to $(x^* - .5, x^* + .5)$ for these $\alpha$.)

In the case that a sample of size $n$ produces range 1 with, say, all observations $x^*$ or $x^* + 1$, the interval prescribed by display (1.10) (with $c(n,\alpha)$ used in place of $\chi^2_{\alpha:1}$) can be thought of as having the form $(x^* + .5 - \Delta_L, x^* + .5 + \Delta_U)$ where $\Delta_L$ and $\Delta_U$ depend upon
$$n_{x^*} = \#[\text{observations } x^*] \quad\text{and}\quad n_{x^*+1} = \#[\text{observations } x^* + 1]. \qquad (1.12)$$
When $n_{x^*} \ge n_{x^*+1}$, it is the case that $\Delta_L \ge \Delta_U$. And when $n_{x^*} \le n_{x^*+1}$, correspondingly $\Delta_L \le \Delta_U$. Let
$$m = \max\{n_{x^*}, n_{x^*+1}\} \qquad (1.13)$$


Table 1.3: $(\Delta_1, \Delta_2)$ for Range 1 Samples Based on Small $n$

   n   m     α = .05           α = .10           α = .20
   2   1   (6.147, 6.147)    (3.053, 3.053)    (1.485, 1.485)
   3   2   (1.552, 1.219)    (1.104, 0.771)    (0.765, 0.433)
   4   3   (1.025, 0.526)    (0.082, 0.323)    (0.639, 0.149)
       2   (0.880, 0.880)    (0.646, 0.646)    (0.441, 0.441)
   5   4   (0.853, 0.257)    (0.721, 0.132)    (0.592, 0.024)
       3   (0.748, 0.548)    (0.592, 0.339)    (0.443, 0.248)
   6   5   (0.772, 0.116)    (0.673, 0.032)    (0.569, 0.000)
       4   (0.680, 0.349)    (0.562, 0.235)    (0.444, 0.126)
       3   (0.543, 0.543)    (0.420, 0.420)    (0.299, 0.299)
   7   6   (0.726, 0.035)    (0.645, 0.000)    (0.556, 0.000)
       5   (0.640, 0.218)    (0.545, 0.130)    (0.446, 0.046)
       4   (0.534, 0.393)    (0.432, 0.293)    (0.329, 0.193)
   8   7   (0.698, 0.000)    (0.626, 0.000)    (0.547, 0.000)
       6   (0.616, 0.129)    (0.534, 0.058)    (0.446, 0.000)
       5   (0.527, 0.281)    (0.439, 0.197)    (0.347, 0.113)
       4   (0.416, 0.416)    (0.327, 0.327)    (0.236, 0.236)
   9   8   (0.677, 0.000)    (0.613, 0.000)    (0.541, 0.000)
       7   (0.599, 0.065)    (0.526, 0.010)    (0.448, 0.000)
       6   (0.521, 0.196)    (0.443, 0.124)    (0.361, 0.054)
       5   (0.429, 0.321)    (0.350, 0.242)    (0.267, 0.163)
  10   9   (0.662, 0.000)    (0.604, 0.000)    (0.537, 0.000)
       8   (0.587, 0.020)    (0.521, 0.000)    (0.450, 0.000)
       7   (0.515, 0.129)    (0.446, 0.069)    (0.371, 0.012)
       6   (0.437, 0.242)    (0.365, 0.174)    (0.289, 0.105)
       5   (0.346, 0.346)    (0.275, 0.275)    (0.200, 0.200)

and correspondingly take
$$\Delta_1 = \max\{\Delta_L, \Delta_U\} \quad\text{and}\quad \Delta_2 = \min\{\Delta_L, \Delta_U\}.$$
Table 1.3 then gives values for $\Delta_1$ and $\Delta_2$ for $n \le 10$ and $\alpha = .05, .10$ and $.20$.

Intervals for a Normal Standard Deviation Based on Integer-Rounded Data

Specifically regarding the sets for $\sigma$ in display (1.11), Lee found that in order to get small $n$ actual confidence levels not too far from nominal, one must not only replace the value $\chi^2_{\alpha:1}$ with something larger, but must make an additional adjustment for samples with ranges 0 and 1.

Consider first replacing $\chi^2_{\alpha:1}$ in display (1.11) with a (larger) value $d(n,\alpha)$ given in Table 1.4. Lee found that for those $(\mu,\sigma)$ with moderate to large $\sigma$, making this $d(n,\alpha)$ for $\chi^2_{\alpha:1}$ substitution is enough to produce an actual confidence level approximating the nominal one. However, even this modification is not adequate to produce an acceptable coverage probability for $(\mu,\sigma)$ with small $\sigma$.

Table 1.4: $d(n,\alpha)$ for Use in Estimating $\sigma$

                α
   n       .05     .10
   2     10.47    7.71
   3      7.26    5.23
   4      6.15    4.39
   5      5.58    3.97
   6      5.24    3.71
   7      5.01    3.54
   8      4.84    3.42
   9      4.72    3.33
  10      4.62    3.26
  15      4.34    3.06
  20      4.21    2.97
  30      4.08    2.88
  ∞       3.84    2.71

For samples with range 0 or 1, formula (1.11) prescribes intervals of the form $(0, U)$. And reasoning that when $\sigma$ is small, samples will typically have range 0 or 1, Lee was able to find (larger) replacements for the limit $U$ prescribed by (1.11) so that the resulting estimation method has actual confidence level not much below the nominal level for any $(\mu,\sigma)$ (with $\sigma$ large or small).

That is, if a 0-range sample is observed, estimate $\sigma$ by
$$(0, \Lambda_0)$$
where $\Lambda_0$ is taken from Table 1.5. If a range 1 sample is observed consisting, say, of values $x^*$ and $x^* + 1$, and $n_{x^*}$, $n_{x^*+1}$ and $m$ are as in displays (1.12) and (1.13), estimate $\sigma$ using
$$(0, \Lambda_{1,m})$$
where $\Lambda_{1,m}$ is taken from Table 1.6.

The use of these values $\Lambda_0$ for range 0 samples, and $\Lambda_{1,m}$ for range 1 samples, and the values $d(n,\alpha)$ in place of $\chi^2_{\alpha:1}$ in display (1.11) finally produces a reliable method of confidence interval estimation for $\sigma$ when normal data are integer-rounded.


Table 1.5: $\Lambda_0$ for Use in Estimating $\sigma$

                α
   n       .05      .10
   2     5.635    2.807
   3     1.325    0.916
   4     0.822    0.653
   5     0.666    0.558
   6     0.586    0.502
   7     0.533    0.464
   8     0.495    0.435
   9     0.466    0.413
  10     0.443    0.396
  11     0.425    0.381
  12     0.409    0.369
  13     0.396    0.358
  14     0.384    0.349
  15     0.374    0.341


Table 1.6: $\Lambda_{1,m}$ for Use in Estimating $\sigma$ ($m$ in parentheses)

   n = 2    α = .05: 16.914(1)
            α = .10: 8.439(1)
   n = 3    α = .05: 3.535(2)
            α = .10: 2.462(2)
   n = 4    α = .05: 1.699(3)  2.034(2)
            α = .10: 1.303(3)  1.571(2)
   n = 5    α = .05: 1.143(4)  1.516(3)
            α = .10: 0.921(4)  1.231(3)
   n = 6    α = .05: 0.897(5)  1.153(4)  1.285(3)
            α = .10: 0.752(5)  0.960(4)  1.054(3)
   n = 7    α = .05: 0.768(6)  0.944(5)  1.106(4)
            α = .10: 0.660(6)  0.800(5)  0.949(4)
   n = 8    α = .05: 0.687(7)  0.819(6)  0.952(5)  1.009(4)
            α = .10: 0.599(7)  0.707(6)  0.825(5)  0.880(4)
   n = 9    α = .05: 0.629(8)  0.736(7)  0.837(6)  0.941(5)
            α = .10: 0.555(8)  0.644(7)  0.726(6)  0.831(5)
   n = 10   α = .05: 0.585(9)  0.677(8)  0.747(7)  0.851(6)  0.890(5)
            α = .10: 0.520(9)  0.597(8)  0.654(7)  0.753(6)  0.793(5)
   n = 11   α = .05: 0.550(10)  0.630(9)  0.690(8)  0.775(7)  0.851(6)
            α = .10: 0.493(10)  0.560(9)  0.609(8)  0.685(7)  0.763(6)
   n = 12   α = .05: 0.522(11)  0.593(10)  0.646(9)  0.708(8)  0.789(7)  0.818(6)
            α = .10: 0.470(11)  0.531(10)  0.573(9)  0.626(8)  0.707(7)  0.738(6)
   n = 13   α = .05: 0.499(12)  0.563(11)  0.610(10)  0.658(9)  0.733(8)  0.791(7)
            α = .10: 0.452(12)  0.506(11)  0.544(10)  0.587(9)  0.655(8)  0.716(7)
   n = 14   α = .05: 0.479(13)  0.537(12)  0.580(11)  0.622(10)  0.681(9)  0.745(8)  0.768(7)
            α = .10: 0.436(13)  0.485(12)  0.520(11)  0.558(10)  0.607(9)  0.674(8)  0.698(7)
   n = 15   α = .05: 0.463(14)  0.515(13)  0.555(12)  0.593(11)  0.639(10)  0.701(9)  0.748(8)
            α = .10: 0.422(14)  0.468(13)  0.499(12)  0.534(11)  0.574(10)  0.632(9)  0.682(8)


    Chapter 2

    Process Monitoring

Chapters 3 and 4 of V&J discuss methods for process monitoring. The key concept there regarding the probabilistic description of monitoring schemes is the run length idea introduced on page 91 and specifically in display (3.44). Theory for describing run lengths is given in V&J only for the very simplest case of geometrically distributed $T$. This chapter presents some more general tools for the analysis/comparison of run length distributions of monitoring schemes, namely discrete time finite state Markov chains and recursions expressed in terms of integral (and difference) equations.

2.1 Some Theory for Stationary Discrete Time Finite State Markov Chains With a Single Absorbing State

These are probability models for random systems that at times $t = 1, 2, 3, \ldots$ can be in one of a finite number of states
$$S_1, S_2, \ldots, S_m, S_{m+1}.$$

The "Markov" assumption is that the conditional distribution of where the system is at time $t+1$ given the entire history of where it has been up through time $t$ only depends upon where it is at time $t$. (In colloquial terms: the conditional distribution of where I'll be tomorrow given where I am and how I got here depends only on where I am, not on how I got here.) So-called "stationary" Markov Chain (MC) models employ the assumption that movement between states from any time $t$ to time $t+1$ is governed by a (single) matrix of (one-step) "transition probabilities" (that is independent of $t$)
$$\underset{(m+1)\times(m+1)}{P} = (p_{ij})$$
where
$$p_{ij} = P[\text{system is in } S_j \text{ at time } t+1 \mid \text{system is in } S_i \text{ at time } t].$$


Figure 2.1: Schematic for a MC with Transition Matrix (2.1)

As a simple example of this, consider the transition matrix
$$\underset{3\times 3}{P} := \begin{pmatrix} .8 & .1 & .1 \\ .9 & .05 & .05 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (2.1)$$
Figure 2.1 is a useful schematic representation of this model.

The Markov Chain represented by Figure 2.1 has an interesting property. That is, while it is possible to move back and forth between states 1 and 2, once the system enters state 3, it is "stuck" there. The standard jargon for this property is to say that $S_3$ is an absorbing state. (In general, if $p_{ii} = 1$, $S_i$ is called an absorbing state.)

Of particular interest in applications of MCs to the description of process monitoring schemes are chains with a single absorbing state, say $S_{m+1}$, where it is possible to move (at least eventually) from any other state to the absorbing state. One thing that makes these chains so useful is that it is very easy to write down a matrix formula for a vector giving the mean number of transitions required to reach $S_{m+1}$ from any of the other states. That is, with
$$L_i = \text{the mean number of transitions required to move from } S_i \text{ to } S_{m+1},$$
$$\underset{m\times 1}{L} = \begin{pmatrix} L_1 \\ L_2 \\ \vdots \\ L_m \end{pmatrix}, \quad \underset{(m+1)\times(m+1)}{P} = \begin{pmatrix} \underset{m\times m}{R} & \underset{m\times 1}{r} \\ \underset{1\times m}{0} & \underset{1\times 1}{1} \end{pmatrix} \quad\text{and}\quad \underset{m\times 1}{\mathbf{1}} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix},$$
it is the case that
$$L = (I - R)^{-1}\mathbf{1}. \qquad (2.2)$$


To argue that display (2.2) is correct, note that the following system of $m$ equations "clearly" holds:
$$\begin{aligned}
L_1 &= (1+L_1)p_{11} + (1+L_2)p_{12} + \cdots + (1+L_m)p_{1m} + 1\cdot p_{1,m+1} \\
L_2 &= (1+L_1)p_{21} + (1+L_2)p_{22} + \cdots + (1+L_m)p_{2m} + 1\cdot p_{2,m+1} \\
&\ \vdots \\
L_m &= (1+L_1)p_{m1} + (1+L_2)p_{m2} + \cdots + (1+L_m)p_{mm} + 1\cdot p_{m,m+1}.
\end{aligned}$$
But this set is equivalent to the set
$$\begin{aligned}
L_1 &= 1 + p_{11}L_1 + p_{12}L_2 + \cdots + p_{1m}L_m \\
L_2 &= 1 + p_{21}L_1 + p_{22}L_2 + \cdots + p_{2m}L_m \\
&\ \vdots \\
L_m &= 1 + p_{m1}L_1 + p_{m2}L_2 + \cdots + p_{mm}L_m
\end{aligned}$$
and in matrix notation, this second set of equations is
$$L = \mathbf{1} + RL. \qquad (2.3)$$
So
$$L - RL = \mathbf{1},$$
i.e.
$$(I - R)L = \mathbf{1}.$$
Under the conditions of the present discussion it is the case that $(I - R)$ is guaranteed to be nonsingular, so that multiplying both sides of this matrix equation by the inverse of $(I - R)$ one finally has equation (2.2).

For the simple 3-state example with transition matrix (2.1) it is easy enough to verify that with
$$R = \begin{pmatrix} .8 & .1 \\ .9 & .05 \end{pmatrix}$$
one has
$$(I - R)^{-1}\mathbf{1} = \begin{pmatrix} 10.5 \\ 11 \end{pmatrix}.$$
That is, the mean number of transitions required for absorption (into $S_3$) from $S_1$ is 10.5 while the mean number required from $S_2$ is 11.0.

When one is working with numerical values in $P$ and thus wants numerical values in $L$, the matrix formula (2.2) is most convenient for use with numerical analysis software. When, on the other hand, one has some algebraic expressions for the $p_{ij}$ and wants algebraic expressions for the $L_i$, it is usually most effective to write out the system of equations represented by display (2.3) and to try and see some slick way of solving for an $L_i$ of interest.

It is also worth noting that while the discussion in this section has centered on the computation of mean times to absorption, other properties of "time to absorption" variables can be derived and expressed in matrix notation. For example, Problem 2.22 shows that it is fairly easy to find the variance (or standard deviation) of time to absorption variables.
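A minimal numerical check of the 3-state example above (plain numpy):

    import numpy as np

    R = np.array([[0.8, 0.1],
                  [0.9, 0.05]])    # the non-absorbing block of matrix (2.1)
    L = np.linalg.solve(np.eye(2) - R, np.ones(2))   # L = (I - R)^{-1} 1
    print(L)                       # [10.5, 11.0]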


2.2 Some Applications of Markov Chains to the Analysis of Process Monitoring Schemes

When the "current condition" of a process monitoring scheme can be thought of as a discrete random variable (with a finite number of possible values), because

1. the variables $Q_1, Q_2, \ldots$ fed into it are intrinsically discrete (for example representing counts) and are therefore naturally modeled using a discrete probability distribution (and the calculations prescribed by the scheme produce only a fixed number of possible outcomes),

2. "discretization" of the $Q$'s has taken place as a part of the development of the monitoring scheme (as, for example, in the "zone test" schemes outlined in Tables 3.5 through 3.7 of V&J), or

3. one approximates continuous distributions for $Q$'s and/or states of the scheme with a "finely-discretized" version in order to approximate exact (continuous) run length properties,

one can often apply the material of the previous section to the prediction of scheme behavior. (This is possible when the evolution of the monitoring scheme can be thought of in terms of movement between "states" where the conditional distribution of the next "state" depends only on a distribution for the next $Q$ which itself depends only on the current "state" of the scheme.) This section contains four examples of what can be done in this direction.

As an initial simple example, consider the simple monitoring scheme (suggested in the book Sampling Inspection and Quality Control by Wetherill) that signals an alarm the first time

1. a single point $Q$ plots "outside 3 sigma limits," or

2. two consecutive $Q$'s plot "between 2 and 3 sigma limits."

(This is a simple competitor to the sets of alarm rules specified in Tables 3.5 through 3.7 of V&J.) Suppose that one assumes that $Q_1, Q_2, \ldots$ are iid and
$$q_1 = P[Q_1 \text{ plots outside 3 sigma limits}]$$
and
$$q_2 = P[Q_1 \text{ plots between 2 and 3 sigma limits}].$$
Then one might think of describing the evolution of the monitoring scheme with a 3-state MC with states

$S_1$ = "all is OK,"

$S_2$ = "no alarm yet and the current $Q$ is between 2 and 3 sigma limits," and

$S_3$ = "alarm."


Figure 2.2: Schematic for a MC with Transition Matrix (2.4)

For this representation, an appropriate transition matrix is
$$P = \begin{pmatrix} 1-q_1-q_2 & q_2 & q_1 \\ 1-q_1-q_2 & 0 & q_1+q_2 \\ 0 & 0 & 1 \end{pmatrix} \qquad (2.4)$$
and the ARL of the scheme (under the iid model for the $Q$ sequence) is $L_1$, the mean time to absorption into the alarm state from the "all-OK" state. Figure 2.2 is a schematic representation of this scenario.

It is worth noting that a system of equations for $L_1$ and $L_2$ is
$$\begin{aligned}
L_1 &= 1\cdot q_1 + (1+L_2)q_2 + (1+L_1)(1-q_1-q_2) \\
L_2 &= 1\cdot(q_1+q_2) + (1+L_1)(1-q_1-q_2),
\end{aligned}$$
which is equivalent to
$$\begin{aligned}
L_1 &= 1 + L_1(1-q_1-q_2) + L_2 q_2 \\
L_2 &= 1 + L_1(1-q_1-q_2),
\end{aligned}$$
which is the "non-matrix version" of the system (2.3) for this example. It is easy enough to verify that this system of two linear equations in the unknowns $L_1$ and $L_2$ has a (simultaneous) solution with
$$L_1 = \frac{1+q_2}{1 - (1-q_1-q_2) - q_2(1-q_1-q_2)}.$$
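For iid standard normal $Q$'s and the usual limits, $q_1$ and $q_2$ come from $\Phi$, and the closed form can be checked against the matrix formula (2.2). A minimal sketch (assuming scipy):

    import numpy as np
    from scipy.stats import norm

    q1 = 2 * norm.sf(3.0)                   # P[Q outside 3 sigma limits]
    q2 = 2 * (norm.sf(2.0) - norm.sf(3.0))  # P[Q between 2 and 3 sigma limits]

    # closed form for L1 derived above
    L1 = (1 + q2) / (1 - (1 - q1 - q2) - q2 * (1 - q1 - q2))

    # matrix version: L = (I - R)^{-1} 1 with R the non-absorbing block of (2.4)
    R = np.array([[1 - q1 - q2, q2],
                  [1 - q1 - q2, 0.0]])
    L = np.linalg.solve(np.eye(2) - R, np.ones(2))
    print(L1, L[0])                         # the two agree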

As a second application of MC technology to the analysis of a process monitoring scheme, we will consider a so-called "Run-Sum" scheme. To define such a scheme, one begins with "zones" for the variable $Q$ as indicated in Figure 3.9 of V&J. Then "scores" are defined for various possible values of $Q$. For $j = 0, 1, 2$ a score of $+j$ is assigned to the eventuality that $Q$ is in the "positive $j$-sigma to $(j+1)$-sigma zone," while a score of $-j$ is assigned to the eventuality that $Q$ is in the "negative $j$-sigma to $(j+1)$-sigma zone." A score of $+3$ is assigned to any $Q$ above the "upper 3-sigma limit" while a score of $-3$ is assigned to any $Q$ below the "lower 3-sigma limit." Then, for the variables $Q_1, Q_2, \ldots$ one defines corresponding scores $Q_1^*, Q_2^*, \ldots$ and "run sums" $R_1, R_2, \ldots$ where

$R_i$ = "the 'sum' of scores $Q^*$ through time $i$ under the provision that a new sum is begun whenever a score is observed with a sign different from the existing Run-Sum."

(Note, for example, that a new score of $Q^* = +0$ will reset a current Run-Sum of $R = -2$ to $+0$.) The Run-Sum scheme then signals at the first $i$ for which $|Q_i^*| = 3$ or $|R_i| \ge 4$.

Then define states for a Run-Sum process monitoring scheme

$S_1$ = "no alarm yet and $R = -0$,"
$S_2$ = "no alarm yet and $R = -1$,"
$S_3$ = "no alarm yet and $R = -2$,"
$S_4$ = "no alarm yet and $R = -3$,"
$S_5$ = "no alarm yet and $R = +0$,"
$S_6$ = "no alarm yet and $R = +1$,"
$S_7$ = "no alarm yet and $R = +2$,"
$S_8$ = "no alarm yet and $R = +3$," and
$S_9$ = "alarm."

If one assumes that the observations $Q_1, Q_2, \ldots$ are iid and for $j = -3, -2, -1, -0, +0, +1, +2, +3$ lets
$$q_j = P[Q_1^* = j],$$
an appropriate transition matrix for describing the evolution of the scheme is
$$P = \begin{pmatrix}
q_{-0} & q_{-1} & q_{-2} & 0 & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{+3} \\
0 & q_{-0} & q_{-1} & q_{-2} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{+3} \\
0 & 0 & q_{-0} & q_{-1} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{-2}+q_{+3} \\
0 & 0 & 0 & q_{-0} & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{-2}+q_{-1}+q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & q_{+0} & q_{+1} & q_{+2} & 0 & q_{-3}+q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & q_{+0} & q_{+1} & q_{+2} & q_{-3}+q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & 0 & q_{+0} & q_{+1} & q_{-3}+q_{+2}+q_{+3} \\
q_{-0} & q_{-1} & q_{-2} & 0 & 0 & 0 & 0 & q_{+0} & q_{-3}+q_{+1}+q_{+2}+q_{+3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
and the ARL for the scheme is $L_1 = L_5$. (The fact that the 1st and 5th rows of $P$ are identical makes it clear that the mean times to absorption from $S_1$ and $S_5$ must be the same.) It turns out that clever manipulation with the "non-matrix" version of display (2.3) in this example even produces a fairly simple expression for the scheme's ARL. (See Problem 2.24 and Reynolds (1971 JQT) and the references therein in this final regard.)


Figure 2.3: Notational Conventions for Probabilities from Rounding $Q - k_1$ Values

To turn to a different type of application of the MC technology, consider the analysis of a high side decision interval CUSUM scheme as described in §4.2 of V&J. Suppose that the variables $Q_1, Q_2, \ldots$ are iid with a continuous distribution specified by the probability density $f(y)$. Then the variables $Q_1 - k_1, Q_2 - k_1, Q_3 - k_1, \ldots$ are iid with probability density $f^*(y) = f(y + k_1)$. For a positive integer $m$, we will think of replacing the variables $Q_i - k_1$ with versions of them rounded to the nearest multiple of $h/m$ before CUSUMing. Then the CUSUM scheme can be thought of in terms of a MC with states
$$S_i = \text{"no alarm yet and the current CUSUM is } (i-1)\left(\tfrac{h}{m}\right)\text{"}$$
for $i = 1, 2, \ldots, m$ and
$$S_{m+1} = \text{"alarm."}$$

Then let
$$q_{-m} = \int_{-\infty}^{-h+\frac{1}{2}\left(\frac{h}{m}\right)} f^*(y)\,dy = P\left[Q_1 - k_1 \le -h + \frac{1}{2}\left(\frac{h}{m}\right)\right],$$
$$q_m = \int_{h-\frac{1}{2}\left(\frac{h}{m}\right)}^{\infty} f^*(y)\,dy = P\left[h - \frac{1}{2}\left(\frac{h}{m}\right) < Q_1 - k_1\right],$$
and for $-m < j < m$ take
$$q_j = \int_{j\left(\frac{h}{m}\right)-\frac{1}{2}\left(\frac{h}{m}\right)}^{j\left(\frac{h}{m}\right)+\frac{1}{2}\left(\frac{h}{m}\right)} f^*(y)\,dy. \qquad (2.5)$$
These notational conventions for probabilities $q_{-m}, \ldots, q_m$ are illustrated in Figure 2.3.

In this notation, the evolution of the high side decision interval CUSUM scheme can then be described in approximate terms by a MC with transition matrix
$$\underset{(m+1)\times(m+1)}{P} = \begin{pmatrix}
\sum_{j=-m}^{0} q_j & q_1 & q_2 & \cdots & q_{m-1} & q_m \\
\sum_{j=-m}^{-1} q_j & q_0 & q_1 & \cdots & q_{m-2} & q_{m-1}+q_m \\
\sum_{j=-m}^{-2} q_j & q_{-1} & q_0 & \cdots & q_{m-3} & q_{m-2}+q_{m-1}+q_m \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
q_{-m}+q_{-m+1} & q_{-m+2} & q_{-m+3} & \cdots & q_0 & \sum_{j=1}^{m} q_j \\
0 & 0 & 0 & \cdots & 0 & 1
\end{pmatrix}.$$

For $i = 1, \ldots, m$ the mean time to absorption from state $S_i$ ($L_i$) is approximately the ARL of the scheme with head start $(i-1)\left(\frac{h}{m}\right)$. (That is, the entries of the vector $L$ specified in display (2.2) are approximate ARL values for the CUSUM scheme using various possible head starts.) In practice, in order to find ARLs for the original scheme with non-rounded iid observations $Q$, one would find approximate ARL values for an increasing sequence of $m$'s until those appear to converge for the head start of interest.
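A minimal sketch (assuming scipy) of this discretization, for standard normal $Q$'s and hypothetical choices of $k_1$ and $h$; running it for increasing $m$ exhibits the convergence just described:

    import numpy as np
    from scipy.stats import norm

    def cusum_arl(k1, h, m):
        d = h / m
        # q_{-m}, ..., q_m for Q - k1, with Q standard normal assumed
        j = np.arange(-m, m + 1)
        lo = np.where(j == -m, -np.inf, j * d - d / 2)
        hi = np.where(j == m, np.inf, j * d + d / 2)
        q = norm.cdf(hi + k1) - norm.cdf(lo + k1)   # P[lo < Q - k1 <= hi]

        # R[i, jj]: rounded CUSUM i*(h/m) -> jj*(h/m); sums below 0 pile up at 0
        R = np.zeros((m, m))
        for i in range(m):
            R[i, 0] = q[:m - i + 1].sum()           # scores taking the CUSUM to <= 0
            for jj in range(1, m):
                R[i, jj] = q[(jj - i) + m]
        L = np.linalg.solve(np.eye(m) - R, np.ones(m))
        return L[0]                                  # zero head start

    for m in (20, 50, 100, 200):
        print(m, round(cusum_arl(k1=0.5, h=4.0, m=m), 1))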

As a final example of the use of MC techniques in the probability modeling of process monitoring scheme behavior, consider discrete approximation of the EWMA schemes of §4.1 of V&J where the variables $Q_1, Q_2, \ldots$ are again iid with continuous distribution specified by a pdf $f(y)$. In this case, in order to provide a tractable discrete approximation, it will not typically suffice to simply discretize the variables $Q$ (as the EWMA calculations will then typically produce a number of possible/exact EWMA values that grows as time goes on). Instead, it is necessary to think directly in terms of rounded/discretized EWMAs. So for an odd positive integer $m$, let $\Delta = (UCL_{EWMA} - LCL_{EWMA})/m$ and think of replacing an (exact) EWMA sequence with a rounded EWMA sequence taking on values $a_i$ defined by
$$a_i := LCL_{EWMA} + \frac{\Delta}{2} + (i-1)\Delta$$
for $i = 1, 2, \ldots, m$. For $i = 1, 2, \ldots, m$ let
$$S_i = \text{"no alarm yet and the rounded EWMA is } a_i\text{"}$$
and
$$S_{m+1} = \text{"alarm."}$$


And for 1 ≤ i, j ≤ m, let

q_{ij} = P[\text{moving from } S_i \text{ to } S_j]
       = P\left[a_j - \frac{\Delta}{2} \le (1-\lambda)a_i + \lambda Q \le a_j + \frac{\Delta}{2}\right]
       = P\left[\frac{a_j - (1-\lambda)a_i}{\lambda} - \frac{\Delta}{2\lambda} \le Q \le \frac{a_j - (1-\lambda)a_i}{\lambda} + \frac{\Delta}{2\lambda}\right]
       = P\left[a_i + \frac{(j-i)\Delta}{\lambda} - \frac{\Delta}{2\lambda} \le Q \le a_i + \frac{(j-i)\Delta}{\lambda} + \frac{\Delta}{2\lambda}\right]
       = \int_{a_i + (j-i)\Delta/\lambda - \Delta/(2\lambda)}^{a_i + (j-i)\Delta/\lambda + \Delta/(2\lambda)} f(y)\,dy . \qquad (2.6)

Then with

P = \begin{pmatrix}
q_{11} & q_{12} & \cdots & q_{1m} & 1 - \sum_{j=1}^{m} q_{1j} \\
q_{21} & q_{22} & \cdots & q_{2m} & 1 - \sum_{j=1}^{m} q_{2j} \\
\vdots & \vdots & & \vdots & \vdots \\
q_{m1} & q_{m2} & \cdots & q_{mm} & 1 - \sum_{j=1}^{m} q_{mj} \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}

the mean time to absorption from the state S_{(m+1)/2} (the value L_{(m+1)/2}) of a MC with this transition matrix is an approximation for the EWMA scheme ARL with EWMA_0 = (UCL_{EWMA} + LCL_{EWMA})/2. In practice, in order to find the ARL for the original scheme, one would find approximate ARL values for an increasing sequence of m’s until those appear to converge.
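The corresponding computation for the EWMA chain is equally direct. The following sketch again assumes normal observations purely for illustration; the names and conventions are mine.

```python
import numpy as np
from scipy.stats import norm

def ewma_arl_mc(lam, lcl, ucl, m, mu=0.0, sigma=1.0):
    """MC approximation to the EWMA ARL with EWMA_0 = (lcl + ucl)/2,
    using m (odd) rounded-EWMA states a_1, ..., a_m."""
    delta = (ucl - lcl) / m
    a = lcl + delta / 2 + np.arange(m) * delta     # state values a_i
    Q = np.empty((m, m))
    for i in range(m):
        # q_ij of display (2.6) as normal probabilities
        center = (a - (1 - lam) * a[i]) / lam
        Q[i, :] = (norm.cdf(center + delta / (2 * lam), mu, sigma)
                   - norm.cdf(center - delta / (2 * lam), mu, sigma))
    L = np.linalg.solve(np.eye(m) - Q, np.ones(m))
    return L[(m - 1) // 2]                         # center head start
```

Increasing m (say m = 51, 101, 201, …) until the returned value stabilizes implements the convergence check recommended above.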

The four examples in this section have illustrated the use of MC calculations in the second and third of the three circumstances listed at the beginning of this section. The first circumstance is conceptually the simplest of the three, and is for example illustrated by Problems 2.25, 2.28 and 2.37. The examples have also all dealt with iid models for the Q_1, Q_2, … sequence. Problem 2.26 shows that the methodology can also easily accommodate some kinds of dependencies in the Q sequence. (The discrete model in Problem 2.26 is itself perhaps less than completely appealing, but the reader should consider the possibility of discrete approximation of the kind of dependency structure employed in Problem 2.27 before dismissing the basic concept illustrated in Problem 2.26 as useless.)

2.3 Integral Equations and Run Length Properties of Process Monitoring Schemes

There is a second (and at first appearance quite different) standard method of approaching the analysis of the run length behavior of some process monitoring


schemes where continuous variables Q are involved. That is through the use of integral equations, and this section introduces the use of these. (As it turns out, by the time one is forced to find numerical solutions of the integral equations, there is not a whole lot of difference between the methods of this section and those of the previous one. But it is important to introduce this second point of view and note the correspondence between approaches.)

Before going to the details of specific schemes and integral equations, a small piece of calculus/numerical analysis needs to be reviewed and notation set for use in these notes. That concerns the approximation of definite integrals on the interval [a, a + h]. Specification of a set of points

a \le a_1 \le a_2 \le \cdots \le a_m \le a + h

and weights

w_i \ge 0 \quad \text{with} \quad \sum_{i=1}^{m} w_i = h

so that

\int_a^{a+h} f(y)\,dy \quad \text{may be approximated as} \quad \sum_{i=1}^{m} w_i f(a_i)

for “reasonable” functions f(y), is the specification of a so-called “quadrature rule” for approximating integrals on the interval [a, a + h]. The simplest of such rules is probably the choice

a_i := a + \left(\frac{i - \frac{1}{2}}{m}\right) h \quad \text{with} \quad w_i := \frac{h}{m} . \qquad (2.7)

(This choice amounts to approximating an integral of f by a sum of signed areas of rectangles with bases h/m and (signed) heights chosen as the values of f at midpoints of intervals of length h/m beginning at a.)
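In code, the rule (2.7) amounts to nothing more than the following small helper (the name is mine), which the numerical sketches later in this section can reuse.

```python
import numpy as np

def midpoint_rule(a, h, m):
    """Nodes a_i and weights w_i of the simple midpoint rule (2.7)
    for integrals on [a, a + h]."""
    nodes = a + (np.arange(1, m + 1) - 0.5) * (h / m)
    weights = np.full(m, h / m)
    return nodes, weights
```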

Now consider a high side CUSUM scheme as in §4.2 of V&J, where Q_1, Q_2, … are iid with continuous marginal distribution specified by the probability density f(y). Define the function

L_1(u) := the ARL of the high side CUSUM scheme using a head start of u .

If one begins CUSUMing at u, there are three possibilities of where he/she will be after a single observation, Q_1. If Q_1 is large (Q_1 − k_1 ≥ h − u) then there will be an immediate signal and the run length will be 1. If Q_1 is small (Q_1 − k_1 ≤ −u) the CUSUM will “zero out,” one observation will have been “spent,” and on average L_1(0) more observations are to be faced in order to produce a signal. Finally, if Q_1 is moderate (−u < Q_1 − k_1 < h − u) then one observation will have been spent and the CUSUM will continue from u + (Q_1 − k_1), requiring on average an additional L_1(u + (Q_1 − k_1)) observations to produce a signal. This reasoning leads to the equation for L_1,

L_1(u) = 1 \cdot P[Q_1 - k_1 \ge h - u] + (1 + L_1(0)) P[Q_1 - k_1 \le -u] + \int_{k_1 - u}^{k_1 + h - u} (1 + L_1(u + y - k_1)) f(y)\,dy .


Writing F(y) for the cdf of Q_1 and simplifying slightly, this is

L_1(u) = 1 + L_1(0) F(k_1 - u) + \int_0^h L_1(y) f(y + k_1 - u)\,dy . \qquad (2.8)

The argument leading to equation (2.8) has a twin that produces an integral equation for

L_2(v) := the ARL of a low side CUSUM scheme using a head start of v .

That equation is

L_2(v) = 1 + L_2(0)(1 - F(k_2 - v)) + \int_{-h}^{0} L_2(y) f(y + k_2 - v)\,dy . \qquad (2.9)

And as indicated in display (4.20) of V&J, could one solve equations (2.8) and (2.9) (and thus obtain L_1(0) and L_2(0)) one would have not only separate high and low side CUSUM ARLs, but ARLs for some combined schemes as well. (Actually, more than what is stated in V&J can be proved. Yashchin in a Journal of Applied Probability paper in about 1985 showed that with iid Q’s, high side decision interval h_1 and low side decision interval −h_2 for nonnegative h_2, if k_1 ≥ k_2 and

(k_1 - k_2) - |h_1 - h_2| \ge \max\left(0,\; u - v - \max(h_1, h_2)\right) ,

for the simultaneous use of high and low side schemes

ARL_{combined} = \frac{L_1(0) L_2(v) + L_1(u) L_2(0) - L_1(0) L_2(0)}{L_1(0) + L_2(0)} .

It is easily verified that what is stated on page 151 of V&J is a special case of this result.) So in theory, to find ARLs for CUSUM schemes one need “only” solve the integral equations (2.8) and (2.9). This is easier said than done. The one case where fairly explicit solutions are known is that where observations are exponentially distributed (see Problem 2.30). In other cases one must resort to numerical solution of the integral equations.

So consider the problem of approximate solution of equation (2.8). For a particular quadrature rule for integrals on [0, h], for each a_i one has from equation (2.8) the approximation

L_1(a_i) \approx 1 + L_1(a_1) F(k_1 - a_i) + \sum_{j=1}^{m} w_j L_1(a_j) f(a_j + k_1 - a_i) .


That is, at least approximately one has the system of m linear equations

L_1(a_1) = 1 + L_1(a_1)[F(k_1 - a_1) + w_1 f(k_1)] + \sum_{j=2}^{m} L_1(a_j) w_j f(a_j + k_1 - a_1) ,
L_1(a_2) = 1 + L_1(a_1)[F(k_1 - a_2) + w_1 f(a_1 + k_1 - a_2)] + \sum_{j=2}^{m} L_1(a_j) w_j f(a_j + k_1 - a_2) ,
\vdots
L_1(a_m) = 1 + L_1(a_1)[F(k_1 - a_m) + w_1 f(a_1 + k_1 - a_m)] + \sum_{j=2}^{m} L_1(a_j) w_j f(a_j + k_1 - a_m)

in the m unknowns L_1(a_1), …, L_1(a_m). Again in light of equation (2.8) and the notion of numerical approximation of definite integrals, upon solving this set of equations for approximate values of L_1(a_1), …, L_1(a_m), one may approximate the function L_1(u) as

L_1(u) \approx 1 + L_1(a_1) F(k_1 - u) + \sum_{j=1}^{m} w_j L_1(a_j) f(a_j + k_1 - u) .
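A sketch of this numerical solution, reusing the midpoint_rule helper above and once more assuming normal Q purely for illustration:

```python
import numpy as np
from scipy.stats import norm

def cusum_arl_ie(k1, h, m, mu=0.0, sigma=1.0):
    """Approximate L1 on [0, h] by solving the discretized version of
    integral equation (2.8).  Returns (a, L) with L[i] ~ L1(a[i])."""
    a, w = midpoint_rule(0.0, h, m)      # helper from the earlier sketch
    f = lambda y: norm.pdf(y, mu, sigma)
    F = lambda y: norm.cdf(y, mu, sigma)
    # r_ij = w_j f(a_j + k1 - a_i), plus the F(k1 - a_i) term in column 1
    R = w[None, :] * f(a[None, :] + k1 - a[:, None])
    R[:, 0] += F(k1 - a)
    L = np.linalg.solve(np.eye(m) - R, np.ones(m))
    return a, L
```

Since a_1 = h/(2m) ≈ 0, L[0] approximates L_1(0), and it should essentially reproduce the value produced by the MC sketch following the CUSUM example of §2.2, in line with the equivalence discussed next.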

It is a revealing point that the system of equations above is of the form (2.3) that was so useful in the MC approach to the determination of ARLs. That is, let

L = \begin{pmatrix} L_1(a_1) \\ L_1(a_2) \\ \vdots \\ L_1(a_m) \end{pmatrix}

and

R = \begin{pmatrix}
F(k_1 - a_1) + w_1 f(k_1) & w_2 f(a_2 + k_1 - a_1) & \cdots & w_m f(a_m + k_1 - a_1) \\
F(k_1 - a_2) + w_1 f(a_1 + k_1 - a_2) & w_2 f(k_1) & \cdots & w_m f(a_m + k_1 - a_2) \\
\vdots & \vdots & & \vdots \\
F(k_1 - a_m) + w_1 f(a_1 + k_1 - a_m) & w_2 f(a_2 + k_1 - a_m) & \cdots & w_m f(k_1)
\end{pmatrix}

and note that the set of equations for the “a_i head start approximate ARLs” is exactly of the form (2.3). With the simple quadrature rule in display (2.7) note that a generic entry of R, r_{ij}, for j ≥ 2 is

r_{ij} = w_j f(a_j + k_1 - a_i) = \left(\frac{h}{m}\right) f\left((j-i)\left(\frac{h}{m}\right) + k_1\right) .

But using again the notation f^*(y) = f(y + k_1) employed in the CUSUM example of §2.2, this means

r_{ij} = \left(\frac{h}{m}\right) f^*\left((j-i)\left(\frac{h}{m}\right)\right) \approx \int_{(j-i)(h/m) - \frac{1}{2}(h/m)}^{(j-i)(h/m) + \frac{1}{2}(h/m)} f^*(y)\,dy = q_{j-i}


(in terms of the notation (2.5) from the CUSUM example). The point is that whether one begins from a “discretize the Q − k_1 distribution and employ the MC material” point of view or from a “do numerical solution of an integral equation” point of view is largely immaterial. Very similar large systems of linear equations must be solved in order to find approximate ARLs.

As a second application of integral equation ideas to the analysis of process monitoring schemes, consider the EWMA schemes of §4.1 of V&J where Q_1, Q_2, … are iid with a continuous distribution specified by the probability density f(y). Let

L(u) = the ARL of a EWMA scheme with EWMA_0 = u .

When one begins a EWMA sequence at u, there are 2 possibilities of where he/she will be after a single observation, Q_1. If Q_1 is extreme (λQ_1 + (1 − λ)u > UCL_{EWMA} or λQ_1 + (1 − λ)u < LCL_{EWMA}) then there will be an immediate signal and the run length will be 1. If Q_1 is moderate (LCL_{EWMA} ≤ λQ_1 + (1 − λ)u ≤ UCL_{EWMA}) one observation will have been “spent” and on average L(λQ_1 + (1 − λ)u) more observations are to be faced in order to produce a signal. Now the event

LCL_{EWMA} \le \lambda Q_1 + (1-\lambda)u \le UCL_{EWMA}

is the event

\frac{LCL_{EWMA} - (1-\lambda)u}{\lambda} \le Q_1 \le \frac{UCL_{EWMA} - (1-\lambda)u}{\lambda} ,

so this reasoning produces the equation

L(u) = 1 \cdot \left(1 - P\left[\frac{LCL_{EWMA} - (1-\lambda)u}{\lambda} \le Q_1 \le \frac{UCL_{EWMA} - (1-\lambda)u}{\lambda}\right]\right) + \int_{(LCL_{EWMA} - (1-\lambda)u)/\lambda}^{(UCL_{EWMA} - (1-\lambda)u)/\lambda} (1 + L(\lambda y + (1-\lambda)u)) f(y)\,dy ,

or

L(u) = 1 + \int_{(LCL_{EWMA} - (1-\lambda)u)/\lambda}^{(UCL_{EWMA} - (1-\lambda)u)/\lambda} L(\lambda y + (1-\lambda)u) f(y)\,dy ,

or finally

L(u) = 1 + \frac{1}{\lambda} \int_{LCL_{EWMA}}^{UCL_{EWMA}} L(y) f\left(\frac{y - (1-\lambda)u}{\lambda}\right) dy . \qquad (2.10)

As in the previous (CUSUM) case, one must usually resort to numerical methods in order to approximate the solution to equation (2.10). For a particular quadrature rule for integrals on [LCL_{EWMA}, UCL_{EWMA}], for each a_i one has from equation (2.10) the approximation

L(a_i) \approx 1 + \frac{1}{\lambda} \sum_{j=1}^{m} w_j L(a_j) f\left(\frac{a_j - (1-\lambda)a_i}{\lambda}\right) . \qquad (2.11)


Now expression (2.11) is standing for a set of m equations in the m unknowns L(a_1), …, L(a_m) that (as in the CUSUM case) can be thought of in terms of the matrix expression (2.3) if one takes

L = \begin{pmatrix} L(a_1) \\ \vdots \\ L(a_m) \end{pmatrix} \quad \text{and} \quad R_{m \times m} = \left( \frac{w_j f\left(\frac{a_j - (1-\lambda)a_i}{\lambda}\right)}{\lambda} \right) . \qquad (2.12)

Solution of the system represented by equation (2.11) or the matrix expression (2.3) with definitions (2.12) produces approximate values for L(a_1), …, L(a_m) and therefore an approximation for the function L(u) as

L(u) \approx 1 + \frac{1}{\lambda} \sum_{j=1}^{m} w_j L(a_j) f\left(\frac{a_j - (1-\lambda)u}{\lambda}\right) .
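A parallel sketch for the EWMA equation, again reusing the midpoint_rule helper and assuming normal Q for illustration only:

```python
import numpy as np
from scipy.stats import norm

def ewma_arl_ie(lam, lcl, ucl, m, mu=0.0, sigma=1.0):
    """Approximate the EWMA ARL with EWMA_0 = (lcl + ucl)/2 by solving
    the discretized integral equation (2.10) with R as in (2.12)."""
    a, w = midpoint_rule(lcl, ucl - lcl, m)   # helper from the earlier sketch
    f = lambda y: norm.pdf(y, mu, sigma)
    R = w[None, :] * f((a[None, :] - (1 - lam) * a[:, None]) / lam) / lam
    L = np.linalg.solve(np.eye(m) - R, np.ones(m))
    u = (lcl + ucl) / 2
    return 1 + (w * L * f((a - (1 - lam) * u) / lam)).sum() / lam
```

The last line is just the displayed interpolation formula for L(u) evaluated at the center of the control limits; any other head start u could be plugged in instead.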

Again as in the CUSUM case, it is worth noting the similarity between the set of equations used to find “MC” ARL approximations and the set of equations used to find “integral equation” ARL approximations. With the quadrature rule (2.7) and an odd integer m, using the notation Δ = (UCL_{EWMA} − LCL_{EWMA})/m employed in §2.2 in the EWMA example, note that a generic entry of R defined in (2.12) is

r_{ij} = \frac{w_j f\left(\frac{a_j - (1-\lambda)a_i}{\lambda}\right)}{\lambda} = \frac{\Delta f\left(a_i + \frac{(j-i)\Delta}{\lambda}\right)}{\lambda} \approx \int_{a_i + (j-i)\Delta/\lambda - \Delta/(2\lambda)}^{a_i + (j-i)\Delta/\lambda + \Delta/(2\lambda)} f(y)\,dy = q_{ij} ,

(in terms of the notation (2.6) from the EWMA example of §2.2). That is, as in the CUSUM case, the sets of equations used in the “MC” and “integral equation” approximations for the “EWMA_0 = a_i ARLs” of the scheme are very similar.

As a final example of the use of integral equations in the analysis of process monitoring schemes, consider the X/MR schemes of §4.4 of V&J. Suppose that observations x_1, x_2, … are iid with continuous marginal distribution specified by the probability density f(y). Define the function

L(y) = “the mean number of additional observations to alarm, given that there has been no alarm to date and the current observation is y.”

Then note that as one begins X/MR monitoring, there are two possibilities of where he/she will be after observing the first individual, x_1. If x_1 is extreme (x_1 < LCL_x or x_1 > UCL_x) there will be an immediate signal and the run length will be 1. If x_1 is not extreme (LCL_x ≤ x_1 ≤ UCL_x) one observation will have been spent and on average another L(x_1) observations will be required in order to produce a signal. So it is reasonable that the ARL for the X/MR scheme is

ARL = 1 \cdot (1 - P[LCL_x \le x_1 \le UCL_x]) + \int_{LCL_x}^{UCL_x} (1 + L(y)) f(y)\,dy ,


that is

ARL = 1 + \int_{LCL_x}^{UCL_x} L(y) f(y)\,dy , \qquad (2.13)

where it remains to find a way of computing the function L(y) in order to feed it into expression (2.13).

In order to derive an integral equation for L(y), consider the situation if there has been no alarm and the current individual observation is y. There are two possibilities of where one will be after observing one more individual, x. If x is extreme or too far from y (x < LCL_x or x > UCL_x or |x − y| > UCL_R) only one additional observation is required to produce a signal. On the other hand, if x is not extreme and not too far from y (LCL_x ≤ x ≤ UCL_x and |x − y| ≤ UCL_R) one more observation will have been spent and on average another L(x) will be required to produce a signal. That is,

L(y) = 1 \cdot P[x < LCL_x \text{ or } x > UCL_x \text{ or } |x - y| > UCL_R] + \int_{\max(LCL_x,\, y - UCL_R)}^{\min(UCL_x,\, y + UCL_R)} (1 + L(x)) f(x)\,dx ,

that is,

L(y) = 1 + \int_{\max(LCL_x,\, y - UCL_R)}^{\min(UCL_x,\, y + UCL_R)} L(x) f(x)\,dx = 1 + \int_{LCL_x}^{UCL_x} I[|x - y| \le UCL_R]\, L(x) f(x)\,dx . \qquad (2.14)

(The notation I[A] is “indicator function” notation, meaning that when A holds I[A] = 1, and otherwise I[A] = 0.) As in the earlier CUSUM and EWMA examples, once one specifies a quadrature rule for definite integrals on the interval [LCL_x, UCL_x], this expression (2.14) provides a set of m linear equations for approximate values of L(a_i)’s. When this system is solved, the resulting values can be fed into a discretized version of equation (2.13) and an approximate ARL produced. It is worth noting that the potential discontinuities of the integrand in equation (2.14) (produced by the indicator function) have the effect of making numerical solutions of this equation much less well-behaved than those for the other integral equations developed in this section.
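A corresponding sketch for the X/MR equations (2.13) and (2.14), with the same illustrative normal assumption, is below; consistent with the remark just made, convergence in m is noticeably slower here because of the indicator.

```python
import numpy as np
from scipy.stats import norm

def xmr_arl_ie(lcl_x, ucl_x, ucl_r, m, mu=0.0, sigma=1.0):
    """Approximate the X/MR ARL by solving the discretized (2.14) and
    feeding the result into the discretized (2.13)."""
    a, w = midpoint_rule(lcl_x, ucl_x - lcl_x, m)   # helper from earlier
    f = lambda y: norm.pdf(y, mu, sigma)
    ind = np.abs(a[None, :] - a[:, None]) <= ucl_r  # I[|a_j - a_i| <= UCL_R]
    R = ind * (w * f(a))[None, :]
    L = np.linalg.solve(np.eye(m) - R, np.ones(m))
    return 1 + (w * L * f(a)).sum()
```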

The examples of this section have dealt only with ARLs for schemes based on (continuous) iid observations. It therefore should be said that:

1. The iid assumption can in some cases be relaxed to give tractable integral equations for situations where correlated sequences Q_1, Q_2, … are involved (see for example Problem 2.27),

2. Other descriptors of the run length distribution (beyond the ARL) can often be shown to solve simple integral equations (see for example the integral equations for CUSUM run length second moment and run length probability function in Problem 2.31), and


3. In some cases, with discrete variables Q there are difference equation analogues of the integral equations presented here (that ultimately correspond to the kind of MC calculations illustrated in the previous section).


Chapter 3

An Introduction to Discrete Stochastic Control Theory/Minimum Variance Control

Section 3.6 of V&J provides an elementary introduction to the topic of Engineering Control and contrasts this adjustment methodology with (the process monitoring methodology of) control charting. The last item under the Engineering Control heading of Table 3.10 of V&J makes reference to “optimal stochastic control” theory. The object of this theory is to model system behavior using probability tools and let the consequences of the model assumptions help guide one in the choice of effective control/adjustment algorithms. This chapter provides a very brief introduction to this theory.

    3.1 General Exposition

Let

{…, Z(−1), Z(0), Z(1), Z(2), …}

stand for observations on a process assuming that no control actions are taken. One first needs a stochastic/probabilistic model for the sequence {Z(t)}, and we will let F stand for such a model. F is a joint distribution for the Z’s and might, for example, be:

1. a simple random walk model specified by the equation Z(t) = Z(t − 1) + ε(t), where the ε’s are iid normal (0, σ²) random variables,


2. a random walk model with drift specified by the equation Z(t) = Z(t − 1) + d + ε(t), where d is a constant and the ε’s are iid normal (0, σ²) random variables, or

3. some Box-Jenkins ARIMA model for the {Z(t)} sequence.

Then let

a(t)

stand for a control action taken at time t, after observing the process. One needs notation for the current impact of control actions taken in past periods, so we will further let

A(a, s)

stand for the current impact on the process of a control action a taken s periods ago. In many systems, the control actions, a, are numerical, and A(a, s) = a h(s) where h(s) is the so-called “impulse response function” giving the impact of a unit control action taken s periods previous. A(a, s) might, for example, be:

1. given by A(a, s) = a for s ≥ 1 in a machine tool control problem where “a” means “move the cutting tool out a units” (and the controlled variable is a measured dimension of a work piece),

2. given by A(a, s) = 0 for s ≤ u and by A(a, s) = a for s > u in a machine tool control problem where “a” means “move the cutting tool out a units” and there are u periods of dead time, or

3. given by A(a, s) = (1 − exp(−sh/τ)) a for s ≥ 1 in a chemical process control problem with time constant τ and control period h seconds.

We will then assume that what one actually observes for (controlled) process behavior at time t ≥ 1 is

Y(t) = Z(t) + \sum_{s=0}^{t-1} A(a(s), t - s) ,

which is the sum of what would have been observed with no control and all of the current effects of previous control actions. For t ≥ 0, a(t) will be chosen based on

{…, Z(−1), Z(0), Y(1), Y(2), …, Y(t)} .

A common objective in this context is to choose the actions so as to minimize

E_F (Y(t) - T(t))^2 \quad \text{or} \quad \sum_{s=1}^{t} E_F (Y(s) - T(s))^2


for some (possibly time-dependent) target value T(s). The problem of choosing control actions to accomplish this goal is called the “minimum variance” (MV) control problem, and it has a solution that can be described in fairly (deceptively, perhaps) simple terms.

Note first that given {…, Z(−1), Z(0), Y(1), Y(2), …, Y(t)} one can recover {…, Z(−1), Z(0), Z(1), Z(2), …, Z(t)}. This is because

Z(s) = Y(s) - \sum_{r=0}^{s-1} A(a(r), s - r) ,

i.e., to get Z(s), one simply subtracts the (known) effects of previous control actions from Y(s).
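In code this recovery is a one-liner; the sketch below uses 0-based lists with Y[s − 1] holding Y(s), and an impact function A(a, s) supplied by the user (all names here are illustrative, not from V&J).

```python
def recover_z(Y, actions, A):
    """Given Y(1),...,Y(t) and actions a(0),...,a(t-1), return
    Z(1),...,Z(t) by subtracting the effects of past actions."""
    return [Y[s] - sum(A(actions[r], (s + 1) - r) for r in range(s + 1))
            for s in range(len(Y))]
```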

Then the model F (at least in theory) provides one a conditional distribution for Z(t+1), Z(t+2), Z(t+3), … given the observed Z’s through time t. The conditional distribution for Z(t+1), Z(t+2), Z(t+3), … given what one can observe through time t, namely {…, Z(−1), Z(0), Y(1), Y(2), …, Y(t)}, is then the conditional distribution one gets for Z(t+1), Z(t+2), Z(t+3), … from the model F after recovering Z(1), Z(2), …, Z(t) from the corresponding Y’s. Then for s ≥ t + 1, let

E_F[Z(s) | …, Z(−1), Z(0), Z(1), Z(2), …, Z(t)]   or just   E_F[Z(s) | Z_t]

stand for the mean of this conditional distribution of Z(s) available at time t.

Suppose that there are u ≥ 0 periods of dead time (u could be 0). Then the earliest Y that one can hope to influence by choice of a(t) is Y(t + u + 1). Notice then that if one takes action a(t) at time t, one’s most natural projection of Y(t + u + 1) at time t is

\hat{Y}(t + u + 1 | t) := E_F[Z(t + u + 1) | Z_t] + \sum_{s=0}^{t-1} A(a(s), t + u + 1 - s) + A(a(t), u + 1) .

It is then natural (and in fact turns out to give the MV control strategy) to try to choose a(t) so that

\hat{Y}(t + u + 1 | t) = T(t + u + 1) .

That is, the MV strategy is to try to choose a(t) so that

A(a(t), u + 1) = T(t + u + 1) - \left( E_F[Z(t + u + 1) | Z_t] + \sum_{s=0}^{t-1} A(a(s), t + u + 1 - s) \right) .

A caveat here is that in practice MV control tends to be “ragged.” That is, in order to exactly optimize the mean squared error, constant tweaking (and often fairly large adjustments) is required. By changing one’s control objective somewhat it is possible to produce “smoother” optimal control policies that are

    is, in order to exactly optimize the mean squared error, constant tweaking (andoften fairly large adjustments are required). By changing one’s control objectivesomewhat it is possible to produce “smoother” optimal control policies that are

  • 8/21/2019 531notesv2 (SPC)

    44/133

    40CHAPTER 3. AN INTRODUCTION TO DISCRETE STOCHASTIC CONTROL THEORY

    nearly as e¤ective as MV algorithms in terms of keeping a process on target.That is, instead of trying to optimize

    E F tX

    s=1

    (Y (s) ¡ T (s))2 ;

    in a situation where the  a’s are numerical (a = 0  indicating “no adjustment”and the “size” of adjustments increasing with jaj) one might for a constant ¸ > 0set out to minimize the alternative criterion

    E F 

    à tXs=1

    (Y (s) ¡ T (s))2 + ¸t¡1Xs=0

    (a(s))2

    !  :

    Doing so will “smooth” the MV algorithm.
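Before turning to the worked example of the next section, here is a small simulation sketch of the MV strategy just described, for the drifting random walk model developed there, with full immediate response A(a, s) = a and no dead time (u = 0). Treating the smoothing constant alpha as user-supplied (rather than deriving it from the two variances) and all particular numerical settings are simplifications of mine.

```python
import numpy as np

def simulate_mv_control(T, n, d, sig_eps, sig_nu, alpha, seed=0):
    """Simulate MV adjustment of a random walk with drift observed with
    error (model (3.1) of the next section), full response A(a, s) = a."""
    rng = np.random.default_rng(seed)
    W, Zhat, total_adj = 0.0, 0.0, 0.0   # Zhat(0 | -1) = prior mean of W(0)
    Y = np.empty(n)
    for t in range(n):
        W = W + d + rng.normal(0.0, sig_nu)        # W(t) = W(t-1) + d + nu(t)
        Z = W + rng.normal(0.0, sig_eps)           # Z(t) = W(t) + eps(t)
        Y[t] = Z + total_adj                       # controlled observation
        Zhat = alpha * Z + (1 - alpha) * Zhat + d  # Zhat(t+1 | t)
        a = T - Zhat - total_adj                   # aim predicted Y(t+1) at T
        total_adj += a
    return Y   # after a short transient, Y varies about the target T
```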

    3.2 An Example

To illustrate the meaning of the preceding formalism, consider the model (F) specified by

Z(t) = W(t) + ε(t)   for t ≥ 0
and  W(t) = W(t − 1) + d + ν(t)   for t ≥ 1   } (3.1)

for d a (known) constant, the ε’s normal (0, σ_ε²), the ν’s normal (0, σ_ν²) and all the ε’s and ν’s independent. (Z(t) is a random walk with drift observed with error.) Under this model and an appropriate 0 mean normal initializing distribution for W(0), it is the case that each

\hat{Z}(t + 1 | t) := E_F[Z(t + 1) | Z(0), …, Z(t)]

may be computed recursively as

\hat{Z}(t + 1 | t) = \alpha Z(t) + (1 - \alpha) \hat{Z}(t | t - 1) + d

for some constant α (that depends upon the known variances σ_ε² and σ_ν²).

We will find MV control policies under model (3.1) with two different functions A(a, s). Consider first the possibility

A(a, s) = a for all s ≥ 1   (3.2)

(an adjustment “a” at a given time period takes its full and permanent effect at the next time period).

Consider the situation at time t = 0. Available are Z(0) and \hat{Z}(0 | −1) (the prior mean of W(0)) and from these one may compute the prediction

\hat{Z}(1 | 0) := \alpha Z(0) + (1 - \alpha) \hat{Z}(0 | -1) + d .


That means that taking control action a(0), one should predict a value of

\hat{Y}(1 | 0) := \hat{Z}(1 | 0) + a(0)

for the controlled process at time t = 1, and upon setting this equal to the target T(1) and solving for a(0) one should thus choose

a(0) = T(1) - \hat{Z}(1 | 0) .

At time t = 1