
CHAPTER 5: INTERVENTION ANALYSIS AND RANDOMIZED EXPERIMENTATION

1. Introduction

In Chapter 4 we introduced regression analysis and saw a number of applications in which regression helped us to understand better what was going on in the data. In particular, we learned how to check for the presence of trend effects, periodic effects, and intervention effects. If, as a result of this check, we decided that any of these effects were really present, we learned how to build a regression model so that we could estimate their magnitude. Understanding of such effects is essential both to quality improvement efforts and to holding the gains from past improvements.

In the application to mortality in intensive care, we saw that there was a trend towards improvement. Since we also had information on the severity of illness of each group of patients, we were able to judge whether the improvement trend was only a reflection of changes through time in the severity of illness of patients coming to the intensive care unit. We decided that it was not: after introducing the variable severity into the regression model for time alone, we saw that the downtrend in mortality was still significant. This application was our first opportunity to use multiple linear regression, i.e., regression in which there is more than a single independent variable that might contribute to our understanding of the data.

In this chapter, we shall consider a quality improvement application -- "Medical Audit" -- in which we have information on several independent variables, including an intervention variable representing an effort to improve a process. We want to learn which of these variables appear to affect the process outcome and to estimate how large any such effects are. The underlying strategy will be the same, and the interpretation of results will parallel what we have already done. However, we shall also learn some tactics for selecting, from the potential independent variables for which we have information, those particular ones that contribute to our understanding of the data.

After the example of Medical Audit, we shall turn to two other examples in which the interventions were made, not as a sudden switch from one method to another (as in Medical Audit and in the change of putting stance in the putting example), but as a randomized experiment. That is, sometimes we use the new method and sometimes the current method, using some form of randomization to decide which it will be. From the first of these examples, we shall learn how simple randomized experimentation can be used as a powerful tool for enhancing understanding and improving performance. From the second example, we shall learn to set up and analyze an extension of the simple randomized experimental design: the randomized pair design.


2. Stepwise Regression and Variable Selection: Medical Audit

Our first application will provide a review of several ideas developed in Chapter 4, including intervention analysis and the fitting of trend and periodic (seasonal) effects. It will also introduce a strategy called stepwise regression for selection of independent variables that may be useful in multiple regression. The application is based on the data in a file called LAPPEN.sav, described below:

Admissions to 'General Hospital' for tonsillectomies by months from January 1969 through April 1974. Taken from a term paper by Stan Lappen, Graduate School of Business, University of Chicago, "Medical Audit: A Viable Assurance of Quality Care?" The final month is possibly affected by a "medical audit", designed to prevent unnecessary surgery, which went into effect in April 1974.

At the time of this study in 1973 and 1974, there was a growing consensus among physicians that a substantial fraction of the tonsillectomies being performed at most hospitals were either unnecessary or counterproductive. The Medical Audit was a review designed for a particular hospital. It required that the physicians responsible for tonsillectomies provide a detailed justification for any tonsillectomy that they performed. This audit was therefore an intervention designed to improve a process by trying to assure that the process was applied only when genuinely needed. However, at the time of this study, only the final month, April of 1974, reflected effects of the Medical Audit.

Given the large variation from month to month that you can see simply by looking at the data printed out above, you might be inclined to wait for several more months of data before attempting any judgment about the effectiveness of the audit. It will indeed be important to follow up on the tonsillectomy data subsequent to April, but we will nonetheless ask the question, "What can we learn about the effectiveness of Medical Audit as of the end of April, 1974?" The possibility that more data will become available later is no excuse for not trying to see what the data then available could tell us! This is an example of a general principle of data analysis: whether the data available are scanty or ample, we can and should ask, "What can be learned from the data we now have?" There is no magical minimum amount of data, no minimum sample size, required before statistical tools can be brought to bear.

The variable contained in LAPPEN.sav is named "tonsils". We start with familiar steps of analysis:
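If you prefer working in the syntax window (where available), these familiar first steps can be sketched as below; the control chart and runs check are produced from the menus as in earlier chapters. The mean of 16.7 and standard deviation of 7.3 quoted in the next paragraph are what this command should reproduce.

* Summary statistics for the raw monthly counts in LAPPEN.sav.
DESCRIPTIVES VARIABLES=tonsils
  /STATISTICS=MEAN STDDEV VARIANCE MIN MAX.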


We are dealing with counting data, but it is apparent that the Poisson distribution, which we met in Chapter 4 in connection with traffic counts, is not immediately applicable: the square of the standard deviation 7.3 is 53.3, much greater than the mean of 16.7. We shall proceed in the hopes that the normal distribution will provide a satisfactory model for the behavior of the residuals from the regression that we shall fit.

Your visual examination of the control chart should immediately detect a downtrend: the number of tonsillectomies had been tending downward for at least five years before April of 1974, presumably reflecting a growing awareness among physicians of the problem of unnecessary tonsillectomies. In spite of the visually-obvious trend, the runs check does not by itself arouse substantial suspicion about a possible departure from statistical control:


What we can conclude is this: although the runs check, taken by itself, is consistent with statistical control, the visual check (to be confirmed in a moment by fitting a trend by regression) leads us to conclude that the process is not in control. (However, had the runs check strongly contradicted the assumption of statistical control, the runs check would have been decisive even in the absence of visual evidence.)

The control chart also shows something else that is very serious. You can see that the vertical dispersion of the points, i.e., the variability of the time series process, is decreasing as we move from left to right: the variability is higher early in the period when the trend is relatively high, and it is lower later on when the trend is lower. In other words, the deviations from trend do not meet the constant variance condition for a process that is in control.

In Chapter 3, Section 3, when we studied the exponential distribution, we encountered an analogous problem. The data on time intervals did not follow an approximately normal distribution but rather a highly-skewed distribution -- the exponential distribution -- with a long tail pointing to the right. A simple data transformation -- the cube root transformation -- enabled us to cope with that problem. The transformed data -- cube roots of the time intervals -- were nearly symmetrical and much closer to normality. So we worked with the transformed data. This procedure did not distort or cook the data. It just reexpressed the data in a simpler form in which our simple analysis could be applied, and it did so without sacrifice of any of our objectives, such as checking whether the process (with data on the transformed scale) was in control, or constructing a control chart if the transformed data appeared to be in control. Even predictions could have been expressed on the transformed, cube-root scale and, if necessary, transformed back from cube roots to the original time units.

Now the problem is not normality of distribution but constancy of variance, but the idea of transformation can still be applied. A simpler picture of the data can be obtained by using the square root transformation of tonsils. In the Transform/Compute… dialog boxes we type in

sqrttons = SQRT(tonsils)

and click on OK. You may wonder why we used the square root transformation this time, but used the cube root transformation back in Chapter 3. The ultimate answer is to be seen in the results: in each instance the transformation leads to an analysis for which the diagnostic checks are satisfactory. Beyond that, there is experience to suggest that the square root transformation often works well for counting data, while the cube root transformation often works well for time-interval data. To be more precise, if the original data are actually distributed as Poisson variables with changing means (and hence changing standard deviations), statistical theory shows that the square roots will have an approximately constant standard deviation of about 0.5.
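For readers who want to see where the 0.5 comes from, here is a sketch of the standard approximation argument (often called the delta method); the symbols below are not used elsewhere in the text. If $X$ has a Poisson distribution with mean $\mu$, then $\mathrm{Var}(X) = \mu$. For a smooth function $g$, a first-order Taylor expansion gives $\mathrm{Var}[g(X)] \approx [g'(\mu)]^2 \, \mathrm{Var}(X)$. With $g(x) = \sqrt{x}$, so that $g'(\mu) = 1/(2\sqrt{\mu})$,

$$
\mathrm{Var}(\sqrt{X}) \;\approx\; \Bigl(\tfrac{1}{2\sqrt{\mu}}\Bigr)^{2}\,\mu \;=\; \tfrac{1}{4},
\qquad
\mathrm{SD}(\sqrt{X}) \;\approx\; 0.5,
$$

whatever the value of $\mu$ (the approximation is better when $\mu$ is not too small).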

The display below uses the Graphs/Sequence… plot procedure to compare the two plots for tonsils and sqrttons. We see that the square root transformation gives somewhat more constant dispersion throughout.


[Sequence plots: tonsils (left) and sqrttons (right)]

Notice that the standard deviation of sqrttons, 0.88735, does not quite meet the 0.5 criterion for the transformation to be a perfect success, but we will continue using the square root data nevertheless. Keep in mind that since the positive square root is in a one-to-one relation to the original data, whatever we predict about sqrttons is equivalent to a prediction about the value of tonsils. At the next step we follow the now-familiar procedure of creating time = $CASENUM. Then we regress sqrttons on time, remembering to save the predicted values and the residuals1:

We see that the fitted trend line turns out to be negative and clearly statistically significant, as we would expect from the appearance of the data plots.

1 If we had run the regression with tonsils instead, the practical conclusions in this case would have been much the same.


The fitted regression line is

predicted sqrttons = 4.471 – 0.015 time .

Here is the control chart for the standardized residuals:


As we see in the display to the left, the residuals fail the runs test. There are too few runs for the data to be consistent with a random time series. Also, note in the plot above that the residual for month 64, April of 1974, is slightly negative, consistent with the desired effect of Medical Audit, but not significant. (Only two of 64 standardized residuals had absolute values greater than 2.00, and the residual for April, 1974, is less than one standard deviation below zero, as you can see from careful examination of the control chart.)

Recall next that we are dealing with monthly data. Perhaps there is periodicity -- seasonality, in this application -- in the number of tonsillectomies. Seasonality, if present, would not be obvious from the control chart, but we can number the points in an interactive scatter plot from 1 to 12 (12 months in a year) to investigate seasonality. To achieve this, however, we first need to create a variable that indicates the month.


Our Data Editor contains the variables tonsils, sqrttons, and time. Go to the dialog window for Transform/Compute… and type in

month = MOD(time, 12) .

It appears that this has achieved what we wanted except for one problem:

You can see in the Data Editor that a new variable called month has been created that assigns the numbers 1, 2, …, 11 for January through November, but every time that it comes to December it has the value zero. To explain this, we must explain the MOD function. The function has two arguments: a numerical expression -- in this case time -- and a modulus -- in this case set equal to 12. The function works as follows: each successive value of time is divided by 12, and month is given a value equal to the remainder from the process of division. Thus, for example, the first month (the series starts in January) receives the value 1 for month, as do the 13th, the 25th, the 37th, etc. Similarly all the February months get the value 2, and so on (Get the idea?). Unfortunately, however, whenever December occurs the remainder from dividing time by 12 is zero, which explains the anomaly. The problem is very easily corrected. See if you can figure out how to change the zeros to the value 12 by using Transform/Compute… and the If feature. (Of course you can always change them by hand, but imagine doing this for 1500 cases and you will see the advantage of using Transform.) Anyway, somehow the Data Editor now looks like the example on the left below:
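If you want to check your answer, one possible route in the syntax window (a sketch; the Student Version may not expose the syntax window) is:

* Compute the month numbers, then recode the December zeros to 12.
COMPUTE month = MOD(time, 12).
IF (month = 0) month = 12.
EXECUTE.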

In the next few pages we are going to illustrate some SPSS procedures that might seem to be rather complex and “tricky”. It is very important that you understand the purpose of these steps and their final results. If, however, you have great difficulty in duplicating them on your own, do not spend too much time trying at this stage. At the end of our demonstration we will provide you with another file that will contain all of the necessary new data.


Through some fancy adjustments to the interactive scatter plot it was possible to label the individual points with the number corresponding to their month of the year. It is immediately obvious that there are strong seasonal effects. For example, month 4 -- April -- tends to be consistently high, while month 9 -- September -- tends to be consistently low. To capture this periodic effect in our regression model, we use the indicator variable device -- a variable that takes the value 1 when a certain condition holds and the value 0 when the condition does not hold. Indicator variables were first introduced in Chapter 4 for modeling of the startup effect (period 1 of the day) in the Ishikawa data set, and the intervention effect (change of putting stance) in the putting application. In the present example, however, there may be several months that are systematically high or low, so we will need several indicator variables. We will now show how to create an indicator variable for each of the twelve months, using the sequence Transform/Recode/Into Different Variables… Here is the dialog window:

We have dragged the variable month into the first box, Numeric Variable -> Output Variable, and we have also assigned the name jan (for January) to the Output Variable. When we click on the Change button, the display changes accordingly. Next, we click on the button labeled Old and New Values… and we get a new window. Observe that we have typed 1 in the Old Value box, corresponding to month = 1. In the box for New Value we have also typed 1, the value that we want jan to have whenever month = 1. Then we must click on the Add button. This action causes the Old --> New box to look like this:


Next, we must mark the button at the bottom for All Other Values and set New Value to 0:

After we click again on Add, the Old --> New box changes to:

Summarizing, through the sequence described above we have told SPSS that we want to create a new variable, jan, that has the value 1 whenever month = 1 and zero everywhere else. After we click on Continue in the window above we are returned to the original Recode window, where we must click on OK. Finally, the Data Editor has a new indicator variable as shown here:

Unfortunately, we must repeat this rather tedious procedure until we have created an indicator variable for each of the months. For example, the next new variable is called feb. After naming it in the Recode into Different Variables window and clicking on Change, followed by the Old and New Values… button, we specify the recode as follows

and after two more mouse clicks the Data Editor will contain a new indicator variable named feb. We admit that this procedure is repetitious and very tedious for the user of SPSS. It is, perhaps, one of the most inefficient aspects of the Student Version of the software. (In the full-blown version it would be possible, perhaps, to use the syntax feature to create twelve indicator variables with less effort.) We therefore will spare you, the reader, any further explication, and we invite you to open a file called LAPPENa.sav where we have already done the rest of the work:
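For users of the full version, the syntax below is a sketch of how all twelve indicators could be created in one pass, mirroring the Recode dialog sequence just described (the names aug, nov, and dec are assumed to follow the same pattern as the others):

* One RECODE per month; each indicator is 1 for its month and 0 otherwise.
RECODE month (1=1) (ELSE=0) INTO jan.
RECODE month (2=1) (ELSE=0) INTO feb.
RECODE month (3=1) (ELSE=0) INTO mar.
RECODE month (4=1) (ELSE=0) INTO apr.
RECODE month (5=1) (ELSE=0) INTO may.
RECODE month (6=1) (ELSE=0) INTO jun.
RECODE month (7=1) (ELSE=0) INTO jul.
RECODE month (8=1) (ELSE=0) INTO aug.
RECODE month (9=1) (ELSE=0) INTO sep.
RECODE month (10=1) (ELSE=0) INTO oct.
RECODE month (11=1) (ELSE=0) INTO nov.
RECODE month (12=1) (ELSE=0) INTO dec.
EXECUTE.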


Note that in LAPPENa.sav, shown above, we have narrowed the column widths and removed unnecessary decimal places by making the required changes in the Variable View page. If you have any remaining uncertainty about your complete understanding of the meaning of indicator variables, study the contents of the data file carefully and ask questions in class until you are absolutely sure that you have the concept down pat.

Systematic Selection of Independent Variables Using STEPWISE REGRESSION

With LAPPENa.sav in the Data Editor, we select Analyze/Regression/ Linear… again, but this time the setup is different:

We have designated sqrttons as the dependent variable, and in the box for Method: we have now selected Stepwise. Note also that in the list of available variables in the left-hand box we have highlighted every variable that might appear on the right-hand side of the predictive equation except month. The variable month is only a vehicle for constructing the separate indicator variables for each month of the year and should not be used in the regression model. When we click the arrow button, all of the month indicators plus time will be moved into the box for Independent(s):. Later on we will take advantage of some of the other choices for Method:, but this is enough to start. After making the usual choices for Statistics…, Plots…, and Save…, check the Options… window. It should look like this:
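For reference, the pasted syntax corresponding to this stepwise setup would look something like the sketch below; PIN and POUT are the entry and removal probabilities shown in the Options window, and the /SAVE keywords correspond to the usual choices under Save…:

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /DEPENDENT sqrttons
  /METHOD=STEPWISE time jan feb mar apr may jun jul aug sep oct nov dec
  /SAVE PRED RESID ZRESID.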

We shall discuss the contents momentarily in our explanation of how the stepwise procedure works. Returning to the main Linear Regression window, we click OK. Here is the first table that appears:

When you perform the regression on your own computer, the table above may look much longer and narrower. We have made it wider by double-clicking with the mouse pointer (see arrow above) on the right border and stretching it to the right, thus making the details in the Method column easier to read. The table displays the sequence of steps (four in this case) that are followed to arrive at a regression model that is a “best fit”. During STEP 1, for example, the routine asks (figuratively), “Of all the candidate variables, time and the twelve month indicators, which one, if selected to be the independent variable, would have the highest t-value with a significance level less than or equal to 0.05?” It finds that apr, the indicator for the month of April, meets that criterion, and it is entered into the equation. If you skip down the computer output past the ANOVA table, you see the following table:


From the row for Model 1 above we read that the first regression equation is

predicted sqrttons = 3.873 + 1.197 apr .

The table immediately above contains all of the information of the first table and more. Thus we shall focus on it in the future. We must also consider, however, the second table that appears in the regression output. We display it here:

This table shows us that the correlation, R, between sqrttons and apr is 0.396 (its square is 0.157), and that the Std. Error of the Estimate is 0.82121. (Recall that we have referred earlier to this quantity as the residual standard error.) A few pages above, when we first defined sqrttons, we saw that its standard deviation was 0.88735, so regression modeling has already reduced our uncertainty somewhat. Going back to the table of Variables Entered/Removed or, better yet, the table of Coefficients, we see that at STEP 2 the next variable chosen is time. The new regression model is

predicted sqrttons = 4.372 + 1.223 apr - .015 time .


The Model Summary table shows that R and R Square are increased to 0.512 and 0.262, respectively, and the residual standard error is decreased to 0.77467. The process of stepwise regression continues through STEP 3 by choosing sep:

predicted sqrttons = 4.456 + 1.134 apr - .015 time - 1.024 sep ,

and it stops at STEP 4 after selecting oct:

predicted sqrttons = 4.528 + 1.051 apr - .015 time – 1.108 sep - .880 oct .

Why did it suddenly stop? The answer is given by the Options specification that we showed above and repeat here: the probability of F on entry has been set at 0.05. In this context the F test is equivalent to the test based on the t-ratio (F is just the square of t), so the specification says that for a variable to be selected the significance level for its t must be less than or equal to 0.05. Furthermore, if any variable already in the model has its significance change to a value at or above 0.10, it must be removed from the model. The selection process has stopped because there are no more candidate variables whose t-ratios would be large enough, nor are there any variables among those already chosen that should be eliminated. As shown in the Model Summary table, at each stage in the stepwise process R and R Square are increased and the residual standard error is decreased, until we cannot achieve any greater improvement in the fit without introducing new variables for which the coefficients are "statistically insignificant". Consider the next-to-the-last table that was produced by the stepwise regression command. It is entitled Excluded Variables:


We have copied here just the column headings and the section of the table that pertains to the last stage, Model 4. The table lists the variables that were not selected at each stage. Note that each variable has a significance value for t that is greater than 0.05. The only two variables not already in the model that you might consider, if your entry value for Sig. were relaxed, are may and jun, with may the prime choice because its p-value is the lowest.

Recall that in Chapter 4 we defined R, the correlation coefficient, as a measure of association between two variables -- hence the adjective "bivariate". Here, however, we are discussing a model involving a dependent variable (in this case sqrttons) and more than one independent variable -- four variables in our final model. What, then, is R measuring? The answer is that it is the correlation between sqrttons and the predicted values, i.e., the values of the predictive linear equation for each of the cases in the data set. In the original setup we asked the routine to save the unstandardized predicted values, PRE_1. If we apply the SPSS sequence Analyze/Correlate/Bivariate… to the pair sqrttons and PRE_1 we get this display:
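The equivalent syntax (a sketch; PRE_1 is the default name SPSS assigns to the first set of saved unstandardized predicted values) is:

CORRELATIONS
  /VARIABLES=sqrttons PRE_1
  /PRINT=TWOTAIL NOSIG.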

Observe that the Pearson correlation is 0.655, exactly the same as the value of R shown for Model 4 in the Model Summary table above.2 To further illustrate the process of stepwise regression we will now modify the Options setup:

Observe in the window at the left that we have changed the entry level of F to 0.98 and the removal setting to 0.99. (For technical reasons the latter value must always be greater than the former.) These changes will have the effect of driving the search forward one step at a time, even if variables are included when their absolute t-ratios are less than 2, until virtually all of the candidate variables are included in the model. In order to fit it on these pages, we must break up the Coefficients table into three pieces:

2 A feature of the method of least squares is that, given a dependent variable and a set of independent variables, it produces a linear combination of the independent variables (i.e., a predictive equation) that has maximum correlation with the dependent variable. That amounts to the same thing as minimizing the sum of squared residuals.


We also display below an image of the Model Summary table to help us explain what is happening here:


The results of the first four steps are the same as before. At STEP 5, may enters the model, but with a t-ratio of 1.857, less than 2. (Note that this is the value of the t-ratio that was given above in the list of variables that had not entered.) However, at STEP 6, jun enters the model with a t-ratio of 2.138, and may's t-ratio goes up to 2.152. At this stage all six independent variables have absolute t-ratios of at least 2 and p-values less than 0.05. The residual standard error is down to 0.65876 and R Square = 0.501. This shows that the hierarchical stepwise approach, as we first applied it with F to enter equal to 0.05 and F to remove set at 0.10, is not infallible. It is possible to miss a better model. Continuing, however, at STEP 7, jul enters the model but with a t-ratio of -1.318. Since -1.318 is less than 2 in absolute value, it is probably not a good idea to use the model past STEP 6. Since, as we shall see later, there is always a potential danger of "overfitting" the data by use of stepwise regression, we could reasonably even go back as far as STEP 4, our original stopping point, and not lose much in the goodness of fit and our ability to predict new observations.3

3 We shall later talk about "parsimony", that is, in case of doubt in choosing among similar models, lean toward models with fewer independent variables.


There is no ideal criterion for variable selection, no unambiguous way to determine a "best regression". Fortunately the practical conclusions are unlikely to depend much on the precise choice. The main thing is to be "in the right ballpark". Observe that as SPSS goes from STEP 1 to the final STEP 11 in the stepwise output above R Square increases steadily. The same is not true, however, for the decrease in the residual standard error. The latter continues to fall until STEP 8, where it begins to increase. This is the step where the t-ratio of the entering variable is the first to be less than 1 in absolute value. (The t-ratio for mar is -0.282.) As the remaining entering variables all have absolute t-ratios less than 1, the residual standard error continues to increase until the end. R Square must increase (at least not decrease) as additional variables are added to the regression, but residual standard error ( Std. Error of the Estimate) does not necessarily always decrease. Its behavior depends on a balance between the closeness of fit of the regression surface, the number of independent variables, and the total sample size. In the next chapter we will explain Adjusted R Square, a measure of goodness of fit that we will ultimately prefer over R Square.
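In symbols (standard notation, not used elsewhere in the text): let SSE be the sum of squared residuals, SST the total sum of squares of the dependent variable about its mean, n the number of cases, and k the number of independent variables in the model. Then

$$
R^2 \;=\; 1 - \frac{SSE}{SST},
\qquad
s \;=\; \sqrt{\frac{SSE}{\,n - k - 1\,}},
$$

where s is the Std. Error of the Estimate. Adding a variable can never increase SSE, so R Square can never decrease; but each added variable also reduces the divisor n - k - 1 by one, so s rises whenever the drop in SSE is too small to offset the lost degree of freedom. In fact, s falls when the entering variable's absolute t-ratio exceeds 1 and rises when it is below 1, which is exactly the pattern seen at STEP 8.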

Here is the last panel in the table of Excluded Variables:

We see that jan and feb would make such a trivial contribution to the fit of the model that they are left out entirely. When we first set up the regression with F to enter at 0.98 we said that "virtually" all of the possible independent variables would be included. For a technical mathematical reason it is not possible to have a least squares model with all twelve of the months included: the twelve indicators add up to 1 for every case, so together they exactly duplicate the constant term in the model. It is feasible to include either jan or feb, but not both. Forcing in one of them, however, is not worth the effort.

Now we shall take the STEP 6 result and examine this model more closely by performing the diagnostic checking of the regression assumptions. (Choice of STEP 5 would also have been reasonable and would have given a simpler -- more "parsimonious" -- model.) The STEP 6 model is

predicted sqrttons = 4.362 + 1.189 apr - .014 time - .968 sep - .742 oct + .675 may + .671 jun .

Starting with LAPPENa.sav in your Data Editor, rerun the linear regression with Method: set at Enter, and only the variables in the above equation as independent variables. Be sure to save the predicted values and residuals. We will not show the tables again since the necessary results have been displayed in the discussion of stepwise procedures above. Here, however, is the control chart for standardized residuals:


Our ITT alarms are immediately actuated by the plot above. (If you have forgotten what “ITT” means, look back at the early part of Chapter 2.) Observation 64 -- April, 1974 -- is singled out as having by far the most extreme residual, nearly on the lower control limit below the value fitted by the regression. This was of course the month of Medical Audit! Our analysis has led to a strong suggestion that an improvement in cutting down on tonsillectomies may have occurred in the very first month. With the original data and the analysis based on trend alone, there had been no hint of this important finding. Any effect of the Medical Audit was obscured by the large month-to-month variations in the number of tonsillectomies.

An Alternative Approach to Modeling

In checking for possible effects of the Medical Audit thus far we have used all of the observed data in building a multiple regression model, including the observation for the month during which the audit was applied, April 1974. Before continuing, we shall now discuss an alternative approach that might have been used—one that is arguably equally effective in helping us to decide whether or not the audit had an impact. Starting with the Data Editor in the form shown on page 5-10, we use the technique described in the appendix of Chapter 4 and we create a new variable called select, setting it equal to time. We then run the stepwise regression again, but selecting only the first 63 cases to build the model, i.e., not allowing April 1974 to have any influence in determining the regression fit. We do not show the results here, but a stepwise regression shows that the indicator variables, apr, may, jun, sep, and oct, plus time

Page 22: CHAPTER 5: INTERVENTION ANALYSIS AND RANDOMIZED ...faculty.washington.edu/htamura/qm500/king/Ch05.pdf · 5-3 We are dealing with counting data, but it is apparent that the Poisson

5-22

are still very good explainers of the variance of sqrttons. In fact, the model fit as shown by a low residual standard error is even better than before because we have not used the outlying data for April 1974. Thus, using the same independent variables that we chose with the full data set, we run the regression again, but this time without the final month. We do, however, request the predicted value for April 1974 as well as the 99% confidence limits for that forecast. (That was the whole idea behind using the new variable select in our regression analysis setup.) If you follow this procedure on your own PC you will see that for April 1974 (Case 64) the predicted value of sqrttons is 5.08. Furthermore, the lower 99% confidence limit for that prediction is 3.23. We then reason as follows: Using only the data before the Medical Audit went into effect we have obtained a regression model that predicts with 99% confidence that the actual value for sqrttons in April 1974 should be in an interval greater than 3.23. We observe, however, that the actual number of tonsillectomies was 9 (i.e., sqrttons = 3, below the lower limit!). Thus the actual data contradict our prediction. We can therefore take this as an indication that the process has changed—that is, the audit is starting to have its desired effect in lowering the number of tonsillectomies performed.
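One way this second run could be set up in the syntax window is sketched below; treat the exact subcommands as an assumption to be checked against your own version of SPSS. The Selection Variable rule (select LE 63) means only the first 63 cases are used to fit the model, while predictions and interval limits are still saved for every case, including April 1974. CIN(99) requests 99% limits, and ICIN saves the individual prediction limits under the default names LICI_1 and UICI_1, where the 3.23 quoted above should appear for case 64.

COMPUTE select = time.
EXECUTE.
REGRESSION
  /SELECT=select LE 63
  /CRITERIA=CIN(99) PIN(.05) POUT(.10)
  /DEPENDENT sqrttons
  /METHOD=ENTER apr may jun sep oct time
  /SAVE PRED ICIN.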

Introducing the Medical Audit as an Explicit Indicator Variable

We follow up the alternative approaches discussed above by including an explicit indicator variable for the Medical Audit intervention (using the full data set of 64 cases). In this instance, the intervention affected only one month, so the indicator variable medaudit below takes the value "1" for that April, 1974, and zero for all other months. We can create medaudit as a variable consisting of all zeros via the sequence Transform/Compute…, and then we can make it an indicator variable by typing in the value "1" for Case 64. We then run the linear regression once more, but this time including the new variable medaudit on the right-hand side of the equation; the relevant new output appears below.
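A syntax sketch of these steps (it assumes, as above, that time runs from 1 to 64, so that case 64 is April 1974; the logical expression in the COMPUTE evaluates to 1 for that case and 0 for all others):

COMPUTE medaudit = (time = 64).
EXECUTE.
REGRESSION
  /DEPENDENT sqrttons
  /METHOD=ENTER apr may jun sep oct time medaudit
  /SAVE PRED RESID ZRESID.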


The model is now

predicted sqrttons = 4.267 + 1.532 apr - .011 time - .969 sep - .745 oct + .686 may + .679 jun - 2.078 medaudit .

The most interesting part of the output is the following: the new indicator variable medaudit is significant, with a t-ratio of -2.998 (note the negative sign, which is the direction sought) and a p-value of 0.004, suggesting strongly that the Medical Audit was already bringing the desired results in the very first month after its launching. It would still be desirable to follow the data from the Medical Audit program in further months to see if the effect continues. But it would have been unnecessary to wait for several months before even looking at the data to see what effect, if any, the intervention was having. Note also that the standard deviation of residuals is now down to 0.61695, not far above the Poisson expectation of 0.50, and that R Square = 0.570, suggesting that we have learned a good deal about factors influencing the frequency of tonsillectomies.

Finally, note that the effectiveness of the Medical Audit was not at all apparent without careful analysis. Actually, there were more tonsillectomies in April than in March of 1974, 9 vs. 7. It was the regression analysis that confirmed the presence of seasonality and the fact that April, in particular, was a seasonally high month. The downtrend was so strong as to be obvious, but in the absence of careful data analysis and/or background knowledge, the seasonality might have been missed. April, 1974, was only slightly below what would have been expected from the downtrend alone. The improvement from Medical Audit in that month could easily have been missed if we had not introduced the monthly indicator variables. Diagnostic checking is of course in order, and you should study the following output. In general, the conformity to the assumptions of the regression model is excellent.

Diagnostic Checking

Note that the residual for April 1974 is exactly zero. That is because we have created the variable medaudit which is equal to one for that month only and zero otherwise. Thus the fitted regression function passes through the point corresponding to the value of sqrttons for that month.


Visualizing the Fit

The remaining plots are not really diagnostic checks of regression assumptions but ways of visualizing what has been learned from the statistical analysis. The first plot, dependent variable versus predicted value, shows in the form of a simple scatter plot how closely the fitted values of the multiple regression relate to the dependent variable. The correlation coefficient for this plot is the multiple correlation coefficient.


The SPSS sequence Graphs/Sequence… enables us to show as time series both the dependent variable and the predicted values on the same chart. It gives a good idea how well the fitted values track the actual values through time. (You should be experienced enough with SPSS by now to figure out how to do it by yourselves.):

Finally, because all of the independent variables except time are 0-1 indicator variables, it is possible in this special case to display the multiple regression model visually in a meaningful way. Ordinarily, when we are regressing a dependent variable, y, on seven independent variables, the picture of the regression surface (technically called a "hyperplane") requires an 8-D plot.4 If you think about it, you can see that the regression coefficient for each 0-1 variable is just an increment that is added to or subtracted from the constant in the regression model each time that the indicator is equal to one. For convenience we display some of the output from the final regression again:

4It is difficult enough for some students to see what is going on in the 3-D plots that we showed near the end of Chapter 4. Needless to say, we cannot, in general, depict the 8-dimensional situation.


This shows that for any months that are not April, May, June, September, or October, the regression model is simply

predicted sqrttons = 4.267 - 0.011 time .

For the April months, excluding April 1974, the equation is

predicted sqrttons = 4.267 + 1.532 - 0.011 time = 5.799 - 0.011 time ,

and for the September months the model is

predicted sqrttons = 4.267 - 0.969 - 0.011 time = 3.298 - 0.011 time .

Do you catch on? The multiple regression model is really nothing more than six different simple regression lines relating sqrttons to time: one for April (excluding 1974), one for May, another for June, one for September, one for October, and one for all the other months combined. Finally, for the special month, April 1974, the fitted model is a single point with coordinates (time = 64, predicted = 3). The important feature of the model to note is that all of the lines of fitted values are parallel. Only their heights differ, corresponding to the differences in the constants. The various regression lines are shown in the following multiple scatter plot (dressed up a bit via Microsoft Paint):


Systematic Strategies for Regression Analysis

With more experience in data analysis, you could have reached the final result more quickly than we did above. Here are some hints for a systematic approach in any application for which you have a dependent variable Y of interest and a series of possible independent variables X1, X2, ... that may help to explain variations in Y. This is not a complete strategy, but it will help you to deal satisfactorily with a wide range of practical applications. The strategy is mainly for reference: at first reading, you should just glance at it to get the general idea. You will really come to understand it by applying it systematically to actual problems in data analysis. Also, we have been applying it, and will continue to apply it, in all the applications in STM.

First, examine the statistical behavior of Y itself through time, in the manner of Chapters 2-4. If there is evidence of nonconstant variance, as in the Medical Audit, look for a simple data transformation, such as the square root, for which the transformed data will show more nearly constant variance. (Stick with this transformation unless further steps show that it was unwise, in which event you come back to the start to reconsider it.)

The rest of the strategy forms a flow diagram (continued across two pages by connecting the branches labelled A through E). In outline, its decision points and actions are:

• Is Y, or a simple transformation of Y, close to being in statistical control? If yes, ask whether Y appears to have a histogram that would be expected from an underlying normal distribution; if so, use a control chart and go on to the question about additional variables below. If the histogram does not look normal, ask whether Y appears to follow one of the other standard distributions, such as the Poisson, binomial, or exponential, or whether some simple data transformation would bring Y closer to normal; if a transformation helps, do it and continue, and if not, keep thinking -- later steps may still improve matters.

• If Y is not close to statistical control, ask whether there is visual evidence or background knowledge to suggest systematic patterns such as trends or seasonal patterns. If so, formulate variables to represent these systematic patterns and explore them, and others, with stepwise methods. If not, look for evidence of special causes that should be examined; investigate, create indicator variables for any that are found, and continue.

• Are additional variables available -- such as severity in the intensive care example -- that may help to explain Y? If so, add them to the data set; if not, look for some.

• Use stepwise regression to try Y on all the potential independent variables.

• Select a tentative model and apply linear regression to obtain more information, including the diagnostic checks. If the model passes, summarize the results with plots of actual versus fitted values and a time series of actual and fitted values, and report what you have learned and the actions that are indicated. If it does not pass, do what further exploration you can, check the residuals for special causes, and, if any new ideas emerge, model them and go back through the cycle.


3. A Randomized Experiment

We have seen two detailed examples of intervention analysis: the study of Medical Audit just concluded and the Putting Study from Chapter 4. In each instance a process change was made, and process performance before and after the change was compared.

• In the Putting Study, the process appeared to be in control at one level before the intervention and at a higher level afterwards, so the regression analysis required only a single independent variable, interven.

• Multiple regression was needed in the Medical Audit to adjust for the effects of trend and seasonality, which otherwise would have obscured the effects of the intervention.

Intervention analysis is a powerful statistical tool for quality improvement. Often it is the only available tool. That is, no matter how convinced we are that a proposed change is an improvement, the only way to see if the change really works is to try it out -- to intervene in the process of interest. Yet there is always a danger that at the very time we intervene to make the hoped-for improvement, something else has changed. Thus it is conceivable that the golfer may have benefited from a practice effect throughout the whole series of 20 trials, so that the improvement on the last 10 trials over the first 10 trials might have been a reflection of the practice effect. In the putting application, we showed that the sudden-improvement model (change of stance) fitted the data better than the gradual-improvement model (practice effect). Hence by statistical analysis, we were able to reduce ambiguity as to the proper imputation of causation.

But consider other possibilities. Suppose that at the time when the golfer changed his stance, he had inadvertently made some other change, like modifying his grip, getting more rest, starting to use a special glove for putting, or even obtaining a new prescription for his eyeglasses. Then, although the improved performance reflected in the data is still statistically significant, the imputation of causation to the change in stance would be unwarranted. Some or all of the credit may belong elsewhere, to one or more of the other changes that had been made at the time he altered his stance. When we use intervention analysis to learn about causation, there is always this kind of danger:

At the very time the intervention was made, some other change occurred -- of which we may or may not be aware -- that could influence the performance of the process we are trying to improve.


Often, intervention analysis is the only feasible approach, but in using it we must:

• Avoid changing too many things at once (which is tantamount to tampering)

• Be aware of external changes that could affect the process we are working on and cause results that would be erroneously attributed to the intervention

Sometimes it is practicable to pursue an alternative strategy, called randomized experimentation, that greatly reduces the risk of erroneous attribution of causation and that can bring other benefits besides. We shall explore randomized experimentation in this and subsequent chapters. To introduce the basic ideas, we start with an example contained in the data file YIELDS.sav, displayed below:

Experiment with a supposedly improved method for a chemical reaction (Method B) versus the standard method (Method A).
Col. 1: yield -- yield, expressed as a percentage of the original input
Col. 2: timeseq -- time order in which the runs were made
Col. 3: method -- indicator for treatment; equals 1 for Method B, 0 for Method A
Source: Box, Hunter, and Hunter, Statistics for Experimenters, p. 159, Problem 28 (Wiley, 1978)

The plan was to try out Method A (the standard method) on 10 batches and Method B (the modified method) on 10 batches and to compare average process yields. It was hoped that mean yield would be higher for Method B. One way to do the experiment would be by intervention analysis. Since Method A is currently standard, we could measure yields from 10 successive batches, then switch over to Method B for 10 successive batches. The statistical analysis illustrated already would apply: we would study the data to see whether the process appeared to be in statistical control both before the process change and afterwards (when, we hope, the mean level would be higher). An alternative strategy would be sometimes to use Method A and sometimes Method B, deciding at random which to use on any one trial. This could be done as follows:

• On a deck of 20 cards, write "0" for Method A on the first 10 cards and "1" for Method B on the second 10.

• Shuffle the 20 cards very thoroughly.

• Number the shuffled deck from 1 to 20 to establish the sequence in which A and B will be tried during the actual experiment.

The listing below is in the original order of the deck. The numbers for timeseq designate the sequence in the shuffled deck and hence the order of execution of the 20 trials:


The shuffled order -- represented by timeseq -- establishes the sequence in which the experiment will be run. Thus, the original first card, which was a 0 (standard process), will be run 16th; the card to be run first is a 0 (standard process); the final card in the original deck is a 1 (modified process) and will be run 13th. (Actually, this ordering is more easily carried out by use of random numbers --see the SPSS function RV.UNIFORM(min,max) -- rather than card-shuffling, but the idea is the same.) The next step is to run some descriptive statistics, but this time using the sequence Analyze/Compare Means/Means… The setup should look like this:

We have specified method as an independent variable, which means that the results will be displayed for two groups, Method A and Method B. Before executing we also set up the Options as follows:
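Two syntax sketches may be helpful here. The first shows one way the random run order mentioned above could be generated with RV.UNIFORM instead of card-shuffling (rand and runorder are names made up for this illustration; runorder plays the role of timeseq). The second is the syntax equivalent of the Means setup just described:

* Attach a uniform random number to each of the 20 cases and rank it.
COMPUTE rand = RV.UNIFORM(0, 1).
EXECUTE.
RANK VARIABLES=rand
  /RANK INTO runorder.
* Mean yield by method, with counts and standard deviations.
MEANS TABLES=yield BY method
  /CELLS=MEAN COUNT STDDEV.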


Here is the resulting output:

We see immediately that mean yield is higher for Method B ("1") than for Method A ("0"): 71.98 versus 52.21, a difference of 19.77 percentage points. Method B looks promising. Before proceeding with the analysis, however, we do a bit of housekeeping. We want to have the information on the 20 trials listed in the time sequence in which the trials were actually run. We can do that by the SPSS sequence Data/Sort Cases…. Before sorting, however, in order to make this example easier to understand, we shall copy the variables yield, timeseq, and method into three additional columns in the spreadsheet, giving them modified names. We accomplish this via the transformations y = yield, t = timeseq, and m = method. You can check to see that the interior of the Data Editor now looks like this:

At the next step we want to sort the data using the new variable t as the sort key, but we wish also to exclude the variables, yield, timeseq, and method, from the sort process so that we can look at them in their original order. The easiest way to accomplish this is to place the mouse pointer on the column heading row of the Data Editor and highlight the three whole columns for yield, timeseq, and method:

Then, after clicking the right mouse button, perform a cut. The highlighted columns will no longer appear in Data Editor, but they will be held on the clipboard for later insertion. (Before trying this it is probably a good idea to save the full spreadsheet to your Desktop so that you can recover it if anything goes wrong.)


Now, with the Data Editor containing only y, t, and m, we set up the sort as shown below:

We have selected t as the sort key, which means that when we click on OK, t will be sorted in ascending order and the other two variables will in each case be carried along. Then, after the sort is finished, we paste the original variables yield, timeseq, and method back into the spreadsheet. Before you paste, be sure to highlight all three of the columns to the right of those that are already in use. The reconstructed Data Editor now looks like this:

Now we are ready to run the regression of y on m.
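A rough Python equivalent of this simple regression, using statsmodels and the same kind of synthetic stand-in data (a 20-point method effect plus a gentle upward drift, as the text describes), is sketched below; the real analysis, of course, is the SPSS regression on YIELDS.sav.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
t = np.arange(1, 21)                           # run order
m = rng.permutation([0]*10 + [1]*10)           # randomized method indicator, in run order
y = 50 + 20*m + 0.75*t + rng.normal(0, 4, 20)  # synthetic stand-in for the time-ordered yields

fit = sm.OLS(y, sm.add_constant(m)).fit()      # y = b0 + b1*m + error
print(fit.params, fit.bse, fit.tvalues)        # constant and coefficient of m, their SEs, t-ratios
resid = fit.resid                              # residuals in run order, for the control chart and runs test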


The results look promising. The constant is 52.21, which from the Descriptive Statistics above is seen to be the mean yield for the 10 batches run with the standard process (m = 0). The regression coefficient of m is 19.77, which we have seen to be the increase in mean yield for Method B over Method A. The standard error of the estimated coefficient of m is 3.135, the t-ratio is a whopping 6.306, and the p-value is 0 to three decimal places (thus less than 5 in 10,000). Hence the improvement is clearly statistically significant. All looks well, but let's examine the diagnostic checks, starting with the Control Chart for residuals:

Something seems wrong: visual analysis clearly indicates an upward trend in the residuals.

On the left we show the results of performing the runs test in two ways: first using the median as the cutoff value, and then using the mean, as we have generally been doing in the past. We see that the hypothesis of randomness is rejected in one test and not rejected in the other. These conflicting results are probably due to the rather small sample size, but since the significance level using the mean as the cutoff, 0.114, is not very much greater than 0.05, we would be concerned about the data even if we did not have the upward-trending plot.
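For the mechanics of the runs test, a sketch of the usual normal-approximation (Wald-Wolfowitz) version is shown below. SPSS's implementation may differ in details (for example, in how values exactly equal to the cutoff are handled), so this is offered only to make the idea concrete.

import numpy as np
from math import erf, sqrt

def runs_test(x, cutoff):
    """Normal-approximation runs test of randomness about a cutoff value."""
    above = np.asarray(x) >= cutoff
    n1, n2 = int(above.sum()), int((~above).sum())
    runs = 1 + int(np.sum(above[1:] != above[:-1]))           # number of runs
    mu = 2.0*n1*n2/(n1 + n2) + 1.0                            # expected runs under randomness
    var = 2.0*n1*n2*(2.0*n1*n2 - n1 - n2) / ((n1 + n2)**2 * (n1 + n2 - 1))
    z = (runs - mu)/sqrt(var)
    p = 2.0*(1.0 - 0.5*(1.0 + erf(abs(z)/sqrt(2.0))))         # two-sided p-value
    return runs, z, p

# Illustration on synthetic "residuals" with an upward drift: a trend produces too few runs
resid = np.arange(20) - 9.5 + np.random.default_rng(4).normal(0, 3, 20)
print("median cutoff:", runs_test(resid, np.median(resid)))
print("mean cutoff:  ", runs_test(resid, np.mean(resid)))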

The data plot, reinforced somewhat by the runs count, points to the inescapable conclusion that there has been a failure to keep external conditions reasonably constant during the time the 20 batches were being manufactured. The regression has taken into account the effects of the two methods, and what is left over in the residuals includes a strong upward time trend.5 Does this trend in residuals invalidate the apparent improvement due to Method B? Fortunately we can investigate this question by a multiple regression that includes a second independent variable t along with m for method:
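In code, the corresponding multiple regression can be sketched by adding t to the design matrix; again this uses synthetic stand-in data rather than the actual YIELDS.sav values.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
t = np.arange(1, 21)
m = rng.permutation([0]*10 + [1]*10)
y = 50 + 20*m + 0.75*t + rng.normal(0, 4, 20)   # synthetic stand-in, as before

X = sm.add_constant(np.column_stack([m, t]))    # columns: constant, method, time
fit2 = sm.OLS(y, X).fit()
print(fit2.params)     # the estimated method effect should stay close to 20
print(fit2.bse)        # its standard error typically shrinks once t is in the model
print(fit2.rsquared)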

We see first that the trend effect is highly significant; it amounts to about 3/4 point improvement per batch.

But, interestingly, the estimated improvement due to Method B has changed only slightly from the previous regression, 20.08 versus 19.77, and the significance is higher than before; for example, the t-ratio is 8.372. Also, the residual standard error (Std. Error of the Estimate) is down to 5.36 from 7.01. The introduction of the trend variable has produced a substantial improvement in overall fit: note that R Square is now 0.828. This is the underlying reason for the even higher significance of Method B in this analysis than in the first analysis, even though the estimated effect -- about 20 units -- is about the same. The standard error of the regression coefficient for m in the first regression was 3.135; now it is only 2.398. There are two important lessons to be drawn from the example:

• By data analysis, we have learned about a failure to maintain proper statistical control of the experiment. But it was more than a "failure": there was an unnoticed and unexpected improvement in process yield during the time that the 20 batches were being produced. The trend value was 0.772 per batch, and 20 times 0.772 is about 15.4. Thus, whatever the inadvertent changes made in the process during the conduct of the experiment, they led to nearly as much yield improvement as is attributable to Method B itself. It would be important to find out what these changes were so that they can be retained and the gains sustained. (Remember that a trend as such simply tells us that something systematic is changing. Direct investigation and further study are needed to find out what lies behind the trend.)

• Even though the conduct of the experiment itself was flawed (in a "good" direction), the assessment of Method B was almost unaffected. The estimate of a 20-unit yield gain was indicated whether or not we did the full analysis that took account of the time trend. That happened as a consequence of the randomization. We shall explain this in detail after we present the diagnostic checks and summary graphs of fit for the second model.

5 If the original variable yield had been regressed on method, we would have obtained the same output as above. The diagnostics on the time behavior of the residuals, however, would not have been possible, because yield and method are listed with the old method first, followed by the new method, rather than in the order in which the batches were actually run.


The Role of Randomization

Now we turn to the role of randomization in the experiment. The randomization shuffled up the sequence in which the two methods, A and B, were used, as is seen by the following time sequence of m:

0 0 1 1 0 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0

We see that the 1's and 0's are scattered irregularly from the beginning to the end of the 20-batch experiment. We know that there was a favorable time trend for yield that had nothing to do with the two methods. Suppose that randomization had not been employed and that, say, the 10 Method A's had been run first and the 10 Method B's second, as follows:

0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

Then Method B's effectiveness would have been exaggerated relative to Method A, because B would have received the benefit of the positive time trend. In fact, since A and B were shuffled up by the randomization, each tended to receive about the same help and hindrance from the time trend. One way to show this statistically is to calculate the correlation between method and timeseq, which shows very little relationship between the two variables:
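A quick way to verify this, assuming nothing more than numpy and the run-order sequence printed above, is:

import numpy as np

m_time = np.array([0,0,1,1,0,1,1,1,0,0,0,1,1,1,0,0,1,0,1,0])  # method in run order, from the text
t = np.arange(1, 21)
print(round(np.corrcoef(m_time, t)[0, 1], 3))  # close to zero (about -0.03): method and time are nearly unconfounded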


If method had been seriously correlated with timeseq, then the effect of Method B would have been over- or underestimated, depending on the sign of the correlation. We would call timeseq in that case a "confounding variable". To bring home the importance of randomization in the presence of other confounding effects, we carry out the following simulation exercise with SPSS. We focus in the SPSS spreadsheet on y and t, which are in order by time.

1. The first step is to remove the effect of the method, m, from the values of y. (For purposes of this exercise we will assume that we know for sure that the effect of the method is +20.) The removal is done by the SPSS transformation

   y2 = y - m*20

   If you think about it carefully, you will see that y2 now consists of its mean, plus the effect of timeseq, plus an error term.

2. Next, we "add back" to y2 the effect of the method, but under the assumption that the assignment of the A and B methods was not randomized. Instead, we assign ten treatments with Method A followed in time by ten with Method B:

   y2 = y2 + method*20

   (Recall that the pasted-back variable method is still in the original deck order -- ten 0's followed by ten 1's -- so in the time-sorted spreadsheet this assigns Method A to the first ten batches and Method B to the last ten.)

3. Now we are ready to run the linear regression of y2 on method:
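The same exercise can be sketched in Python. The yields here are synthetic stand-ins with the structure the text describes (a +20 method effect and an upward drift), so the numbers will not match the SPSS output exactly, but the qualitative result -- an inflated method coefficient -- is the point.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
t = np.arange(1, 21)
m = rng.permutation([0]*10 + [1]*10)            # the randomized assignment actually used
y = 50 + 20*m + 0.75*t + rng.normal(0, 4, 20)   # synthetic stand-in for the time-ordered yields

y2 = y - 20*m                                   # step 1: strip out the (assumed known) +20 method effect
method_seq = np.array([0]*10 + [1]*10)          # step 2: non-randomized plan: ten A's, then ten B's
y2 = y2 + 20*method_seq

fit = sm.OLS(y2, sm.add_constant(method_seq)).fit()  # step 3: regress y2 on the non-randomized assignment
print(fit.params)   # the method coefficient comes out well above 20 because of the time trend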


The regression equation is

predicted y2 = 46.93 + 30.33 method

It is clear that the effect of Method B is seriously overestimated -- by about 50% -- because of the confounding with t (time). To be fair, we must admit that there are many situations in which we would like to randomize but cannot, because of various constraints, be they physical, moral, political, ... whatever. Consider, for example, a study in which we need to estimate the effect of a new, tough state law against drunken driving. The law is passed and it is in place. We cannot randomly turn it on and off from week to week or month to month. Thus, to see the effect on driving fatalities, we are stuck with a before-and-after study in which our results may be confounded by other variables (better driver education, less consumption of alcohol due to reasons other than the new law, etc.). Consider another example -- smoking and health. We cannot, for obvious reasons, randomly assign human beings to one group in which they are taught to smoke and another in which they are forcibly prevented from smoking. These and similar situations are called "observational" as opposed to "experimental", in the sense that we observe the data as they arise naturally and have to make the best of it, being careful not to allow confounding to lead to erroneous conclusions about causal effects. You may think that, in the simulation exercise we just performed, the bias of overestimation can be corrected by including t (time) in the regression model. Let's see what happens if we do that:

This is what can happen to you if you do not randomize when experimenting with new methods of manufacturing, packaging, pricing, teaching, playing tennis, whatever.


We see that the upward bias in the estimate of the effect of method is still present, and the coefficient of t is not even significantly different from zero. How can this happen to us? We know from the original analysis that the effect of t is causing the problem. The answer lies in the correlation between method and t:

The correlation of 0.8671 shows that there is a strong linear relationship between method and t. You may recall that at the end of Chapter 4 we discussed this situation and warned that it could cause serious problems. We now have a good example of the way in which the high correlation between two independent variables prevents us from getting valid estimates of their separate “true effects” on the dependent variable. (See Appendix 1 of this chapter for a visual depiction of the problem.)
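The value 0.867 is not an accident of the data; it follows from the design itself, since in the non-randomized plan method is 0 for the first ten time positions and 1 for the last ten. A two-line check:

import numpy as np

t = np.arange(1, 21)
method_seq = np.array([0]*10 + [1]*10)             # ten A's followed by ten B's
print(round(np.corrcoef(method_seq, t)[0, 1], 4))  # about 0.867, matching the output above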

Summary

Finally, to complete the picture, suppose there are two independent variables X1 and X2 and a dependent variable Y. Consider three possible regressions, two simple regressions and one multiple regression:

1. Y on X1
2. Y on X2
3. Y on X1 and X2


The principle is this: if the two independent variables X1 and X2 are uncorrelated, then the regression coefficient of X1 in regressions 1 and 3 will be the same, and the regression coefficient of X2 in regressions 2 and 3 will be the same. In the present application, X1 (m) and X2 (t) are nearly uncorrelated, so we get nearly the same estimate of the effect of m in the simple regression as in the multiple regression. The randomization does not guarantee that m will be nearly uncorrelated with t, but the expected correlation between m and t is zero. The actual correlation will depend on sample size: the larger the sample size, the smaller the correlation is likely to be.

The principle extends in a direct way to more than two independent variables in a regression. For example, if every pair of independent variables among m, t, X3, and X4 is uncorrelated, then the multiple regression coefficients for m, t, X3, and X4 will be the same as the corresponding coefficients in the simple regressions. Now suppose X3 is some other variable that affects Y but that is not included in the analysis and is totally unsuspected. The randomization with respect to m protects against any distortion due to X3 in the estimate of the effect of m on Y.

Randomization is not always practicable. For example, if Methods A and B each require time-consuming or expensive setups, the simple intervention analysis may have to suffice. Then we run 10 A's and 10 B's in that sequence, and try hard to control or measure other factors that can influence Y. But when randomization is practicable, we have a powerful tool for quality improvement. We shall see more examples of the usefulness of randomization, some in more complicated experiments than the simple study of this section.

4. Randomized Pairs

In setting up an experiment, it may be possible to combine randomization with an additional technique, blocking, which will be demonstrated in the context of a data file named BLOODPRS.sav. The background and rationale are explained in the note below:

Blood pressure and hyperventilation. Dr. George Sheehan says that deep breathing will lower blood pressure. Experiment to check this out.

hypervent = 1 if twelve deep breaths taken before the blood pressure reading; = 0 for a normal measurement
diastolic = diastolic reading
systolic = systolic reading
pulse = pulse rate

Randomized pair design. Two readings, one with hypervent = 1 and one with hypervent = 0, taken about 5-6 minutes apart; 10 pairs spaced across the day, starting at 7:59 AM and finishing at 8:04 PM. Within each pair, random numbers were used to decide whether hypervent was 1,0 or 0,1.


We use the variable names trial, hypervent, systolic, diastolic, and pulse, in that order, and open the data file in SPSS. As described above, hypervent is a 0-1 indicator variable showing whether or not deep breaths were taken before the blood pressure reading. Most of you probably know that systolic is the upper number in the reading, and diastolic is the lower. Next, we show how this randomized-pair design differs from the simple randomized design of Section 3. Here is hypervent in time sequence:

0 1 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 0 1

In this string there is shuffling of the 0's and 1's, but it is restricted: within each successive pair of trials there must be exactly one 0 and one 1 (the order of which is determined randomly), as emphasized below:

| 0 1 | 1 0 | 1 0 | 0 1 | 1 0 | 1 0 | 1 0 | 0 1 | 1 0 | 0 1 |

By contrast, here is the corresponding printout for the variable method in the chemical yield application of Section 3:

| 0 0 | 1 1 | 0 1 | 1 1 | 0 0 | 0 1 | 1 1 | 0 0 | 1 0 | 1 0 |

In that application, six of the ten pairs were either 1,1 or 0,0. That was probably acceptable for the chemical yield study, but the following considerations matter for the blood pressure example. Blood pressure readings are likely to vary substantially over the course of a day. For very short periods within the day, like 5 or 10 minutes, the variation is likely to be less. Hence if we take two observations close together, one with hyperventilation and one without, the difference between the two readings is less likely to be affected by the changes in blood pressure that occur over the course of a day, regardless of whether or not hyperventilation has any effect.

We can actually check to see whether something like this is happening. There are 10 pairs, or blocks, each comprising two consecutive readings. The two readings within each block are separated by only a few minutes, whereas the time interval between blocks is on the order of 45 minutes. If there are significant differences in systolic between blocks, this will constitute evidence that systolic pressure is varying systematically over longer periods of the day.

The first thing that we must do is to assign a block number to each row in the data set. We do this by highlighting the first empty column after pulse and applying the SPSS sequence Data/Insert Variable. After changing the name of the new column to block, we type in the sequence 1,1,2,2,3,3,…,10,10. The Data Editor now looks like the image below:
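In code, this bookkeeping step and a check of the randomized-pair structure might look like the rough pandas sketch below. The readings are synthetic stand-ins; the real data are in BLOODPRS.sav (which could be read, for example, with pd.read_spss if the pyreadstat package is installed).

import numpy as np
import pandas as pd

# Synthetic stand-in for BLOODPRS.sav: one 0 and one 1, in random order, within each pair
rng = np.random.default_rng(5)
hyp = np.concatenate([rng.permutation([0, 1]) for _ in range(10)])
bp = pd.DataFrame({"trial": np.arange(1, 21),
                   "hypervent": hyp,
                   "systolic": 118 - 2*hyp + rng.normal(0, 7, 20)})

bp["block"] = np.repeat(np.arange(1, 11), 2)     # the sequence 1,1,2,2,...,10,10

# Each block should contain exactly one hyperventilated reading
print(bp.groupby("block")["hypervent"].sum())    # every entry should be 1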


Then we do a graphical check. In the graph below, we plot the blood pressure readings with and without hyperventilation.

The chart was produced via the sequence Graphs/Scatter…, choosing the Overlay format. Then it was fancied up using the Chart Editor. The 0-1 labels indicate whether or not hyperventilation was applied. Note that the widely divergent pairs occur at different times for systolic and diastolic. As a next step we must create an indicator variable for each of the ten blocks, just as we did in analyzing the tonsils data. This time we save you a lot of effort by providing the following modified file, BLOODPRSa.sav:


Note carefully how each block indicator contains two rows of 1’s, corresponding to the pair of measurements that is being considered. The remaining rows for each block variable are equal to zero. We are now ready to regress systolic on hypervent to see if the deep breathing appears to have a lowering effect on blood pressure:


The linear regression above does show a coefficient for hypervent of -2.40, indicating an effect in the right direction, but the standard error of the coefficient, 3.338, is so large that the coefficient is not significantly different from zero (the t-ratio is only -0.719). Note that the residual standard error is 7.46473 and R Square is only 0.028. We have been discussing all along the possibility that variability from pair to pair (block to block) may be contributing importantly to the overall variability. Hence the next step is to introduce the block indicators to try to reduce the residual variability and bring out the difference (if any) between hypervent and its absence. In the regression dialog box we enter hypervent and all of the block indicator variables as the independent variables.
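A sketch of both regressions in Python/statsmodels, continuing with the same synthetic stand-in data used in the earlier sketch (the real analysis uses BLOODPRSa.sav), is shown below. Note that one block indicator has to be dropped when a constant is included, a point the text takes up next.

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
hyp = np.concatenate([rng.permutation([0, 1]) for _ in range(10)])
bp = pd.DataFrame({"hypervent": hyp,
                   "systolic": 118 - 2*hyp + rng.normal(0, 7, 20),
                   "block": np.repeat(np.arange(1, 11), 2)})

# Regression of systolic on hypervent alone
fit0 = sm.OLS(bp["systolic"], sm.add_constant(bp["hypervent"].astype(float))).fit()

# Regression with block indicators; drop_first=True omits the first block dummy
# to avoid perfect collinearity with the constant
dummies = pd.get_dummies(bp["block"], prefix="b", drop_first=True).astype(float)
X = sm.add_constant(pd.concat([bp["hypervent"].astype(float), dummies], axis=1))
fit1 = sm.OLS(bp["systolic"], X).fit()

for label, f in [("without blocks", fit0), ("with blocks   ", fit1)]:
    print(label, round(f.params["hypervent"], 2), round(f.bse["hypervent"], 2),
          round(f.tvalues["hypervent"], 2), "R2 =", round(f.rsquared, 3))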

Note that the regression routine has dropped b1, the first block indicator, even though it was included in the list of independent variables. This is not a malfunction: the ten block indicators sum to a column of 1's, so including all ten together with the constant would make it impossible to obtain a unique least-squares solution. SPSS therefore omitted the first indicator, although it could just as easily have dropped any one of them and obtained the same result for hypervent. Observe that the third and the ninth pair (block) have t-ratios that are almost statistically significant, indicating that the block-to-block variability should not be ignored. Notice also that R Square is very much larger, 0.745, and the residual standard error is now 5.4078, compared to 7.4647 from the regression without block effects. This is a big improvement in the goodness of fit of the model.

In spite of the reduction in the standard deviation of the residuals, however, the standard error of the regression coefficient for hypervent has not gone down all that much. The t-ratio for hypervent is now somewhat larger in absolute value, -0.992, but still far below the usual cutoff of 2 in absolute value. Our conclusion is that although the estimated effect of hypervent on systolic is a reduction, the mean difference is not statistically significant, even when we control for pair-to-pair variation. We leave it for you to try the same analysis on diastolic. You will get much the same results.

Simplicity and Usefulness of Randomized Pairs

In the randomized-pair design, the experimental intervention and the "control" are randomized within each successive pair of trials. When we performed the linear regression with the pair indicators included on the right-hand side, the analysis was equivalent to computing the differences in response within each pair -- say, intervention response minus control response -- and then checking whether the mean of these differences is significantly different from zero. If it is, we can estimate the intervention effect by the coefficient of hypervent. The uncertainty of the estimate is expressed in the standard error of that coefficient.6 Thus both design and analysis are simple, and there are many potential applications. For example, a salesman might use one of two possible sales approaches, A and B, for each successive pair of sales calls, with the order of A and B determined by the toss of a coin. Or, in trying out a modification of your standard swimming technique, you might use the modification and the standard technique on each pair of laps, the order being randomly determined within each pair. Then for each pair you take the difference of the times and analyze these differences, as shown above.

5. Intervention, Experimentation, and Statistical Control

In quality management there is a common, but erroneous, belief that we must get a process into statistical control before we can intervene or experiment to improve it. All three applications of this chapter are counterexamples:

• In the Medical Audit, there were trend and seasonal effects, both of which are incompatible with statistical control. We dealt with these by regression analysis.

• In the Chemical Yield study, there was a time trend. We dealt with the trend by regression, and the randomization protected us against any biasing effects of the trend -- or of omitted variables generally -- on the comparison between the two methods.

6See Appendix 2 of this chapter for a different approach in SPSS that will accomplish the same result as the linear regression with block indicators.


• In the blood pressure application, blood pressure varied systematically through the day. The pairing and randomization within pairs made it possible to get a valid measure of the effect of hyperventilation.

Of course, we can intervene if the process is in a state of statistical control, as was illustrated in the Putting Study. But it is not necessary for the process to be in control before we start exploring for systematic ways to improve it.


APPENDIX 1: GRAPHIC DEPICTION OF NEAR-MULTICOLLINEARITY

The purpose of the figures below is to help you understand the importance of randomization in the YIELDS.sav example of Chapter 5. The first graph depicts the design as actually implemented in the example. We see the regression plane passing among the data points (i.e., the tops of the spikes) in such a way as to minimize the sum of squared residuals. We can clearly see the positive partial slope with respect to timeseq by following the left edge of the plane. With a sharp eye on the axes for both yield and method, you can even estimate the partial slope with respect to method to be about +20. (Look at the edges of the regression plane that are parallel to the method axis. As the plane passes from the points where method = 0 to those where method = 1, it is raised by about 20 units of yield.)


In the next graph we have plotted the yield y2, simulated for the hypothetical situation where method was unfortunately assigned by applying A to the first ten values of t and B to the last ten. In this case the average times for the two methods are very different. If you rotate the graph to the right (see the last image below), with respect to the t-axis all of the points for Method B lie above those for Method A. The effect of the high correlation between method and t is to shift the least-squares plane so that the partial slope with respect to t is nearly zero. The partial slope with respect to method is increased, giving an erroneous picture of the “true effect” of method.


In summary, the effect of near-multicollinearity in the independent variables is to give misleading values of the regression coefficients for some or all of the variables.


APPENDIX 2: TWO-SAMPLE T-TESTS

In Section 14 of Chapter 2 we were introduced to the SPSS sequence Analyze/Compare Means/One-Sample T Test..., which we used to test hypotheses about the mean of a process. In this appendix we shall show how two new commands, Independent-Samples T Test and Paired-Samples T Test, can be used to achieve the same results as linear regression in the Blood Pressure example. As in the main chapter, our illustration will involve only the systolic reading, systolic, but you can easily apply the same approach to diastolic. When we invoke Analyze/Compare Means/Independent-Samples T Test, we get the following dialog window:

In the window at left we have already specified systolic as the variable to be tested and hypervent as the grouping variable, but we see that the term hypervent is followed by (??), indicating that more information is needed. We must click on the Define Groups… button.

We have now informed the routine that the two groups to be compared are defined by a cut point equal to the value 1 for hypervent. After we continue and finally click on OK, we obtain this output:

Group Statistics

systolic     hypervent   N    Mean       Std. Deviation   Std. Error Mean
             >= 1        10   116.7000   5.90762          1.86815
             <  1        10   119.1000   8.74897          2.76667

Independent Samples Test (systolic)

Levene's Test for Equality of Variances:   F = .636   Sig. = .436

t-test for Equality of Means
                               t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
Equal variances assumed        -.719   18       .481              -2.40000          3.33833                 -9.41357 to 4.61357
Equal variances not assumed    -.719   15.795   .483              -2.40000          3.33833                 -9.48443 to 4.68443


Do not be confused by the complicated display; the essentials are these. The difference in the two sample means is 116.70 - 119.10 = -2.40, exactly the same number that was shown as the regression coefficient for hypervent in our previous regression analysis. In the line entitled Equal variances assumed, you see the t-ratio of -0.719 and the p-value of 0.481 -- again, the same as in the regression model without the block indicators. We shall now show that if we utilize the SPSS sequence Analyze/Compare Means/Paired-Samples T Test… we obtain a result that is the same as the regression with block indicators. First, however, we have to rearrange the data from the form shown in the original file, BLOODPRS.sav. We show below a modified file called BLOODPRSb.sav that contains only the essential data for this demonstration:

If you compare the spreadsheet on the left with the original file, you will see that the new variable sys0 is the same as systolic when hypervent = 0, and sys1 is the same as systolic when hypervent = 1. Note also that the values are correctly paired. We were able to get the file into this form by some judicious sorting, copying, and pasting, but we will not trouble you with the details here.

After executing Analyze/Compare Means/Paired-Samples T Test… we see this dialog window:


To put sys1 – sys0 into the Paired Variables: box, we highlighted both variables and clicked on the arrow. Then, after clicking on OK, we obtain:

Paired Samples Statistics

Pair 1   Mean       N    Std. Deviation   Std. Error Mean
sys1     116.7000   10   5.90762          1.86815
sys0     119.1000   10   8.74897          2.76667

Paired Samples Test (Pair 1: sys1 - sys0)

Mean       Std. Deviation   Std. Error Mean   95% CI of the Difference   t       df   Sig. (2-tailed)
-2.40000   7.64780          2.41845           -7.87091 to 3.07091        -.992   9    .347

We see that these results agree exactly with the output from the linear regression that included the block indicator variables.
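As a final arithmetic check (not part of the SPSS output), both t-ratios can be reproduced directly from the summary numbers in the tables above; a minimal Python sketch:

from math import sqrt

# Independent-samples t (equal variances assumed), from the Group Statistics table
mean1, sd1, n1 = 116.70, 5.90762, 10     # hypervent = 1
mean0, sd0, n0 = 119.10, 8.74897, 10     # hypervent = 0
sp2 = ((n1 - 1)*sd1**2 + (n0 - 1)*sd0**2) / (n1 + n0 - 2)   # pooled variance
print(round((mean1 - mean0) / sqrt(sp2*(1/n1 + 1/n0)), 3))  # about -0.719

# Paired t, from the Paired Samples Test table
mean_d, sd_d, n = -2.40, 7.64780, 10
print(round(mean_d / (sd_d / sqrt(n)), 3))                  # about -0.992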