ng bb 37 multiple regression
UNCLASSIFIED / FOUO
National Guard Black Belt Training
Module 37
Multiple Regression
CPI Roadmap – Analyze
Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.
TOOLS
• Value Stream Analysis
• Process Constraint ID
• Takt Time Analysis
• Cause and Effect Analysis
• Brainstorming
• 5 Whys
• Affinity Diagram
• Pareto
• Cause and Effect Matrix
• FMEA
• Hypothesis Tests
• ANOVA
• Chi Square
• Simple and Multiple Regression
ACTIVITIES
• Identify Potential Root Causes
• Reduce List of Potential Root Causes
• Confirm Root Cause to Output Relationship
• Estimate Impact of Root Causes on Key Outputs
• Prioritize Root Causes
• Complete Analyze Tollgate
8-STEP PROCESS
1. Validate the Problem
2. Identify Performance Gaps
3. Set Improvement Targets
4. Determine Root Cause
5. Develop Counter-Measures
6. See Counter-Measures Through
7. Confirm Results & Process
8. Standardize Successful Processes
Define, Measure, Analyze, Improve, Control
Learning Objectives
Understand how to identify correlation with multiple variables
Learn how to create a mathematical model for the effect of multiple inputs on an output variable
Understand and identify multicollinearity
Understand how to use best subsets to identify the best model
Examine unusual observations to learn more about the data
Multiple Regression
In Simple Linear Regression, we had:
Y = B0 + B1X
In Multiple Linear Regression, we have:
Y = B0 + B1X1 + B2X2 + B3X3
We’d like to identify which, if any, of the predictor variables are useful in predicting Y
Y = f(X)
[Diagram: inputs X1, X2, X3, X4, and X5 feeding the output Y]
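The multiple regression equation can also be fitted outside Minitab. The sketch below is a minimal illustration, assuming NumPy is available; the function name `fit_multiple_regression` and the synthetic data are hypothetical and are not the course data set.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Return [B0, B1, ..., Bk] that minimize the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept B0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Synthetic, noise-free data generated from Y = 2 + 3*X1 - 1*X2 + 0.5*X3.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + 0.5 * X_demo[:, 2]
coef_demo = fit_multiple_regression(X_demo, y_demo)
print(np.round(coef_demo, 3))
```

Because the demo data contain no noise, the fitted coefficients recover the generating values; real process data will not fit this cleanly.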
When Should I Use Multiple Regression?
The tool depends on the data type. Regression is typically used with a continuous input and a continuous response but may also be used with count or categorical inputs and outputs.
                          Independent Variable (X)
Dependent Variable (Y)    Continuous             Attribute
Continuous                Regression             ANOVA
Attribute                 Logistic Regression    Chi-Square (χ²) Test
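The tool-selection grid on this slide can be written as a simple lookup. This is only an illustrative sketch; the names `TOOL_BY_DATA_TYPE` and `choose_tool` are hypothetical, and the grid assignment follows the usual reading of the chart (X type across, Y type down).

```python
# Keys are (type of X, type of Y); values are the tool named in the grid.
TOOL_BY_DATA_TYPE = {
    ("continuous", "continuous"): "Regression",
    ("attribute", "continuous"): "ANOVA",
    ("continuous", "attribute"): "Logistic Regression",
    ("attribute", "attribute"): "Chi-Square Test",
}

def choose_tool(x_type, y_type):
    """Look up the analysis tool for the given input and output data types."""
    return TOOL_BY_DATA_TYPE[(x_type.lower(), y_type.lower())]

print(choose_tool("continuous", "continuous"))
```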
Basic Steps for Regression Modeling
STEP 1: Process Flowchart, SIPOC
  Objective: To identify KPIVs and KPOVs
  Key question: Which KPIVs will significantly improve which KPOVs?
STEP 2: Scatter Plot, Histogram
  Objective: To visualize the data
  Key question: Does it look like there is a C&E relationship?
STEP 3: Correlation, Test Hypothesis
  Objective: To qualify the C&E relationship (Strength, % Variability, P-value)
  Key question: How strong is the C&E relationship?
STEP 4: Regression Analysis
  Objective: To quantify the C&E relationship (Method of Least Squares)
  Key question: What is the prediction equation?
STEP 5: Residual Analysis
  Objective: To validate the model selected
  Key question: Is there anything suspicious with the model selected?
KPIV = Key Process Input Variables; KPOV = Key Process Output Variables
Example: Production Plant
A chemical engineer is investigating the amount of silver required in the high-volume production of contact switches for a new Army radio. Although only a small amount of silver is deposited on the switches, a larger amount is wasted in a multistep process. She has collected data and would like to develop a prediction model.
Data set: A-06 Production Plant
Step 1: The variables identified as KPIVs are given below:
X1 = Average temperature of rinse bath (degrees C)
X2 = Speed of reel that feeds the switches through the line (inches/min)
X3 = Thickness of silver deposit (angstroms)
X4 = Water consumed (gallons per day)
Y = Amount of silver consumed (pounds/day)
Source: Applied Regression Analysis, Draper and Smith
What questions would you ask about this data?
Visualize the Data!
Step 2: Visualize the Data
Data file: A-06 Production Plant.mtw
Select Graph>Matrix Plot
Looking for relationships between variables...
Step 2: Visualize the Data!
This dialog box comes up first
Select Matrix of Plots – Simple, since we have only one (Y) variable and no groups
Click on OK to go to the next dialog box
Double click on all of the variables you want to include in the Matrix, to place them in the Graph variables box
Select Matrix Options to move on to the next dialog box
Step 2: Visualize the Data!
Select Lower left to place all the graph labels to the lower left of the boxes
Click on OK here and on the previous dialog box to get the matrix
Step 2: Visualize the Data!
Correlation Table
[Figure: Matrix Plot of Temp, Speed, Thickness, Water, Amt of Ag; Amt of Ag is the response variable (Y)]
There appear to be some relationships between certain variables and the response.
Is this good or bad?
Quantify the Relationships Between Variables
Select Stat>Basic Statistics> Correlation
Step 3: Quantify the relationship
Double click on all of the variables you want to include, to place them in the Variables box
Check to display p-values (default setting)
Click on OK to get the Correlation Matrix in your Session Window
Evaluating coefficients of correlation among predictors...
Correlation Matrix
Correlation Matrix
Predictor variable pairwise correlations larger than 0.5 to 0.7 are signs of trouble: multicollinearity. We will explain more shortly.
The TOP number in each pair is the Pearson coefficient of correlation (r-value), while the BOTTOM number is the p-value.
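The pairwise check on predictor correlations can be sketched in code as follows, assuming NumPy. The helper name `flag_correlated_pairs`, the 0.7 cutoff (the upper end of the range quoted on the slide), and the synthetic columns are illustrative assumptions. Unlike Minitab's output, this sketch reports only r-values, not p-values.

```python
import numpy as np

def flag_correlated_pairs(data, names, threshold=0.7):
    """Return (name_a, name_b, r) for every column pair with |r| above threshold."""
    # rowvar=False treats each column as one variable.
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], float(r[i, j])))
    return flagged

# Synthetic predictors: B is nearly a copy of A, C is unrelated to both.
rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = a + 0.1 * rng.normal(size=50)
c = rng.normal(size=50)
pairs = flag_correlated_pairs(np.column_stack([a, b, c]), ["A", "B", "C"])
print(pairs)
```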
Finding the Regression Equation...
Select: Stat>Regression>Regression
Step 4: Develop a prediction model
Double click on C5 Amt of Ag and place it in the Response: variable box, then double click on all the variables you want to place in the Predictors: box.
Select Options to go to next dialog box.
Finding the Regression Equation... (Cont.)
In this dialog box, the only thing you have to do is check Variance inflation factors
Click on OK here and on the previous dialog box to get the regression analysis in your Session Window
Finding the Regression Equation... (Cont.)
Regression Equation
Minitab displays the following regression equation:
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Predictor   Coef      SE Coef   T      P      VIF
Constant    5.72      10.83     0.53   0.607
Temp        -0.01558  0.02616   -0.60  0.563  1.276
Speed       0.2393    0.2644    0.90   0.383  10.997
Thickness   0.443     1.033     0.43   0.675  11.671
Water       0.04495   0.01481   3.04   0.010  1.731
S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%
The P-values indicate whether a particular predictor is significant in the presence of the other predictors in the model
This new model explains 80.9% of response variability
R-Sq(adj) adjusts R-Sq for degrees of freedom, penalizing variables that add no real value. It should be used when comparing models
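The adjustment for degrees of freedom can be checked by hand. The sketch below uses the standard textbook formula for adjusted R-Sq, which the slides do not state explicitly, and assumes n = 17 observations (the N shown on the residual normal probability plot for this data set). The function name is hypothetical.

```python
def adjusted_r_squared(r_sq, n_obs, n_predictors):
    """Adjusted R-Sq = 1 - (1 - R-Sq) * (n - 1) / (n - k - 1), with R-Sq as a fraction."""
    return 1.0 - (1.0 - r_sq) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Four-predictor model: R-Sq = 80.9%, assumed n = 17.
full_model = adjusted_r_squared(0.809, n_obs=17, n_predictors=4)
print(f"{full_model:.1%}")  # prints 74.5%
```

The result agrees with the R-Sq(adj) = 74.5% that Minitab reports for the four-predictor model.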
Interpreting P-values
The P columns give the significance level for each term in the model
Typically, if a P value is less than or equal to 0.05, the variable is considered significant (i.e., null hypothesis is rejected)
If a P value is greater than 0.10, the term is removed from the model. A practitioner might leave the term in the model, if the P value is within the gray region between these two probability levels
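The decision rule above can be expressed as a small function. This is a sketch of the rule of thumb only; the function name and labels are hypothetical, and the P-values are those of the four-predictor Production Plant model.

```python
def classify_term(p_value, keep_at=0.05, drop_above=0.10):
    """Apply the rule of thumb: <= 0.05 significant, > 0.10 remove, otherwise gray region."""
    if p_value <= keep_at:
        return "significant"
    if p_value > drop_above:
        return "remove"
    return "gray region"

# P-values from the four-predictor regression output.
p_values = {"Temp": 0.563, "Speed": 0.383, "Thickness": 0.675, "Water": 0.010}
decisions = {name: classify_term(p) for name, p in p_values.items()}
print(decisions)
```

These verdicts apply only to the model as fitted. Speed looks removable here, yet it becomes significant (P = 0.001) in the reduced Speed and Water model, so terms are better dropped one at a time than all at once.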
Variance Inflation Factor
The regression output in Minitab’s Session Window is the same as on the previous slide.
High VIF values are signs of trouble (VIF > 10). Here Speed (10.997) and Thickness (11.671) exceed 10.
Problems with Several Predictor Variables
Sometimes the Xs are correlated (dependent). This condition is known as Multicollinearity
Multicollinearity can cause problems (sometimes severe)
Estimates of the coefficients are affected (unstable, inflated variances)
Difficulty isolating the effects of each X
Coefficients depend on which Xs are included in the model
High multicollinearity inflates the standard error estimates, which increases the P values
In case of extreme multicollinearity, Minitab will drop one term and notify you
Graphical Representation of Multicollinearity
[Venn diagram: Total Variation in Y, overlapped by Variation Explained by X1 and Variation Explained by X2]
• Overlap represents correlation
• X1 and X2 are both correlated with Y
• X1 and X2 are highly correlated
• If X1 is in the model, we don’t need X2, and vice versa
Assessing the Degree of Multicollinearity
We use a metric called Variance Inflation Factor (VIF):
$VIF_i = \frac{1}{1 - R_i^2}$
Where:
$R_i^2$ is the $R^2$ value you get when you regress $X_i$ against the other Xs
A large $R_i^2$ suggests that a variable is redundant
Rule of Thumb:
$R_i^2$ > 0.9 is a cause for concern (high degree of collinearity) (VIF > 10)
0.8 < $R_i^2$ < 0.9 (moderate degree of collinearity) (VIF > 5)
Select Stat>Regression>Regression>Options>Display variance inflation factors
For the Production Plant data, Minitab gives us:
Predictor   VIF
Temp        1.276
Speed       10.997
Thickness   11.671
Water       1.731
Two VIFs (Speed and Thickness) are a bit large, but in this case, with an R-Sq of 80.9%, some multicollinearity can be tolerated
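The VIF definition can be reproduced directly: regress each X on the other Xs, take that R-squared, and apply 1 / (1 - R-squared). The sketch below assumes NumPy; the function name and the synthetic predictors are hypothetical and are not the Production Plant data. As a hand check on the Minitab values, a VIF of 10.997 for Speed corresponds to an R-squared of about 1 - 1/10.997 ≈ 0.909.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X, each from regressing that column on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for i in range(k):
        target = X[:, i]
        # Design matrix: intercept plus every predictor except column i.
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r_sq = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(float(1.0 / (1.0 - r_sq)))
    return vifs

# Synthetic predictors: x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)
x3 = rng.normal(size=100)
vif_demo = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print([round(v, 1) for v in vif_demo])
```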
Some Cautions About the Coefficients
Remember the prediction equation obtained earlier:
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Relative importance of predictors cannot be determined from the size of their coefficients:
The coefficients are scale dependent
The coefficients are influenced by correlation among the predictor variables
If a high degree of multicollinearity exists, even the signs of the coefficients may be misleading
Residual Analysis
Select Stat> Regression> Regression
Step 5: Validate the selected model
Is there anything suspicious with
this model?
Double click on C5 Amt of Ag and place it in the Response variable box, then double click on all the variables you want to place in the Predictors box
Select Graphs to go to next dialog box
Residual Analysis (Cont.)
Select Four in one to get all four Residual plots on one graph, or you can pick and choose the plots you want
Click on OK here and on the previous dialog box to get the Residual plots
Residual Analysis (Cont.)
Residual Analysis (Cont.)
[Figure: Residual Plots for Amt of Ag, four panels: Normal Probability Plot (Percent vs. Residual; N = 17, AD = 0.249, P-Value = 0.705), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]
Not too bad overall…
If you want to see the value for any observation, just hold your cursor over that point
How to Address Multicollinearity
Eliminate one or more input variables
We’ll look at a technique called Best Subsets Regression
Collect additional data
Use process knowledge to determine the principal relationship
Use DOE to further assess the multicollinearity
If neither is significant, eliminate both from the analysis
Best Subsets Regression
Rather than relying on the p-values alone, the computer looks at all possible combinations of variables and prints the resulting model characteristics
Statistics like adjusted R-Sq and MSError will improve as important model terms are added, then worsen as “junk” terms are added to the model
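The all-combinations idea can be sketched as follows, assuming NumPy. The function names and synthetic data are hypothetical. Unlike Minitab, which prints only the best models of each size, this sketch returns every subset. The Cp formula used is the standard one, in which p counts the constant as well as the predictors.

```python
from itertools import combinations
import numpy as np

def _sse(X, y):
    """Sum of squared residuals for a least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def best_subsets(X, y, names):
    """Fit every non-empty subset of predictors and report R-Sq, R-Sq(adj), Cp, and S."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    sst = float(np.sum((y - y.mean()) ** 2))
    mse_full = _sse(X, y) / (n - k - 1)  # error variance estimate from the full model
    rows = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            sse = _sse(X[:, list(cols)], y)
            p = size + 1  # number of coefficients, including the constant
            rows.append({
                "vars": [names[c] for c in cols],
                "r_sq": 1 - sse / sst,
                "r_sq_adj": 1 - (sse / (n - p)) / (sst / (n - 1)),
                "cp": sse / mse_full - (n - 2 * p),
                "s": (sse / (n - p)) ** 0.5,
            })
    return rows

# Synthetic data: only predictors A and C actually drive the response.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(30, 4))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 2] + 0.2 * rng.normal(size=30)
table = best_subsets(X_demo, y_demo, ["A", "B", "C", "D"])
best = max(table, key=lambda row: row["r_sq_adj"])
print(best["vars"], round(best["r_sq_adj"], 3))
```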
Best Subsets Regression Considerations
Objective: We want to select a model with predictive accuracy and minimum multicollinearity
Seek compromise between:
Overfitting (including model terms with only marginal, or no, contribution)
Underfitting (ignoring or deleting relatively important model terms)
What are some problems with overfitting?
What are some problems with underfitting?
Best Subsets Regression
Evaluating Candidate Models
Four things to look at when evaluating candidate models:
1. R2 (large R2 is desired, although R2 increases as we add more predictors to the model, so this should only be used for comparing models with the same number of terms)
2. Adjusted R2 (large is desired)
3. Mallows Cp statistic (small Cp desired, close to the number of terms in the model)
4. s (the estimate of the standard deviation around the regression)
Generally, the best three models are selected and checked for significance of all factors and residual assumptions
More on the Mallows C-p Statistic
In practice, the minimum number of parameters needed in the model is when the Mallows’ C-p statistic is a minimum
Rule of Thumb:
We want C-p to be small and close to the number of input variables
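The Cp values Minitab reports can be reproduced from the S values alone. The sketch below uses the standard Mallows formula, Cp = SSE_p / MSE_full - (n - 2p), where p counts the constant plus the input variables. The formula and n = 17 (the N shown on the residual normal probability plot) are assumptions not stated on the slides, and the function name is hypothetical. The S values are taken from the Best Subsets output for the Production Plant data.

```python
def mallows_cp(s_subset, s_full, n_obs, n_params):
    """Cp = SSE_p / MSE_full - (n - 2p), where p counts the constant too."""
    sse_subset = s_subset ** 2 * (n_obs - n_params)  # S^2 is SSE / (n - p)
    mse_full = s_full ** 2
    return sse_subset / mse_full - (n_obs - 2 * n_params)

# Two-variable model (S = 0.39047) versus the full four-variable model (S = 0.41275).
cp_two_var = mallows_cp(0.39047, 0.41275, n_obs=17, n_params=3)
cp_full = mallows_cp(0.41275, 0.41275, n_obs=17, n_params=5)
print(round(cp_two_var, 1), round(cp_full, 1))
```

Under these assumptions the two-variable model gives about 1.5 and the full model gives 5.0, matching the Minitab output. The full model's Cp always equals its own p, so it carries no information about fit.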
Best Subsets Regression
Select Stat> Regression>Best Subsets
Minitab data set: Production Plant
Best Subsets Regression (Cont.)
Enter Response variable
Enter Predictor variables (Input Variables)
Click on OK to get analysis in Session Window
Best Subsets Regression (Cont.)
Best Subsets Regression: Amt of Ag versus Temp, Speed, Thickness, Water
Response is Amt of Ag
Vars  R-Sq  R-Sq(adj)  Mallows Cp  S        Predictors marked (columns: Temp, Speed, Thickness, Water)
1     64.4  62.0       9.4         0.50387  X
1     62.3  59.8       10.7        0.51836  X
2     80.0  77.2       1.5         0.39047  X X
2     78.8  75.8       2.3         0.40200  X X
3     80.6  76.1       3.2         0.39959  X X X
3     80.3  75.8       3.4         0.40237  X X X
4     80.9  74.5       5.0         0.41275  X X X X
Which model(s) are the best candidates?
R-Sq: Look for the highest value when comparing models with the same number of input variables
R-Sq (adj): Look for the highest value when comparing models with different numbers of input variables
Cp: Look for models where Cp is small and close to the number of input variables in the model
S: We want S, the estimate of the standard deviation about the regression, to be as small as possible
Once the Candidate Models Are Identified
Evaluate the candidate models under a “microscope”
Outliers
High leverage
Influential observations
Residuals
Prediction quality
Once a model has been selected, find the new regression equation
Test its predictive capability for observations NOT originally used in the modeling
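Testing on observations not used in the modeling can be sketched as a simple hold-out check, assuming NumPy. The function name, the split size, and the synthetic data are illustrative assumptions and not a procedure prescribed by the course.

```python
import numpy as np

def holdout_rmse(X, y, n_holdout, seed=0):
    """Fit on all but n_holdout randomly chosen rows, then return RMSE on the held-out rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(y))
    test_idx, train_idx = order[:n_holdout], order[n_holdout:]
    design_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
    coef, *_ = np.linalg.lstsq(design_train, y[train_idx], rcond=None)
    # Predict the held-out rows with coefficients that never saw them.
    design_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
    errors = y[test_idx] - design_test @ coef
    return float(np.sqrt(np.mean(errors ** 2)))

# Synthetic two-predictor data with noise standard deviation 0.3.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(40, 2))
y_demo = 10.0 + 0.4 * X_demo[:, 0] + 0.05 * X_demo[:, 1] + 0.3 * rng.normal(size=40)
rmse = holdout_rmse(X_demo, y_demo, n_holdout=10)
print(round(rmse, 3))
```

A hold-out RMSE close to the model's S suggests the model predicts new observations about as well as it fits the old ones; a much larger value points to overfitting.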
Regression with Reduced Model
We select the best model with two variables, Speed & Water, and run Minitab again to obtain the new regression equation:
Select Stat>Regression>Regression
Regression with Reduced Model (Cont.)
Enter Amt of Ag as the Response
Enter only Speed and Water as Predictors
Click on OK to get analysis in Session Window
Regression with Reduced Model (Cont.)
Session window of Minitab yields the following regression equation for the reduced model:
Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
Predictor   Coef      SE Coef   T      P
Constant    9.919     1.694     5.86   0.000
Speed       0.35689   0.08544   4.18   0.001
Water       0.04253   0.01206   3.53   0.003
S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%
…to compare with the previous model:
Amt of Ag = 5.7 - 0.0156 Temp. + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Predictor   Coef      SE Coef   T      P
Constant    5.72      10.83     0.53   0.607
H20 Temp    -0.01558  0.02616   -0.60  0.563
Speed       0.2393    0.2644    0.90   0.383
Thick.      0.443     1.033     0.43   0.675
Water       0.04495   0.01481   3.04   0.010
S = 0.4127   R-Sq = 80.9%   R-Sq(adj) = 74.5%
Unusual Observations
Session window of Minitab also gives us the following output:
Obs  Speed  Amt of A  Fit      SE Fit  Residual  St Resid
3    11.5   21.0000   20.3784  0.2477  0.6216    2.06R
R denotes an observation with a large standardized residual
An unusual observation here means one with a large standardized residual
Let’s see what would happen if we eliminated such an observation from our collected data!
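The standardized residual for observation 3 can be reproduced from the numbers Minitab prints. The sketch relies on two standard relationships that the slides do not state: SE Fit = S × sqrt(leverage), and St Resid = Residual / (S × sqrt(1 - leverage)). S = 0.3905 is taken from the reduced Speed and Water model, and the function name is hypothetical.

```python
import math

def standardized_residual(residual, se_fit, s):
    """Return (standardized residual, leverage) from Minitab's Residual, SE Fit, and S."""
    leverage = (se_fit / s) ** 2  # h_i, because SE Fit = S * sqrt(h_i)
    return residual / (s * math.sqrt(1.0 - leverage)), leverage

st_resid, leverage = standardized_residual(residual=0.6216, se_fit=0.2477, s=0.3905)
print(round(st_resid, 2), round(leverage, 3))
```

Under these assumptions the result is about 2.06, matching the St Resid column, with a leverage of about 0.40 for observation 3.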
Impact of the Unusual Observation
Without the Unusual Observation, the Session window of Minitab yields the following regression equation:
Amt of Ag = 8.61 + 0.237 Speed + 0.0577 Water
Predictor   Coef      SE Coef   T      P
Constant    8.610     1.567     5.49   0.000
Speed       0.23698   0.08960   2.64   0.020
Water       0.05775   0.01226   4.71   0.000
S = 0.3383   R-Sq = 85.0%   R-Sq(adj) = 82.7%
…to compare with the regression equation of our previous reduced model:
Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
Predictor   Coef      SE Coef   T      P
Constant    9.919     1.694     5.86   0.000
Speed       0.35689   0.08544   4.18   0.001
Water       0.04253   0.01206   3.53   0.003
S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%
R-Sq goes up a little (from 80.0% to 85.0%) because we’ve gotten rid of “noise” in the model
Takeaways
Regression analysis can be used with historical data as well as data from designed experiments to build prediction models
Care must be exercised when using historical data
Correlation does not imply a cause and effect relationship
There may be serious problems with multicollinearity and high leverage observations
There are several diagnostic tools available to evaluate regression models:
Fit: R2, adjusted R2, Cp, S
Unusual observations: residual plots, leverage, Cook’s D
Multicollinearity: VIFs (Variance Inflation Factors)
Considerations in Regression
Set goals before doing the analysis (what do you want to learn, how well do you need to predict, etc.).
Gather enough observations to adequately measure error and check the model assumptions.
Make sure that the sample of data is representative of the population.
Excessive measurement error of the inputs (Xs) creates uncertainty in the estimated coefficients, predictions, etc.
Be sure to collect data on all potentially important explanatory variables.
Regression Checklist
Scatterplots (Y vs. X)
Histograms and/or Boxplots of Ys and Xs
Coefficients
Significance (p < .05 - .10)
R2 and adjusted R2
S
Residuals (no obvious pattern)
Unusual Y values (standardized residuals > 2)
Unusual X values (leverage > 2p/n)
Overfitting vs. underfitting (C-p small and close to the number of input variables in model)
Multicollinearity (VIF > 5-10)
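The two "unusual value" checks in this list can be sketched together, assuming NumPy. Here p is taken to be the number of fitted coefficients including the constant, which is an assumption about the checklist's notation. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def unusual_observations(X, y):
    """Flag rows with |standardized residual| > 2 or leverage > 2p/n."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]  # coefficients, including the constant
    # Leverage is the diagonal of the hat matrix, computed here from a QR factorization.
    q, _ = np.linalg.qr(design)
    leverage = np.sum(q ** 2, axis=1)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    s = np.sqrt(resid @ resid / (n - p))
    st_resid = resid / (s * np.sqrt(1.0 - leverage))
    return {
        "large_residual": np.flatnonzero(np.abs(st_resid) > 2).tolist(),
        "high_leverage": np.flatnonzero(leverage > 2 * p / n).tolist(),
    }

# Synthetic line with one unusual Y (row 5) and one unusual X (row 19).
rng = np.random.default_rng(5)
x = np.arange(20.0)
x[19] = 60.0
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=20)
y[5] += 5.0
flags = unusual_observations(x.reshape(-1, 1), y)
print(flags)
```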
What other comments or questions do you have?
References
Neter, Wasserman, and Kutner, Applied Linear Regression Models, Irwin, 1989
Draper and Smith, Applied Regression Analysis, Wiley, 1981
Schulman, Robert S., Statistics in Plain English, Chapman and Hall, 1992.
Gunst and Mason, Regression Analysis and its Application, Marcel Dekker, 1980
Myers, Raymond H., Classical and Modern Regression with Applications, Duxbury, 1990
Dielman, Applied Regression Analysis for Business and Economics, Duxbury, 1991
Hosmer and Lemeshow, Applied Logistic Regression, Wiley, 1989
Iglewicz and Hoaglin, How to Detect and Handle Outliers, ASQ Press
Crocker, Douglas C., How to use Regression Analysis in Quality Control, ASQ Press
National Guard Black Belt Training
APPENDIX
Additional Exercises
Anthony’s Pizza
Customer Satisfaction
A Study of Supervisor Performance
Additional Practice Example:
Anthony’s Pizza
We have received Voice of the Customer feedback telling us that customers are dissatisfied if we cannot accurately predict the time of their pizza delivery when it is beyond the 30-minute target
We would like to develop a model so that when the customer calls, we can accurately predict delivery time
Additional Practice Example:
Six Sigma Pizza
Our Minitab data can be found in the file Multiple Regression - Pizza.mpj
Based on the data that we have collected, we are going to study the effects of total pizzas ordered, defects, and incorrect order on delivery time
Additional Practice Exercise:
Customer Satisfaction
Bob Black Belt would like to get a better understanding of the customer satisfaction data
Use the data provided in the Minitab file A-06 Customer Satisfaction Data.mtw to create a Regression Model to predict Overall Satisfaction
Each row of data is a monthly average of how customers rated the services on a scale of 1-10. For example, in January, the average of customer ratings for Staff Responsiveness was 7.9.
Additional Practice Exercise:
Customer Satisfaction (Cont.)
Consider Staff Responsiveness, Check-out Speed, Frequent Guest Program, and Problems Resolved as possible inputs that could be used to predict Overall Satisfaction.
First, study correlation with a Matrix Plot and Correlation Table
Next, create the initial Regression Model
Find the best combination of inputs with Best Subsets
Finally, run the reduced Regression Model
Additional Practice Exercise:
A Study of Supervisor Performance
A recent survey of clerical employees in a large financial organization included questions related to employee satisfaction with their supervisors. The company was interested in any relationships between specific supervisor characteristics and overall satisfaction with supervisors as perceived by the employees:
Y = Overall rating of the job being done by the supervisor
X1 = Handles employee complaints
X2 = Does not allow special privileges
X3 = Provides opportunity to learn new things
X4 = Raises based on performance
X5 = Too critical of poor performance
X6 = Rate of advancing to better jobs (employee’s perception of their own advancement rate)
Source: Regression Analysis by Example, Chatterjee and Price
Additional Practice Exercise:
A Study of Supervisor Performance
The survey responses were on a scale of 1-5
For purposes of analysis, a score of 1 or 2 was considered “favorable”, while a score of 3, 4, or 5 was considered “unfavorable”
Data was collected from 30 departments, selected randomly from the organization. Each department had approximately 35 employees with one supervisor
For each department, the data was aggregated and the data recorded was the percent favorable for each item
Data file is A-06 Attitude.mtw
Questions:
Can we predict the overall supervisor rating using this data?
What variable(s) have the strongest correlation with the supervisor rating?
Are there any unusual observations?
Comments on the data?