sma 6304 / mit 2.853 / mit 2.854 manufacturing systems lecture 10: data and regression analysis...
Post on 19-Jan-2018
216 Views
Preview:
DESCRIPTION
TRANSCRIPT
SMA 6304 / MIT 2.853 / MIT 2.854 Manufacturing Systems
Lecture 10: Data and Regression Analysis
Lecturer: Prof. Duane S. Boning
Copyright 2003© Duans S. Boning. 1
Agenda
1. Comparison of Treatments (One Variable) • Analysis of Variance (ANOVA) 2. Multivariate Analysis of Variance • Model forms 3. Regression Modeling • Regression fundamentals • Significance of model terms • Confidence intervals
Copyright 2003© Duans S. Boning. 2
Is Process B Better Than Process A?
Copyright 2003© Duans S. Boning. 3
Two Means with Internal Estimate of Variance
Copyright 2003© Duans S. Boning. 4
Comparison of Treatments
Copyright 2003© Duans S. Boning. 5
• Consider multiple conditions (treatments, settings for some variable) – There is an overall mean μ and real “effects” or deltas between
conditions τ. – We observe samples at each condition of interest • Key question: are the observed differences in mean “significant”?
- Typical assumption (should be checked): the underlying variances are all the same – usually an unknown value (σ02)
Steps/Issues in Analysis of Variance
Copyright 2003© Duans S. Boning. 6
1. Within group variation – Estimates underlying population variance 2. Between group variation – Estimate group to group variance
3. Compare the two estimates of variance – If there is a difference between the different treatments, then the between group variation estimate will be inflated compared to the within group estimate – We will be able to establish confidence in whether or not observed differences between treatments are significant Hint: we’ll be using F tests to look at ratios of variances
(1) Within Group Variation
Copyright 2003© Duans S. Boning. 7
• Assume that each group is normally distributed and shares a common variance• SS= sum of square deviations wgroup (there are k groups)
•
• Estimate of wthin group variance in tgroup (ust variance formula)
• Pool these (across different conditions) to get estimate of common thin group variance:
• This is the wthin group “mean square” variance estimate)
(2) Between Group Variation
Copyright 2003© Duans S. Boning. 8
• We will be testing hypothesis = … = • If all the means are in fact equal, then a 2 estimate could be
formed based on the observed differences between group means:
• If all the treatments in fact have different means, then estimates something larger:
(3) Compare Variance Estimates
Copyright 2003© Duans S. Boning. 9
• We now have two different posses for s depending on whether the observed sample mean differences are “real” or are just occurring by chance (by sampling) •
• Use statistic to see if the ratios of these variances are likely to have occurred by chance!
• Formal test for significance:
(4) Compute Significance Level
Copyright 2003© Duans S. Boning. 10
• Calculate observed ratio (wth appropriate degrees of freedom in numerator and denominator) • distribution to find how likely a ratio this large is to have occurred by chance alone -This is our “signifcance level” -If
• then we say that the mean differences or treatment effects are sgnificant to (1-)100% confidence or
better
(5) Variance Due to Treatment Effects
Copyright 2003© Duans S. Boning. 11
• We also want to estimate the sum of squared deviations from the grand mean among all
samples:
(6) Results: The ANOVA Table
Copyright 2003© Duans S. Boning. 12
source of sum of degrees Variation squares of mean square freedom
Between Treatments
Within treatments
Total about the grand average
Example: Anova
Copyright 2003© Duans S. Boning. 13
ANOVA – Implied Model
Copyright 2003© Duans S. Boning. 14
• The ANOVA approach assumes a smple mathematical model: • Where is the treatment mean (for treatment type t)• And is the treatment effect • With being zero mean normal residuals ~N• Checks -Plot residuals against time order -Examne distribution of residuals: should be IID, Normal -Plot residuals vs. estimates -Plot residuals vs. other variables of interest
MANOVA – Two Dependencies
Copyright 2003© Duans S. Boning. 15
• Can extend to two (or more) variables of interest. MANOVA assumes a mathematical model, again simply capturing the means (or treatment offsets) for each discrete variable level:
• Assumes that the effects from the two variables are additive
Example: Two Factor MANOVA
Copyright 2003© Duans S. Boning. 16
• Two LPCVD depositon tube types, three gas suppers. Does supper matter in average particle counts on wafers? Experiment: 3 lots on each tube, for each gas; report average # particles added
MANOVA – Two Factors with Interactions
Copyright 2003© Duans S. Boning. 17
• May be interaction: not simpy additive – effects may depend synergistically on both factors:
An effect that depends on both t & factors simultaneously
t = first factor = 1,2, … k (k = #eves of first factori = second factor = 1,2, … n (n = # levels of second factor)j = replication = 1,2, … m (m = # replications at t,th combination of factor levels
• Can split out the model more explicitly…
Estimate by:
(6) Results: The ANOVA Table
Copyright 2003© Duans S. Boning. 18
source of sum of degrees Variation squares of mean square freedom
Between levels of factor 1 (T)
Between levels of factor 2 (B)
Interaction
Within Groups(Error)
Total about the grand
average
Measures of Model Goodness – R
Copyright 2003© Duans S. Boning. 19
• Goodness of fit –R
-Think of this as (1 – varance remaining in the residual) Recall
• Adjusted R -For “fair” comparison between models wth different numbers of coeffi
cients, an alternatve is often used
-Think of this as the fraction of squared deviationsfrom the grand average) in the data which is captured by the
-Queston consdered: how much better does the model do thatust using the grand average?
Regression Fundamentals
Copyright 2003© Duans S. Boning. 20
• Use least square error as measure of goodness to estimate coefficients in a model
• One parameter model: – Model form – Squared error – Estimation using normal equations – Estimate of experimental error – Precision of estimate: variance in b – Confidence interval for β – Analysis of variance: significance of b – Lack of fit vs. pure error • Polynomial regression
Regression Fundamentals
Copyright 2003© Duans S. Boning. 21
• We use least-squares to estimate
coefficients in typical regression models
• One-Parameter Model:
• Goais to estimate th “best” b
• How define “best”?
-That b whch mzes sum of squared
error between predicton and data
-The residua sum of squares (for the
best estimate) is
Regression Fundamentals
Copyright 2003© Duans S. Boning. 22
• Least squares estimation via normal equations -For linear problems, we need not calcuate SS( ); rather, drect solution for b is possib -Recognize that vector of resduals will be normal to vector of x vaues at the least squares estimate
• Estimate of experimental error -Assumng model structure is adequate, estimate scan be obtained:
Precision of Estimate: Variance in b
Copyright 2003© Duans S. Boning. 23
• We can calculate the variance in our estimate of the slope, b:
• Why?
Confidence Interval for β
Copyright 2003© Duans S. Boning. 24
• Once we have the standard error in b, we can calculate confidence intervals to some desred (1-)100% level of confidence
•Analysis of variance
-Test hypothess:
-If confdence interval for includes 0, then β not significant
-Degrees of freedom (need in order to use t distribution
p = # parameters estmated by least squares
Example Regression
Copyright 2003© Duans S. Boning. 25
• Note that this smple model assumes an intercept of
zero – model must go through origin
• We wll relax this requirement soon
Age Income 8 6.16 22 9.88 35 14.35 40 24.06 57 30.34 73 32.1778 42.1887 43.2398 48.76
Lack of Fit Error vs. Pure Error
Copyright 2003© Duans S. Boning. 26
•Sometimes we have replicated data
-E.g. multiple runs at same x values in a designed experiment
•We can decompose the resdual error contributons
Where
= resdual sum of squares error
= lack of ft squared error
= pure replicate error•This alows us to TEST for lack of fit -By “lack of fit” we mean evidence that the linear model form is inadequate
Regression: Mean Centered Models
Copyright 2003© Duans S. Boning. 27
Model form
Model form
Regression: Mean Centered Models
Copyright 2003© Duans S. Boning. 28
•Confidence Intervals
•Our confidence interval on y widens as we get further from the center of our data1
Polynomial Regression
Copyright 2003© Duans S. Boning. 29
•We may bellwe thai a hlgher order model strudurm applk. Pdynomlal forms are also Ilnem in the coendents and mm be M wlth least squares
•Example: Growth rate data
Regression Example: Growth Rate Data
Copyright 2003© Duans S. Boning. 30
R6ph~ated ata provKles opportunity to check for ladc of fit
Growth Rate - First Order Model
Copyright 2003© Duans S. Boning. 31
•Mean significant, but linear term not
•Clear evidence of lack of fit
Growth Rate - Second Order Model
Copyright 2003© Duans S. Boning. 32
•No evidence of lack of fit
•Quadratic term significant
Polynomial Regression In Excel
Copyright 2003© Duans S. Boning. 33
•Create additional input columns for each input
•Use "Data Analysis" and "Regression" tool
Polynomial Regression
Copyright 2003© Duans S. Boning. 34
• Generated using JMP package
Summary
Copyright 2003© Duans S. Boning. 35
• Comparison of Treatments – ANOVA
• Multivariate Analysis of Variance
• Regression Modeling
Next Time
• Time Series Models
• Forecasting
top related