applied business forecasting and planningpinar/courses/vbm687/lectures/regression.pdf · epi...
TRANSCRIPT
![Page 1: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/1.jpg)
Regression
![Page 2: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/2.jpg)
![Page 3: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/3.jpg)
![Page 4: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/4.jpg)
![Page 5: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/5.jpg)
![Page 6: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/6.jpg)
![Page 7: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/7.jpg)
![Page 8: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/8.jpg)
![Page 9: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/9.jpg)
EPI 809/Spring 2008 9
What is a Model?
1. Often Describe Relationship between Variables
2. Types- Deterministic Models (no randomness)
- Probabilistic Models (with randomness)
![Page 10: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/10.jpg)
EPI 809/Spring 2008 10
Deterministic Models
1. Hypothesize Exact Relationships
2. Suitable When Prediction Error is Negligible
3. Example: Body mass index (BMI) is measure of body fat based
• BMI = Weight in Kilograms(Height in Meters)2
![Page 11: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/11.jpg)
EPI 809/Spring 2008 11
Probabilistic Models
1. Hypothesize 2 Components
• Deterministic
• Random Error
2. Example: Systolic blood pressure of newborns Is 6 Times the Age in days + Random Error
• SBP = 6 x age(d) + • Random Error May Be Due to Factors Other Than age in days
(e.g. Birthweight)
![Page 12: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/12.jpg)
Simple Regression
• Simple regression analysis is a statistical tool that gives us the ability to estimate the mathematical relationship between a dependent variable (usually called y) and an independent variable (usually called x).
• The dependent variable is the variable for which we want to make a prediction.
• While various non-linear forms may be used, simple linear regression models are the most common.
![Page 13: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/13.jpg)
Introduction
• The primary goal of quantitative analysis is to use current information about a phenomenon to predict its future behavior.
• Current information is usually in the form of a set of data.
• In a simple case, when the data form a set of pairs of numbers, we may interpret them as representing the observed values of an independent (or predictor or explanatory) variable X and a dependent ( or response or outcome) variable Y.
lot size Man-hours
30 73
20 50
60 128
80 170
40 87
50 108
60 135
30 69
70 148
60 132
![Page 14: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/14.jpg)
Introduction
• The goal of the analyst who studies the data is to find a functional relation
between the response variable y and the predictor variable x.
)(xfy
0
20
40
60
80
100
120
140
160
180
0 10 20 30 40 50 60 70 80 90
Ma
n-H
ou
r
Lot size
Statistical relation between Lot size and Man-Hour
![Page 15: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/15.jpg)
Pictorial Presentation of Linear Regression Model
![Page 16: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/16.jpg)
EPI 809/Spring 2008 16
Linear Regression Model
![Page 17: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/17.jpg)
![Page 18: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/18.jpg)
Assumptions
• Linear regression assumes that… • 1. The relationship between X and Y is linear
• 2. Y is distributed normally at each value of X
• 3. The variance of Y at every value of X is the same (homogeneity of variances)
• 4. The observations are independent
![Page 19: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/19.jpg)
EPI 809/Spring 2008 19
Y
Y = mX + b
b = Y-intercept
X
Change
in Y
Change in X
m = Slope
Linear Equations
© 1984-1994 T/Maker Co.
![Page 20: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/20.jpg)
• 1. Relationship Between Variables Is a Linear Function
Y Xi i i 0 1
Linear Regression Model
Dependent
(Response)
Variable
(e.g., CD+ c.)
Independent (Explanatory) Variable (e.g., Years s. serocon.)
Population Slope
Population Y-Intercept
Random Error
![Page 21: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/21.jpg)
Meaning of Regression Coefficients
• The values of the regression parameters 0, and 1 are not known. We estimate them from data.
• 1 indicates the change in the mean response per unit increase in X.
• General regression model
1. 0, and 1 are parameters
2. X is a known constant
3. Deviations are independent N(o, 2)
![Page 22: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/22.jpg)
EPI 809/Spring 2008 22
Y
X
Population Linear Regression Model
Y Xi i i 0 1
iXYE 10
Observed
value
Observed value
i = Random error
![Page 23: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/23.jpg)
EPI 809/Spring 2008 23
Estimating Parameters:Least Squares Method
![Page 24: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/24.jpg)
EPI 809/Spring 2008 24
0
20
40
60
0 20 40 60
X
Y
Scatter plot
• 1. Plot of All (Xi, Yi) Pairs
• 2. Suggests How Well Model Will Fit
![Page 25: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/25.jpg)
EPI 809/Spring 2008 25
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
0
20
40
60
0 20 40 60
X
Y
![Page 26: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/26.jpg)
EPI 809/Spring 2008 26
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
0
20
40
60
0 20 40 60
X
YSlope changed
Intercept unchanged
![Page 27: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/27.jpg)
EPI 809/Spring 2008 27
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
0
20
40
60
0 20 40 60
X
Y
Slope unchanged
Intercept changed
![Page 28: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/28.jpg)
EPI 809/Spring 2008 28
Thinking Challenge
How would you draw a line through the points? How do you determine which line ‘fits best’?
0
20
40
60
0 20 40 60
X
YSlope changed
Intercept changed
![Page 29: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/29.jpg)
What is the best fitting line
![Page 30: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/30.jpg)
Prediction Error
![Page 31: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/31.jpg)
EPI 809/Spring 2008 31
Least Squares
• 1. ‘Best Fit’ Means Difference Between Actual Y Values & Predicted Y Values Are a Minimum. But Positive Differences Off-Set Negative. So square errors!
• 2. LS Minimizes the Sum of the Squared Differences (errors) (SSE)
n
i
i
n
i
ii YY1
2
1
2
ˆˆ
![Page 32: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/32.jpg)
EPI 809/Spring 2008 32
Least Squares Graphically
2
Y
X
1
3
4
^^
^^
Y X2 0 1 2 2
Y Xi i 0 1
LS minimizes i
i
n2
1
12
22
32
42
![Page 33: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/33.jpg)
How to estimate parameters
![Page 34: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/34.jpg)
Estimating the intercept and slope: least squares estimation
** Least Squares EstimationA little calculus….What are we trying to estimate? β, the slope, from
What’s the constraint? We are trying to minimize the squared distance (hence the “least squares”) between the observations themselves and the predicted values , or (also called the “residuals”, or left-over unexplained variability)
Differencei = yi – (βx + α) Differencei2 = (yi – (βx + α)) 2
Find the β that gives the minimum sum of the squared differences. How do you maximize a function? Take the derivative; set it equal to zero; and solve. Typical max/min problem from calculus….
From here takes a little math trickery to solve for β…
...0))((2
)))(((2))((
1
2
11
2
n
i
iiii
n
i
iii
n
i
ii
xxxy
xxyxyd
d
![Page 35: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/35.jpg)
The standard error of Y given X is the average variability around the regression line at any given value of X. It is assumed to be equal at all values of X.
Sy/x
Sy/x
Sy/x
Sy/x
Sy/x
Sy/x
![Page 36: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/36.jpg)
![Page 37: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/37.jpg)
C A
B
A
yi
x
y
yi
C
B
*Least squares estimation gave us the line (β) that minimized C2
ii xy
y
A2 B2 C2
SStotal
Total squared distance of observations from naïve mean of yTotal variation
SSregDistance from regression line to naïve mean of y
Variability due to x (regression)
SSresidualVariance around the regression line
Additional variability not explained
by x—what least squares method aims
to minimize
n
i
ii
n
i
n
i
ii yyyyyy1
2
1 1
22 )ˆ()ˆ()(
Regression Picture
R2=SSreg/SStotal
![Page 38: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/38.jpg)
Regression Line
• If the scatter plot of our sample data suggests a linear relationship between two variables i.e.
we can summarize the relationship by drawing a straight line on the plot.
• Least squares method give us the “best” estimated line for our set of sample data.
xy 10
![Page 39: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/39.jpg)
Regression Line
• We will write an estimated regression line based on sample data as
• The method of least squares chooses the values for b0, and b1 to minimize the sum of squared errors
xbby 10ˆ
2
1
10
1
2)ˆ(
n
i
n
i
ii xbbyyySSE
![Page 40: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/40.jpg)
Regression Line
• Using calculus, we obtain estimating formulas:
or
n
i
n
i
ii
n
i
n
i
n
i
iiii
n
i
i
n
i
ii
xxn
yxyxn
xx
yyxx
b
1 1
22
1 1 1
1
2
11
)()(
))((
xbyb 10
x
y
S
Srb 1
![Page 41: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/41.jpg)
Estimation of Mean Response
• Fitted regression line can be used to estimate the mean value of y for a given value of x.
• Example
• The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.
y x
1250 41
1380 54
1425 63
1425 54
1450 48
1300 46
1400 62
1510 61
1575 64
1650 71
![Page 42: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/42.jpg)
Point Estimation of Mean Response
• From previous table we have:
• The least squares estimates of the regression coefficients are:
81875514365
3260456410 2
xyy
xxn
8.10)564()32604(10
)14365)(564()818755(10
)( 2221
xxn
yxxynb
828)4.56(8.105.14360 b
![Page 43: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/43.jpg)
Point Estimation of Mean Response
• The estimated regression function is:
• This means that if the weekly advertising expenditure is increased by $1 we would expect the weekly sales to increase by $10.8.
eExpenditur 8.10828Sales
10.8x828y
![Page 44: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/44.jpg)
Point Estimation of Mean Response
• Fitted values for the sample data are obtained by substituting the x value into the estimated regression function.
• For example if the advertising expenditure is $50, then the estimated Sales is:
• This is called the point estimate (forecast) of the mean response (sales).
1368)50(8.10828 Sales
![Page 45: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/45.jpg)
Linear correlation and linear regression
![Page 46: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/46.jpg)
Covariance
1
))((
),(cov 1
n
YyXx
yx
n
i
ii
![Page 47: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/47.jpg)
cov(X,Y) > 0 X and Y are positively correlated
cov(X,Y) < 0 X and Y are inversely correlated
cov(X,Y) = 0 X and Y are independent
Interpreting Covariance
![Page 48: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/48.jpg)
Correlation coefficient
Pearson’s Correlation Coefficient is standardized covariance (unitless):
yx
yxariancer
varvar
),(cov
![Page 49: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/49.jpg)
Correlation
• Measures the relative strength of the linear relationship between two variables
• Unit-less
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker any positive linear relationship
![Page 50: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/50.jpg)
Scatter Plots of Data with Various Correlation Coefficients
Y
X
Y
X
Y
X
Y
X
Y
X
r = -1 r = -.6 r = 0
r = +.3r = +1
Y
Xr = 0
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
![Page 51: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/51.jpg)
Y
X
Y
X
Y
Y
X
X
Linear relationships Curvilinear relationships
Linear Correlation
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
![Page 52: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/52.jpg)
Y
X
Y
X
Y
Y
X
X
Strong relationships Weak relationships
Linear Correlation
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
![Page 53: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/53.jpg)
Linear Correlation
Y
X
Y
X
No relationship
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
![Page 54: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/54.jpg)
Calculating by hand…
1
)(
1
)(
1
))((
varvar
),(covˆ
1
2
1
2
1
n
yy
n
xx
n
yyxx
yx
yxariancer
n
i
i
n
i
i
n
i
ii
![Page 55: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/55.jpg)
Simpler calculation formula…
yx
xy
n
i
i
n
i
i
n
i
ii
n
i
i
n
i
i
n
i
ii
SSSS
SS
yyxx
yyxx
n
yy
n
xx
n
yyxx
r
1
2
1
2
1
1
2
1
2
1
)()(
))((
1
)(
1
)(
1
))((
ˆ
yx
xy
SSSS
SSr ˆ
Numerator of covariance
Numerators of variance
![Page 56: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/56.jpg)
Least Square estimation
Slope (beta coefficient) =
)(
),(ˆxVar
yxCov
),( yx
x-yˆ :Calculate Intercept=
Regression line always goes through the point:
![Page 57: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/57.jpg)
Relationship with correlation
y
x
SD
SDr ˆ
In correlation, the two variables are treated as equals. In regression, one variable is considered independent (=predictor) variable (X) and the other the dependent (=outcome) variable Y.
![Page 58: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/58.jpg)
Residual Analysis: check assumptions
• The residual for observation i, ei, is the difference between its observed and predicted value
• Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
• Check the assumptions of regression by examining the residuals
• Examine for linearity assumption
• Examine for constant variance for all levels of X (homoscedasticity)
• Evaluate normal distribution assumption
• Evaluate independence assumption
• Graphical Analysis of Residuals
• Can plot residuals vs. X
iii YYe ˆ
![Page 59: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/59.jpg)
Residual = observed - predicted
14ˆ
34ˆ
48
ii
i
i
yy
y
y
X=95 nmol/L
34
![Page 60: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/60.jpg)
Residual Analysis for Linearity
Not Linear Linear
x
resid
ua
ls
x
Y
x
Y
x
resid
ua
lsSlide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
![Page 61: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/61.jpg)
Residual Analysis for Homoscedasticity
Non-constant variance Constant variance
x x
Y
x x
Yre
sid
ua
ls
resid
uals
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
![Page 62: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/62.jpg)
Residual Analysis for Independence
Not Independent
Independent
X
Xresid
ua
ls
resid
uals
X
resid
ua
ls
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
![Page 63: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/63.jpg)
Example: weekly advertising expenditure
y x y-hat Residual (e)
1250 41 1270.8 -20.8
1380 54 1411.2 -31.2
1425 63 1508.4 -83.4
1425 54 1411.2 13.8
1450 48 1346.4 103.6
1300 46 1324.8 -24.8
1400 62 1497.6 -97.6
1510 61 1486.8 23.2
1575 64 1519.2 55.8
1650 71 1594.8 55.2
![Page 64: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/64.jpg)
Estimation of the variance of the error terms, 2
• The variance 2 of the error terms i in the regression model needs to be estimated for a variety of purposes.
• It gives an indication of the variability of the probability distributions of y.
• It is needed for making inference concerning regression function and the prediction of y.
![Page 65: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/65.jpg)
Regression Standard Error
• To estimate we work with the variance and take the square root to obtain the standard deviation.
• For simple linear regression the estimate of 2 is the average squared residual.
• To estimate , use
• s estimates the standard deviation of the error term in the statistical model for simple linear regression.
222
. )ˆ(2
1
2
1iiixy yy
ne
ns
2
.. xyxy ss
![Page 66: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/66.jpg)
Regression Standard Error
y x y-hat Residual (e) square(e)
1250 41 1270.8 -20.8 432.64
1380 54 1411.2 -31.2 973.44
1425 63 1508.4 -83.4 6955.56
1425 54 1411.2 13.8 190.44
1450 48 1346.4 103.6 10732.96
1300 46 1324.8 -24.8 615.04
1400 62 1497.6 -97.6 9525.76
1510 61 1486.8 23.2 538.24
1575 64 1519.2 55.8 3113.64
1650 71 1594.8 55.2 3047.04
y-hat = 828+10.8X total 36124.76
Sy .x 67.19818
![Page 67: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/67.jpg)
Residual plots
• The points in this residual plot have a curve pattern, so a straight line fits poorly
![Page 68: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/68.jpg)
Residual plots
• The points in this plot show more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.
![Page 69: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/69.jpg)
Variable transformations
• If the residual plot suggests that the variance is not constant, a transformation can be used to stabilize the variance.
• If the residual plot suggests a non linear relationship between x and y, a transformation may reduce it to one that is approximately linear.
• Common linearizing transformations are:
• Variance stabilizing transformations are:
)log(,1
xx
2,),log(,1
yyyy
![Page 70: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/70.jpg)
2 predictors: age and vit D…
![Page 71: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/71.jpg)
Different 3D view…
![Page 72: Applied Business Forecasting and Planningpinar/courses/VBM687/lectures/Regression.pdf · EPI 809/Spring 2008 31 Least Squares •1. Best Fit Means Difference Between Actual Y Values](https://reader033.vdocuments.us/reader033/viewer/2022041915/5e693d57ab5a1510c11559e5/html5/thumbnails/72.jpg)
Fit a plane rather than a line…
On the plane, the slope for vitamin D is the same at every age; thus, the slope for vitamin D represents the effect of vitamin D when age is held constant.