statistics for business and economics chapter 10 simple linear regression
Post on 20-Dec-2015
226 views
TRANSCRIPT
Statistics for Business and Economics
Chapter 10
Simple Linear Regression
Learning Objectives
1. Describe the Linear Regression Model
2. State the Regression Modeling Steps
3. Explain Least Squares
4. Compute Regression Coefficients
5. Explain Correlation
6. Predict Response Variable
Models
Models
• Representation of some phenomenon
• Mathematical model is a mathematical expression of some phenomenon
• Often describe relationships between variables
• Types– Deterministic models– Probabilistic models
Deterministic Models
• Hypothesize exact relationships
• Suitable when prediction error is negligible
• Example: force is exactly mass times acceleration
– F = m·a
© 1984-1994 T/Maker Co.
Probabilistic Models
• Hypothesize two components– Deterministic– Random error
• Example: sales volume (y) is 10 times advertising spending (x) + random error
– y = 10x + – Random error may be due to factors
other than advertising
Types of Probabilistic Models
ProbabilisticModels
Regression Models
Correlation Models
Regression Models
Types of Probabilistic Models
ProbabilisticModels
Regression Models
Correlation Models
Regression Models
• Answers ‘What is the relationship between the variables?’
• Equation used– One numerical dependent (response) variable
What is to be predicted– One or more numerical or categorical
independent (explanatory) variables
• Used mainly for prediction and estimation
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
• Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Model Specification
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
• Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Specifying the Model
1. Define variables• Conceptual (e.g., Advertising, price)• Empirical (e.g., List price, regular price) • Measurement (e.g., $, Units)
2. Hypothesize nature of relationship• Expected effects (i.e., Coefficients’ signs)• Functional form (linear or non-linear)• Interactions
Model Specification Is Based on Theory
• Theory of field (e.g., Sociology)
• Mathematical theory
• Previous research
• ‘Common sense’
Advertising
Sales
Advertising
Sales
Advertising
Sales
Advertising
Sales
Thinking Challenge: Which Is More Logical?
Simple
1 ExplanatoryVariable
Types of Regression Models
RegressionModels
2+ ExplanatoryVariables
Multiple
LinearNon-
LinearLinear
Non-Linear
Linear Regression Model
Types of Regression Models
Simple
1 ExplanatoryVariable
RegressionModels
2+ ExplanatoryVariables
Multiple
LinearNon-
LinearLinear
Non-Linear
y x 0 1
Linear Regression Model
Relationship between variables is a linear function
Dependent (Response) Variable
Independent (Explanatory) Variable
Population Slope
Population y-intercept
Random Error
y
E(y) = β0 + β1x (line of means)
β0 = y-intercept
x
Change in y
Change in x
β1 = Slope
Line of Means
Population & Sample Regression Models
Population
$ $
$
$
$
Unknown Relationship
0 1y x
Random Sample
$$ $$
$$ $$
0 1ˆ ˆ ˆy x
y
x
Population Linear Regression Model
0 1i i iy x
0 1E y x
Observedvalue
Observed value
i = Random error
y
x
0 1ˆ ˆ ˆi i iy x
Sample Linear Regression Model
0 1ˆ ˆˆi iy x
Unsampled observation
i = Random
error
Observed value
^
Estimating Parameters:Least Squares Method
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
• Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Scattergram
1. Plot of all (xi, yi) pairs
2. Suggests how well model will fit
0204060
0 20 40 60
x
y
0204060
0 20 40 60
x
y
Thinking Challenge
• How would you draw a line through the points?• How do you determine which line ‘fits best’?
Least Squares
• ‘Best fit’ means difference between actual y values and predicted y values are a minimum
– But positive differences off-set negative
2 2
1 1
ˆ ˆn n
ii ii i
y y
• Least Squares minimizes the Sum of the Squared Differences (SSE)
Least Squares Graphically
2
y
x
1
3
4
^^
^^
2 0 1 2 2ˆ ˆ ˆy x
0 1ˆ ˆˆi iy x
2 2 2 2 21 2 3 4
1
ˆ ˆ ˆ ˆ ˆLS minimizes n
ii
Coefficient Equations
1 1
11 2
12
1
ˆ
n n
i ini i
i ixy i
nxx
ini
ii
x y
x ySS nSS
x
xn
Slope
y-intercept
0 1ˆ ˆy x Prediction Equation
0 1ˆ ˆy x
Computation Table
xi2
x12
x2
2
: :
xn2
Σxi2
yi
y1
y2
:
yn
Σyi
xi
x1
x2
:
xn
Σxi
2
2
2
2
2
yi
y1
y2
:
yn
Σyi
xiyi
Σxiyi
x1y1
x2y2
xnyn
Interpretation of Coefficients
^
^
^1. Slope (1)
• Estimated y changes by 1 for each 1unit increase
in x— If 1 = 2, then Sales (y) is expected to increase by 2
for each 1 unit increase in Advertising (x)
2. Y-Intercept (0)
• Average value of y when x = 0— If 0 = 4, then Average Sales (y) is expected to be
4 when Advertising (x) is 0
^
^
Least Squares Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $ Sales (Units)1 12 13 24 25 4
Find the least squares line relatingsales and advertising.
0
1
2
3
4
0 1 2 3 4 5
Scattergram Sales vs. Advertising
Sales
Advertising
Parameter Estimation Solution Table
2 2
1 1 1 1 1
2 1 4 1 2
3 2 9 4 6
4 2 16 4 8
5 4 25 16 20
15 10 55 26 37
xi yi xi yi xiyi
Parameter Estimation Solution
1 1
11 2 2
12
1
15 1037
5ˆ .7015
555
n n
i ini i
i ii
n
ini
ii
x y
x yn
x
xn
ˆ .1 .7y x
0 1ˆ ˆ 2 .70 3 .10y x
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Param=0 Prob>|T|
INTERCEP 1 -0.1000 0.6350 -0.157 0.8849
ADVERT 1 0.7000 0.1914 3.656 0.0354
Parameter Estimation Computer Output
0^
1^
ˆ .1 .7y x
Coefficient Interpretation Solution
1. Slope (1)
• Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)
^
2. Y-Intercept (0)
• Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0— Difficult to explain to marketing manager
— Expect some sales without advertising
^
0
1
2
3
4
0 1 2 3 4 5
Regression Line Fitted to the Data
Sales
Advertising
ˆ .1 .7y x
Least Squares Thinking Challenge
You’re an economist for the county cooperative. You gather the following data:
Fertilizer (lb.) Yield (lb.) 4 3.0 6 5.510 6.512 9.0
Find the least squares line relatingcrop yield and fertilizer.
© 1984-1994 T/Maker Co.
02468
10
0 5 10 15
Scattergram Crop Yield vs. Fertilizer*
Yield (lb.)
Fertilizer (lb.)
Parameter Estimation Solution Table*
4 3.0 16 9.00 12
6 5.5 36 30.25 33
10 6.5 100 42.25 65
12 9.0 144 81.00 108
32 24.0 296 162.50 218
xi2yi xi yi xiyi
2
Parameter Estimation Solution*
1 1
11 2 2
12
1
32 24218
4ˆ .6532
2964
n n
i ini i
i ii
n
ini
ii
x y
x yn
x
xn
0 1ˆ ˆ 6 .65 8 .80y x
ˆ .8 .65y x
Coefficient Interpretation Solution*
2. Y-Intercept (0)
• Average Crop Yield (y) is expected to be 0.8 lb. when no Fertilizer (x) is used
^
^1. Slope (1)
• Crop Yield (y) is expected to increase by .65 lb. for each 1 lb. increase in Fertilizer (x)
Regression Line Fitted to the Data*
02468
10
0 5 10 15
Yield (lb.)
Fertilizer (lb.)
ˆ .8 .65y x
Probability Distribution of Random Error
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
• Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Linear Regression Assumptions
1. Mean of probability distribution of error, ε, is 0
2. Probability distribution of error has constant variance
3. Probability distribution of error, ε, is normal
4. Errors are independent
Error Probability Distribution
x1 x2 x3
yE(y) = β0 + β1x
x
Random Error Variation
• Measured by standard error of regression model– Sample standard deviation of : s^
• Affects several factors– Parameter significance– Prediction accuracy
^• Variation of actual y from predicted y, y
y
xxi
Variation Measures
0 1ˆ ˆˆi iy x
yi 2ˆ( )i iy y
Unexplained sum of squares
2( )iy yTotal sum of squares
2ˆ( )iy yExplained sum of squares
y
Estimation of σ2
22 ˆ2 i i
SSEs where SSE y y
n
2
2
SSEs s
n
Calculating SSE, s2, s Example
You’re a marketing analyst for Hasbro Toys. You gather the following data:
Ad $ Sales (Units)1 12 13 24 25 4
Find SSE, s2, and s.
Calculating SSE Solution
1 1
2 1
3 2
4 2
5 4
xi yi
.6
1.3
2
2.7
3.4
ˆy y
.4
-.3
0
-.7
.6
2ˆ( )y yˆ .1 .7y x
.16
.09
0
.49
.36
SSE=1.1
Calculating s2 and s Solution
2 1.1.36667
2 5 2
SSEs
n
.36667 .6055s
Evaluating the Model
Testing for Significance
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
• Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Test of Slope Coefficient
• Shows if there is a linear relationship between x and y
• Involves population slope 1
• Hypotheses – H0: 1 = 0 (No Linear Relationship)
– Ha: 1 0 (Linear Relationship)
• Theoretical basis is sampling distribution of slope
y
Population Linex
Sample 1 Line
Sample 2 Line
Sampling Distribution of Sample Slopes
1
Sampling Distribution
11
11SS
^
^
All Possible Sample Slopes
Sample 1: 2.5
Sample 2: 1.6
Sample 3: 1.8
Sample 4: 2.1 : :
Very large number of sample slopes
Slope Coefficient Test Statistic
1
1 1
ˆ
2
12
1
ˆ ˆ2
wherexx
n
ini
xx ii
t df nsS
SS
x
SS xn
Test of Slope Coefficient Example
You’re a marketing analyst for Hasbro Toys.
You find β0 = –.1, β1 = .7 and s = .6055.Ad $ Sales (Units)
1 12 13 24 25 4
Is the relationship significant at the .05 level of significance?
^^
Test of Slope Coefficient Solution
• H0:• Ha:• • df • Critical Value(s):
Test Statistic:
Decision:
Conclusion:
t0 3.182-3.182
.025
Reject H0 Reject H0
.025
1 = 0
1 0
.05
5 - 2 = 3
Solution Table
xi yi xi2 yi
2 xiyi
1 1 1 1 1
2 1 4 1 2
3 2 9 4 6
4 2 16 4 8
5 4 25 16 20
15 10 55 26 37
Test StatisticSolution
1
1
ˆ 2
1
ˆ
.6055.1914
1555
5
ˆ .703.657
.1914
xx
SS
SS
tS
Test of Slope Coefficient Solution
• H0: 1 = 0
• Ha: 1 0
• .05• df 5 - 2 = 3• Critical Value(s):
Test Statistic:
Decision:
Conclusion:
1
1
ˆ
ˆ .703.657
.1914t
S
Reject at = .05
There is evidence of a relationshipt0 3.182-3.182
.025
Reject H0 Reject H0
.025
Test of Slope CoefficientComputer Output
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Param=0 Prob>|T|
INTERCEP 1 -0.1000 0.6350 -0.157 0.8849
ADVERT 1 0.7000 0.1914 3.656 0.0354
t = 1 / S
P-Value
S11 1
^^^^
Correlation Models
Types of Probabilistic Models
ProbabilisticModels
Regression Models
Correlation Models
Correlation Models
• Answers ‘How strong is the linear relationship between two variables?’
• Coefficient of correlation– Sample correlation coefficient denoted r– Values range from –1 to +1– Measures degree of association– Does not indicate cause–effect relationship
Coefficient of Correlation
xy
xx yy
SSr
SS SS
2
2
2
2
xx
yy
xy
xSS x
n
ySS y
n
x ySS xy
n
where
Coefficient of Correlation Values
–1.0 +1.00–.5 +.5
No Linear Correlation
Perfect Negative
Correlation
Perfect Positive
Correlation
Increasing degree of positive correlation
Increasing degree of negative correlation
Coefficient of Correlation Example
You’re a marketing analyst for Hasbro Toys. Ad $ Sales (Units)
1 12 13 24 25 4
Calculate the coefficient ofcorrelation.
Solution Table
xi yi xi2 yi
2 xiyi
1 1 1 1 1
2 1 4 1 2
3 2 9 4 6
4 2 16 4 8
5 4 25 16 20
15 10 55 26 37
Coefficient of Correlation Solution
22
2
22
2
(15)55 10
5
(10)26 6
5
(15)(10)37 7
5
xx
yy
xy
xSS x
n
ySS y
n
x ySS xy
n
7.904
10 6xy
xx yy
SSr
SS SS
Coefficient of Correlation Thinking Challenge
You’re an economist for the county cooperative. You gather the following data:
Fertilizer (lb.) Yield (lb.) 4 3.0 6 5.510 6.512 9.0
Find the coefficient of correlation.© 1984-1994 T/Maker Co.
Solution Table*
4 3.0 16 9.00 12
6 5.5 36 30.25 33
10 6.5 100 42.25 65
12 9.0 144 81.00 108
32 24.0 296 162.50 218
xi2yi xi yi xiyi
2
Coefficient of Correlation Solution*
22
2
22
2
(32)296 40
4
(24)162.5 18.5
4
(32)(24)218 26
4
xx
yy
xy
xSS x
n
ySS y
n
x ySS xy
n
26.956
40 18.5xy
xx yy
SSr
SS SS
Proportion of variation ‘explained’ by relationship between x and y
Coefficient of Determination
2 Explained Variation
Total Variationyy
yy
SS SSEr
SS
0 r2 1
r2 = (coefficient of correlation)2
Coefficient of Determination Example
You’re a marketing analyst for Hasbro Toys.
You know r = .904. Ad $ Sales (Units)
1 12 13 24 25 4
Calculate and interpret thecoefficient of determination.
Coefficient of Determination Solution
r2 = (coefficient of correlation)2
r2 = (.904)2
r2 = .817
Interpretation: About 81.7% of the sample variation in Sales (y) can be explained by using Ad $ (x) to predict Sales (y) in the linear model.
r2 Computer Output
Root MSE 0.60553 R-square 0.8167
Dep Mean 2.00000 Adj R-sq 0.7556
C.V. 30.27650
r2 adjusted for number of explanatory variables & sample size
r2
Using the Model for Prediction & Estimation
Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
• Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation
Prediction With Regression Models
• Types of predictions– Point estimates– Interval estimates
• What is predicted– Population mean response E(y) for given x
Point on population regression line
– Individual response (yi) for given x
What Is Predicted
Mean y, E(y)
y
^yIndividual
Prediction, y
E(y) = x
^x
xP
^ ^y i =
x
Confidence Interval Estimate for Mean Value of y at x = xp
xx
p
SS
xx
nSty
2
2/
1ˆ
df = n – 2
Factors Affecting Interval Width
1. Level of confidence (1 – )• Width increases as confidence increases
2. Data dispersion (s)• Width increases as variation increases
3. Sample size• Width decreases as sample size increases
4. Distance of xp from meanx• Width increases as distance increases
Why Distance from Mean?
Sample 2 Line
y
xx1 x2
y
Sample 1 LineGreater dispersion than x1
x
Confidence Interval Estimate Example
You’re a marketing analyst for Hasbro Toys.
You find β0 = -.1, β 1 = .7 and s = .6055.
Ad $ Sales (Units)1 12 13 24 25 4
Find a 95% confidence interval forthe mean sales when advertising is $4.
^^
Solution Table
x i y i x i2 y i
2 x iy i
1 1 1 1 1
2 1 4 1 2
3 2 9 4 6
4 2 16 4 8
5 4 25 16 20
15 10 55 26 37
Confidence Interval Estimate Solution
2
/ 2
2
1ˆ
ˆ .1 .7 4 2.7
4 312.7 3.182 .6055
5 10
1.645 ( ) 3.755
p
xx
x xy t s
n SS
y
E Y
x to be predicted
Prediction Interval of Individual Value of y at x = xp
Note!
2
/ 2
1ˆ 1
p
xx
x xy t S
n SS
df = n – 2
Why the Extra ‘S’?
Expected(Mean) y
yy we're trying topredict
Prediction, y
xxp
E(y) = x
^^
^y i =
x i
Prediction Interval Example
You’re a marketing analyst for Hasbro Toys.
You find β0 = -.1, β 1 = .7 and s = .6055.
Ad $ Sales (Units)1 12 13 24 25 4
Predict the sales when advertising is $4. Use a 95% prediction interval.
^^
Solution Table
x i y i x i2 y i
2 x iy i
1 1 1 1 1
2 1 4 1 2
3 2 9 4 6
4 2 16 4 8
5 4 25 16 20
15 10 55 26 37
Prediction Interval Solution
2
/ 2
2
4
1ˆ 1
ˆ .1 .7 4 2.7
4 312.7 3.182 .6055 1
5 10
.503 4.897
p
xx
x xy t s
n SS
y
y
x to be predicted
Dep Var Pred Std Err Low95% Upp95% Low95% Upp95%
Obs SALES Value Predict Mean Mean Predict Predict
1 1.000 0.600 0.469 -0.892 2.092 -1.837 3.037
2 1.000 1.300 0.332 0.244 2.355 -0.897 3.497
3 2.000 2.000 0.271 1.138 2.861 -0.111 4.111
4 2.000 2.700 0.332 1.644 3.755 0.502 4.897
5 4.000 3.400 0.469 1.907 4.892 0.962 5.837
Interval Estimate Computer Output
Predicted y when x = 4
Confidence Interval
SYPredictionInterval
Confidence Intervals v. Prediction Intervals
x
y
x
^^ ^
y i =
x i
Conclusion
1. Described the Linear Regression Model
2. Stated the Regression Modeling Steps
3. Explained Least Squares
4. Computed Regression Coefficients
5. Explained Correlation
6. Predicted Response Variable