Quantitative approaches
Lesson 10: Bivariate regression
Contents
1. What is (bivariate) linear regression?
2. Example: Size of dwarfs and the influence of food
3. How to do it in SPSS
1. What is (bivariate) linear regression?
Bivariate linear regression is a statistical method that relates an independent (explanatory) variable to a dependent (response) variable by modeling the relationship as a straight line.
Regression analysis is used when both variables are continuous, that is, measured on an interval or ratio (metric) scale.
The basic model
The basic model we fit in a bivariate linear regression is a
straight line with
y = a + bx
y = response variable (= dependent)
x = explanatory variable (= independent)
a = intercept
b = slope
What is (bivariate) linear regression?
[Figure: a straight line with the intercept a and the slope triangle (delta x, delta y) marked]
$b = \Delta y / \Delta x$ = slope of the line
a = intercept
2. Example: Size of dwarfs and influence of food
Size of dwarfs and influence of food: data
Food (X) Size (Y)
8 12
7 10
6 8
5 11
4 6
3 7
2 2
1 3
0 3
Food and size of dwarfs: Scatterplot
[Figure: scatterplot of size (Y) against food (X)]
Scatterplot and regression line
The regression line is our «model» for the data.
For every value of «food», the model predicts the value of «size» on the regression line.
The meaning of the slope b
Slope b = change in Y that accompanies a unit change in X:
$b = \Delta Y / \Delta X = \Delta \text{Size} / \Delta \text{Food}$
In our example: with each additional unit of food, a dwarf is on average 1.22 cm taller.
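The interpretation of the slope can be checked with a small sketch. It assumes the rounded coefficients a = 2.02 and b = 1.22 that the lesson derives for this example; the function name `predict_size` is only illustrative.

```python
# Fitted line from the lesson's example (rounded): size = a + b * food
a = 2.02  # intercept, as calculated later in the lesson
b = 1.22  # slope

def predict_size(food):
    """Predicted size in cm for a given amount of food (illustrative helper)."""
    return a + b * food

# Raising food by one unit changes the prediction by exactly b,
# no matter where on the line we start.
difference = predict_size(4) - predict_size(3)
print(round(difference, 2))
```

The same difference results for any other pair of neighbouring food values, which is what "change in Y per unit change in X" means for a straight line.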
Errors (or: residuals)
Since the prediction is rarely completely accurate, we get for every value of «food» an «error» e, that is, the distance between the actual value of «size» ($Y$) and the predicted value of «size» ($\hat{Y}$):
$e = Y - \hat{Y}$
We also get an «explained part of the variance», $R^2$.
[Figure: scatterplot with regression line; the errors are the vertical distances between $Y$ and $\hat{Y}$]
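As a rough illustration, the residuals for the dwarf data can be listed as follows. The sketch assumes the coefficients 2.0222 and 1.2167 reported in the lesson's coefficient table; the variable names are illustrative.

```python
# Data from the lesson: food (X) and size (Y)
food = [8, 7, 6, 5, 4, 3, 2, 1, 0]
size = [12, 10, 8, 11, 6, 7, 2, 3, 3]

# Coefficients as reported in the lesson's output (rounded)
a, b = 2.0222, 1.2167

# Predicted value on the regression line and residual e = Y - Y_hat
predicted = [a + b * x for x in food]
residuals = [y - y_hat for y, y_hat in zip(size, predicted)]

for x, y, y_hat, e in zip(food, size, predicted, residuals):
    print(f"food={x}  size={y}  predicted={y_hat:.2f}  error={e:+.2f}")
```

Positive errors belong to dwarfs that are taller than the line predicts, negative errors to dwarfs that are shorter; for a least squares line the errors sum to (approximately) zero.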
The Least Squares Criterion
We look for the line that minimizes the sum of the squared residuals e (SSE). This is called the «least squares criterion»:
minimize $SSE = \sum e^2 = \sum (Y - \hat{Y})^2$, where $e = Y - \hat{Y}$
[Figure: the error e shown as the distance between $Y$ and $\hat{Y}$]
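The criterion can be made concrete by comparing the SSE of the lesson's fitted line with the SSE of a few other lines. This is only a sketch: the alternative intercepts and slopes are arbitrary values chosen for illustration, and the fitted coefficients are the rounded ones from the lesson.

```python
# Data from the lesson: food (X) and size (Y)
food = [8, 7, 6, 5, 4, 3, 2, 1, 0]
size = [12, 10, 8, 11, 6, 7, 2, 3, 3]

def sse(a, b):
    """Sum of squared errors of the line y = a + b * x on the dwarf data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(food, size))

# Line from the lesson (rounded coefficients)
best = sse(2.0222, 1.2167)

# Arbitrary alternative lines, chosen only for comparison
others = [sse(2.0222, 1.0), sse(2.0222, 1.4), sse(1.0, 1.2167), sse(3.0, 1.2167)]

print(round(best, 3))
print([round(value, 3) for value in others])
```

Every alternative line has a larger SSE than the fitted one, which is what the least squares criterion demands.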
Degree of fit: R-square
It is not enough to know the value of slope b. Very different relationships between X and Y may have the same slope b.
We therefore calculate R-square (= explained variance / total variance) in order to measure the «fit» of the model. R-square ranges from 0 to 1.
[Figure: three scatterplots with regression lines]
b = 1.163, R-squared = 0.979
b = 1.483, R-squared = 0.877
b = 1.521, R-squared = 0.589
Explained Variance
[Figure: two scatterplots with regression lines]
All the variance is explained through the model ($R^2 = 1$).
No variance is explained through the model ($R^2 = 0$).
Degree of fit: R-square
By introducing the regression line, we divide the total variation of «size» (SSY) into a regression variation SSR (explained) and an error variation SSE (unexplained).
Explained variance = R-square = explained variation / total variation:
$R^2 = \frac{SSR}{SSY}$
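The split of the total variation can be verified directly from the predicted values. The sketch below fits the line with the slope and intercept formulas given in the lesson (b = SSXY / SSX, a = mean of y minus b times mean of x) and then computes each variation from its definition; the variable names are illustrative.

```python
# Data from the lesson: food (X) and size (Y)
food = [8, 7, 6, 5, 4, 3, 2, 1, 0]
size = [12, 10, 8, 11, 6, 7, 2, 3, 3]

n = len(food)
mean_x = sum(food) / n
mean_y = sum(size) / n

# Least squares line, using the lesson's formulas for b and a
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(food, size)) / sum(
    (x - mean_x) ** 2 for x in food
)
a = mean_y - b * mean_x
predicted = [a + b * x for x in food]

# Total, explained and unexplained variation, each from its definition
ssy = sum((y - mean_y) ** 2 for y in size)
ssr = sum((y_hat - mean_y) ** 2 for y_hat in predicted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(size, predicted))

r_squared = ssr / ssy
print(round(ssy, 4), round(ssr, 4), round(sse, 4), round(r_squared, 4))
```

Up to floating-point rounding, SSR and SSE add up to SSY, so R-square can equally be read as 1 - SSE / SSY.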
Formula (1)
Sums of squares in Y: $SSY = \sum (y - \bar{y})^2$
Sums of squares in X: $SSX = \sum (x - \bar{x})^2$
Sums of products X,Y: $SSXY = \sum (x - \bar{x})(y - \bar{y})$
Slope of regression line: $b = \frac{SSXY}{SSX}$
Intercept of regression line: $a = \frac{\sum y}{n} - b \cdot \frac{\sum x}{n}$
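These formulas translate almost line by line into code. The following is a minimal sketch applied to the dwarf data; the function name `fit_line` and its return layout are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (ssx, ssy, ssxy, a, b) for the least squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ssx = sum((x - mean_x) ** 2 for x in xs)    # sums of squares in X
    ssy = sum((y - mean_y) ** 2 for y in ys)    # sums of squares in Y
    ssxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # sums of products
    b = ssxy / ssx                              # slope
    a = mean_y - b * mean_x                     # intercept
    return ssx, ssy, ssxy, a, b

# Data from the lesson: food (X) and size (Y)
food = [8, 7, 6, 5, 4, 3, 2, 1, 0]
size = [12, 10, 8, 11, 6, 7, 2, 3, 3]

ssx, ssy, ssxy, a, b = fit_line(food, size)
print(ssx, round(ssy, 4), round(ssxy, 4), round(a, 4), round(b, 4))
```

For the dwarf data this gives SSX = 60, SSXY = 73 and SSY of about 108.8889, and hence the slope and intercept used in the rest of the lesson.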
Formula (2)
Total variation (sum of squares): $SSY = \sum (y - \bar{y})^2$
Regression variation (explained): $SSR = \frac{SSXY^2}{SSX}$
Error variation (unexplained): $SSE = SSY - SSR$
Explained variance: $R^2 = \frac{SSR}{SSY}$
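Calculating intercept a and slope b
$SSY = \sum (y - \bar{y})^2 = 108.8889$
$SSX = \sum (x - \bar{x})^2 = 60$
$SSXY = \sum (x - \bar{x})(y - \bar{y}) = 73$
$b = \frac{SSXY}{SSX} = \frac{73}{60} = 1.2167 \approx 1.22$
$a = \frac{\sum y}{n} - b \cdot \frac{\sum x}{n} = \frac{62}{9} - 1.2167 \cdot \frac{36}{9} = 2.02$
$y = a + b \cdot x$
$y = 2.02 + 1.22 \cdot x$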
Calculating explained variation, residual variation and explained variance
$SSY = \sum (y - \bar{y})^2 = 108.8889$
$SSX = \sum (x - \bar{x})^2 = 60$
$SSXY = \sum (x - \bar{x})(y - \bar{y}) = 73$
Regression variation: $SSR = \frac{SSXY^2}{SSX} = \frac{73^2}{60} = 88.8167$
Error variation: $SSE = SSY - SSR = 108.8889 - 88.8167 = 20.0722$
Explained variance: $R^2 = \frac{SSR}{SSY} = \frac{88.8167}{108.8889} = 0.8157$, that is, 81.6%
Calculating error variance (ANOVA-table)
Source      Sum of squares   df   Mean squares                   F ratio
Regression  88.817 (SSR)     1    88.817 / 1 = 88.817            88.817 / 2.86746 = 30.9741
Error       20.072 (SSE)     7    20.072 / 7 = $s^2$ = 2.86746
Total       108.889 (SSY)    8
Critical F-value = 5.591 (5% level, df = 1/7)
Since the F-ratio is greater than the critical F-value for df = 1/7, we reject the null hypothesis that the true b in the population is equal to 0.
The ANOVA-table of the regression tells us whether all the explanatory variables together have a significant effect on the variance of Y.
The error variance $s^2$ will be used to calculate the standard errors for b and a.
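The ANOVA calculation can be retraced in a few lines. The sketch takes the sums of squares and the critical F-value 5.591 from the lesson as given (the critical value is not computed here); the variable names are illustrative.

```python
# Quantities from the lesson
n = 9            # number of dwarfs
ssy = 108.8889   # total variation
ssx = 60.0       # sums of squares in X
ssxy = 73.0      # sums of products X,Y

ssr = ssxy ** 2 / ssx   # regression (explained) variation
sse = ssy - ssr         # error (unexplained) variation

df_regression = 1       # one explanatory variable
df_error = n - 2        # two estimated parameters (a and b)

ms_regression = ssr / df_regression
ms_error = sse / df_error            # error variance s^2
f_ratio = ms_regression / ms_error

critical_f = 5.591      # taken from the lesson for df = 1/7
reject_null = f_ratio > critical_f

print(round(ms_error, 5), round(f_ratio, 3), reject_null)
```

The error degrees of freedom are n - 2 = 7 because two parameters, a and b, are estimated from the nine observations.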
Calculating the standard errors of the intercept a and the slope b
We can now use the error variance $s^2$ from the ANOVA-table in order to calculate the standard errors of the intercept a and the slope b.
standard error of b $= \sqrt{\frac{s^2}{SSX}} = \sqrt{\frac{2.867}{60}} = 0.2186$
standard error of a $= \sqrt{\frac{s^2 \sum x^2}{n \cdot SSX}} = \sqrt{\frac{2.867 \cdot 204}{9 \cdot 60}} = 1.0408$
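A short sketch of the two standard errors, assuming the error variance 2.86746 from the ANOVA-table and SSX = 60 from the lesson; the sum of the squared food values is computed from the data.

```python
import math

# Data from the lesson: food (X)
food = [8, 7, 6, 5, 4, 3, 2, 1, 0]

n = len(food)
s2 = 2.86746                          # error variance from the ANOVA-table
ssx = 60.0                            # sums of squares in X
sum_x_squared = sum(x * x for x in food)

# Standard errors are square roots of the estimated variances of b and a
se_b = math.sqrt(s2 / ssx)
se_a = math.sqrt(s2 * sum_x_squared / (n * ssx))

print(sum_x_squared, round(se_b, 4), round(se_a, 4))
```

Note the square roots: the ratio of the error variance to SSX is the variance of b, not yet its standard error.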
Calculating the p-value of intercept a
Coefficients:
             Estimate  Std. Error  t value  p value
(Intercept)  2.0222    1.0408      1.943    0.093129
food         1.2167    0.2186      5.565    0.000846 ***
t value = Estimate / Std. Error = 2.0222 / 1.0408 = 1.943
The t-values +/-1.943 cut off two areas of the t-distribution with df = 7 (n - 2 = 9 - 2), one on the left and one on the right-hand side. The total of these two areas is the p-value 0.0931.
-> In 9.3% of cases, an intercept this far from 0 or farther would have come up even if the real intercept was 0.
-> The intercept is not significantly different from 0 (at the 5% level).
Calculating the p-value of slope b
Coefficients:
             Estimate  Std. Error  t value  p value
(Intercept)  2.0222    1.0408      1.943    0.093129
food         1.2167    0.2186      5.565    0.000846 ***
t value = Estimate / Std. Error = 1.2167 / 0.2186 = 5.565
The t-values +/-5.565 cut off two areas of the t-distribution with df = 7 (n - 2 = 9 - 2), one on the left and one on the right-hand side. The total of these two areas is the p-value 0.000846.
-> In 0.08% of cases, a slope this far from 0 or farther would have come up even if the real slope was 0.
-> The slope is significantly different from 0.
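The two tail areas can be approximated numerically without a statistics package. The sketch below integrates the density of the t-distribution with Simpson's rule; it assumes df = n - 2 = 7 for this example, and the helper names `t_pdf` and `two_sided_p` are illustrative, not part of any library.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_value, df, steps=10000):
    """Area beyond -|t| and +|t|, via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_value)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3   # area between -|t| and +|t|
    return 1 - central_area

df = 9 - 2  # residual degrees of freedom: n minus the two estimated parameters

t_intercept = 2.0222 / 1.0408   # estimate / standard error
t_slope = 1.2167 / 0.2186

p_intercept = two_sided_p(t_intercept, df)
p_slope = two_sided_p(t_slope, df)

print(round(t_intercept, 3), round(p_intercept, 4))
print(round(t_slope, 3), round(p_slope, 6))
```

The results should agree with the p-values in the coefficient table up to rounding; in practice a statistics package reports them directly.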
[Figure: t-distribution with the two tail areas beyond t = -5.565 and t = 5.565 marked]
3. How to do it in SPSS
Regression (1) : get data
File -> Open -> Data
Click on FoodSize.sav
Open
Regression (2)
Analyze -> Regression -> Linear
Put «size» into «Dependent»
Put «food» into «Independent(s)»
Statistics:
Regression Coefficients:
- Estimates
- Confidence intervals
Continue
OK
Regression (3) : Results