sociology 301 bivariate regression ii: testing slope and ...€¦ · 03/05/2016 · for the...
TRANSCRIPT
Sociology 301
Bivariate Regression II:
Testing Slope and Coefficient of Determination
LiyingLuo
05.03
Bivariate Regression
Fi.edregressionmodelforsample
Learningobjec;ves
Understandthebasicideaofes;ma;ngregressioncoefficient
Beabletointerpretandtestregressioncoefficient
Beabletocomputeandinterpretcoefficientofdetermina;on
intercept slope
Calculating Expected Values
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Calculating Expected Values
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Sociology 3811 ~ 3/31/2015 7
Estimating a Regression Equation
For the baseball example, the prediction equation is:
ii0.11X4.76Y +−=
Runs Scored
Gam
es W
on
Interpreting the Regression Equation
How do we interpret this regression equation?
It says that for every one unit increase in X (runs) we should
observe a 0.11 unit increase in Y (wins)
It also literally says that if a team were to score zero runs in a
season … such that X=0 … we should observe that the team would
win -4.76 game (more on this later)
Using the regression equation we can…
…estimate the average value of Y for a given value of X
…predict an individual’s value of Y for a given value of X
ii0.11X4.76Y +−=
Interpreting the Regression Equation
The equation means (in English) that
How many wins would we predict a team to win if they
scored 801 runs?
What is the average number of wins among teams that
score 750 runs?
0.11Runs-4.76Winsof Number Expected +=ii
0.11X4.76Y +−=
83.350.11(801)-4.76Winsof Number Expected =+=
77.740.11(750)-4.76Winsof Number Expected =+=
Worksheet 04.28
Afinancialanalystwouldliketoknowtherela;onshipbetweenthemutualfundfees(X,in%)anditsannualyield(Y,in%).Sheregressedannualyieldonfeesandfoundinhersamplea=4andb=-0.1.
1.Writedownthefi.edregressionmodel.
2.Interpretthees;matedinterceptandslope.
3.Whatistheexpectedreturnforamutualfundcharging1%fee?5%fee?
Testing Bivariate Regression Coefficient
Fi.edregressionmodelforsample
Tes;ngbivariateregressioncoefficient:
a…intercept:Wedon’tusuallycare…
b…slope:Isthepredictor(X)relatedtotheresponse(Y)?
intercept slope
Testing Bivariate Regression Coefficient
Typicalnullhypothesisaboutpopula;onslopeβ:thepredictorvariableXhasnolinearrela;onshipwiththeresponsevariableY.
Giventhees;matedslopebwithsampledata,howlikelythepopula;onslopeβequalsto0?
Mostcommon:two-tailedtest:H0:β=0vsH1:β≠0
(Far)lesscommon:one-tailedtest:H0:β≥0vsH1:β<0or
H0:β≤0vsH1:β>0
Testing Bivariate Regression Coefficient
Example1:
Aneconomistisinterestedintherela;onshipbetweenannualsalary(Y)andheightininches(X).Heregressedannualsalaryonheightandfoundthees;matedintercepta=30,000andslopeb=350.
Statethenullandresearchhypothesisabouttheslope.
Testing Bivariate Regression Coefficient
Example2:
Acriminologistisinterestedintherela;onshipbetweennumberofhomicide(Y)andmedianhouseholdincome(X)inneighborhoods.Sheregressedthenumberofhomicideonmedianhouseholdincome.Shefoundthees;matedintercepta=0.1andslopeb=-0.5.
Statethenullandresearchhypothesisabouttheslope.
Testing Bivariate Regression Coefficient
TheCentralLimitTheorem
b…asamplesta;s;cs…isnormallydistributedwithmean=βandstandarderror=(undersomecircumstancesofcourse).
ItmeansthatwecanuseZ-sta;s;cstotestwhetherβisdifferentfromthehypothesizedvalue(usually0).
However,weusuallydon’tknowandhavetorelyonsampleinforma;on,soweuset-sta;s;csinstead.
Forthisclass,(standarderrorofb)willbeprovided.
Testing Bivariate Regression Coefficient
Example1
Acivilengineerisinterestedinwhetheranxietylevel(X,anumericmeasureranginganywherefrom0to20with20beingextremelyanxious)isassociatedwithdrivingspeedonthehighway(Y,inmilesperhour).Heregressedspeedonanxietylevelonspeedandfoundthees;matedintercepta=75,thees;matedslopeb=2.1,andthees;matedstandarderrorfortheslopeSb=1.2.
1.Writedownthefi.edregressionmodel.
2.Interpretthees;matedinterceptandslope.3.Statethenullandresearchhypothesisabouttheslope.
4.Decidethealphalevelandcri;calvalue(s).
5.Computetheteststa;s;c.
6.Comparetheteststa;s;ctothecri;calvaluetomakeadecision.
7.Stateatechnicaldecisionandasubstan;veconclusion.
Testing Bivariate Regression Coefficient
Example2
Asociologistisinterestedininter-genera;onalmobilitydefinedastherela;onshipbetweenfathereduca;onandchild’seduca;on.Hesampledn=1,485individuals,andbelowistheSPSSoutput.
1.Writedownthefi.edregressionmodel.
2.Interpretthees;matedinterceptandslope.
3.Statethenullandresearchhypothesisabouttheslope.
4.Decidethealphalevelandcri;calvalue(s).
5.Computetheteststa;s;c.
6.Comparetheteststa;s;ctothecri;calvaluetomakeadecision.
7.Stateatechnicaldecisionandasubstan;veconclusion.
Worksheet 05.03
Acivilengineerisinterestedinwhetheranxietylevel(X,anumericmeasureranginganywherefrom0to20with20beingextremelyanxious)isassociatedwithdrivingspeedonthehighway(Y,inmilesperhour).Heregressedspeedonanxietylevelandfoundthees;matedintercepta=75,andtheinterceptb=2.1,andthees;matedstandarderrorfortheslopeSb=1.2.
1.Writedownthefi.edregressionmodel.2.Interpretthees;matedinterceptandslope.
3.Statethenullandresearchhypothesisabouttheslope.4.Decidethealphalevelandcri;calvalue(s).
5.Computetheteststa;s;c.6.Comparetheteststa;s;ctothecri;calvaluetomakeadecision.
7.Stateatechnicaldecisionandasubstan;veconclusion.
Coefficient of Determination
HowwelldoesXdoines;ma;ng/predic;ngY?
Astrongassocia;on/rela;onshipallowsgoodes;ma;on/predic;on.
Aweakassocia;on/rela;onshipmeansbades;ma;on/predic;on.
Tomeasurethis,weexaminehowmuchofthevaria;oninYcanbea.ributedtoXandhowmuchisrandomerror.
Thatis,wecanpar;;onthevarianceinYintotheparta.ributabletoXandtheparta.ributabletoerror.
Coefficient of Determination
Par;;onthevariance/devia;oninYintotheparta.ributabletoXandtheparta.ributabletoerror.
Sociology 3811 ~ 3/31/2015 8
How Well Does X Predict Y?
How well does a particular regression equation do in
predicting values of the response variable Y?
How strong the association is between X and Y
If the association is extremely strong, then knowing X allows you
to almost perfectly predict Y
If the association is weak, then knowing X does nothing to allow
you to predict the value of Y
As with ANOVA, we can ask how much of the variation in Y
can be attributed to X and how much is random error
That is, we can partition the variance in Y into the part attributable
to X and the part attributable to error
How Well Does X Predict Y?
Start with the deviation:
Then add and subtract the predicted value from the right-hand side and rearrange the equation:
Deviations from the mean can be expressed as the sum of
(1) deviations of the predicted value from the mean and (2) individual deviations from the predicted value
YYYYii−=−
)YY(YYYYiiii
−+−=−
( ) ( )iiii
YYYYYY −+−=−
How Well Does X Predict Y?
Deviations from the mean can be expressed as the sum of (1) deviations of the predicted value from the mean and (2) individual deviations from the predicted value
( ) ( )iiii
YYYYYY −+−=−
Portion of the deviation from the mean thatis attributable
to X
Portion of the deviation from the mean thatis attributable
to random error
Coefficient of Determination
Ifwesquareeachsideandthensumacrosscasesweget:
WhenXisnotrelatedtoY,then…
…knowingXdoesnothelpes;mateorpredictY
…ourbestguessaboutis,so=0and
Sociology 3811 ~ 3/31/2015 9
How Well Does X Predict Y?
If we square each side and then sum across cases we get:
SSTOTAL = SSREGRESSION + SSERROR
If there is no association between Y and X, then knowing X does not help predict Y
In this case, our best guess about Y-hat is Y-bar; thus SSREGRESSION equals zero and SSTOTAL=SSERROR
( ) ( ) ( )2n
1iii
2n
1ii
n
1i
2
i YYYYYY ∑ −+∑ −=∑ −===
TotalSum of Squares
RegressionSum of Squares
ErrorSum of Squares
How Well Does X Predict Y?
The coefficient of determination (R2YX) indicates the
proportion of the total variation in Y that is determined by its linear relationship with X
If R2YX=0, then SSREGRESSION=0, which suggests that there is no
association between Y and X
If R2YX=1, then SSREGRESSION=SSTOTAL, which suggests that there
is no error variation and that we can perfectly predict Y based on X
TOTAL
REGRESSION
TOTAL
ERRORTOTAL2YX SS
SSSS
SSSSR =−
=
How Well Does X Predict Y?R2
YX = 1.0 R2YX = 1.0
R2YX = 0.25 R2
YX = 0.25
Sociology 3811 ~ 3/31/2015 9
How Well Does X Predict Y?
If we square each side and then sum across cases we get:
SSTOTAL = SSREGRESSION + SSERROR
If there is no association between Y and X, then knowing X does not help predict Y
In this case, our best guess about Y-hat is Y-bar; thus SSREGRESSION equals zero and SSTOTAL=SSERROR
( ) ( ) ( )2n
1iii
2n
1ii
n
1i
2
i YYYYYY ∑ −+∑ −=∑ −===
TotalSum of Squares
RegressionSum of Squares
ErrorSum of Squares
How Well Does X Predict Y?
The coefficient of determination (R2YX) indicates the
proportion of the total variation in Y that is determined by its linear relationship with X
If R2YX=0, then SSREGRESSION=0, which suggests that there is no
association between Y and X
If R2YX=1, then SSREGRESSION=SSTOTAL, which suggests that there
is no error variation and that we can perfectly predict Y based on X
TOTAL
REGRESSION
TOTAL
ERRORTOTAL2YX SS
SSSS
SSSSR =−
=
How Well Does X Predict Y?R2
YX = 1.0 R2YX = 1.0
R2YX = 0.25 R2
YX = 0.25
Sociology 3811 ~ 3/31/2015 9
How Well Does X Predict Y?
If we square each side and then sum across cases we get:
SSTOTAL = SSREGRESSION + SSERROR
If there is no association between Y and X, then knowing X does not help predict Y
In this case, our best guess about Y-hat is Y-bar; thus SSREGRESSION equals zero and SSTOTAL=SSERROR
( ) ( ) ( )2n
1iii
2n
1ii
n
1i
2
i YYYYYY ∑ −+∑ −=∑ −===
TotalSum of Squares
RegressionSum of Squares
ErrorSum of Squares
How Well Does X Predict Y?
The coefficient of determination (R2YX) indicates the
proportion of the total variation in Y that is determined by its linear relationship with X
If R2YX=0, then SSREGRESSION=0, which suggests that there is no
association between Y and X
If R2YX=1, then SSREGRESSION=SSTOTAL, which suggests that there
is no error variation and that we can perfectly predict Y based on X
TOTAL
REGRESSION
TOTAL
ERRORTOTAL2YX SS
SSSS
SSSSR =−
=
How Well Does X Predict Y?R2
YX = 1.0 R2YX = 1.0
R2YX = 0.25 R2
YX = 0.25
Coefficient of Determination
Sociology 3811 ~ 3/31/2015 9
How Well Does X Predict Y?
If we square each side and then sum across cases we get:
SSTOTAL = SSREGRESSION + SSERROR
If there is no association between Y and X, then knowing X does not help predict Y
In this case, our best guess about Y-hat is Y-bar; thus SSREGRESSION equals zero and SSTOTAL=SSERROR
( ) ( ) ( )2n
1iii
2n
1ii
n
1i
2
i YYYYYY ∑ −+∑ −=∑ −===
TotalSum of Squares
RegressionSum of Squares
ErrorSum of Squares
How Well Does X Predict Y?
The coefficient of determination (R2YX) indicates the
proportion of the total variation in Y that is determined by its linear relationship with X
If R2YX=0, then SSREGRESSION=0, which suggests that there is no
association between Y and X
If R2YX=1, then SSREGRESSION=SSTOTAL, which suggests that there
is no error variation and that we can perfectly predict Y based on X
TOTAL
REGRESSION
TOTAL
ERRORTOTAL2YX SS
SSSS
SSSSR =−
=
How Well Does X Predict Y?R2
YX = 1.0 R2YX = 1.0
R2YX = 0.25 R2
YX = 0.25
Sociology 3811 ~ 3/31/2015 9
How Well Does X Predict Y?
If we square each side and then sum across cases we get:
SSTOTAL = SSREGRESSION + SSERROR
If there is no association between Y and X, then knowing X does not help predict Y
In this case, our best guess about Y-hat is Y-bar; thus SSREGRESSION equals zero and SSTOTAL=SSERROR
( ) ( ) ( )2n
1iii
2n
1ii
n
1i
2
i YYYYYY ∑ −+∑ −=∑ −===
TotalSum of Squares
RegressionSum of Squares
ErrorSum of Squares
How Well Does X Predict Y?
The coefficient of determination (R2YX) indicates the
proportion of the total variation in Y that is determined by its linear relationship with X
If R2YX=0, then SSREGRESSION=0, which suggests that there is no
association between Y and X
If R2YX=1, then SSREGRESSION=SSTOTAL, which suggests that there
is no error variation and that we can perfectly predict Y based on X
TOTAL
REGRESSION
TOTAL
ERRORTOTAL2YX SS
SSSS
SSSSR =−
=
How Well Does X Predict Y?R2
YX = 1.0 R2YX = 1.0
R2YX = 0.25 R2
YX = 0.25
Coefficient of Determination
Sociology 3811 ~ 3/31/2015 9
How Well Does X Predict Y?
If we square each side and then sum across cases we get:
SSTOTAL = SSREGRESSION + SSERROR
If there is no association between Y and X, then knowing X does not help predict Y
In this case, our best guess about Y-hat is Y-bar; thus SSREGRESSION equals zero and SSTOTAL=SSERROR
( ) ( ) ( )2n
1iii
2n
1ii
n
1i
2
i YYYYYY ∑ −+∑ −=∑ −===
TotalSum of Squares
RegressionSum of Squares
ErrorSum of Squares
How Well Does X Predict Y?
The coefficient of determination (R2YX) indicates the
proportion of the total variation in Y that is determined by its linear relationship with X
If R2YX=0, then SSREGRESSION=0, which suggests that there is no
association between Y and X
If R2YX=1, then SSREGRESSION=SSTOTAL, which suggests that there
is no error variation and that we can perfectly predict Y based on X
TOTAL
REGRESSION
TOTAL
ERRORTOTAL2YX SS
SSSS
SSSSR =−
=
How Well Does X Predict Y?R2
YX = 1.0 R2YX = 1.0
R2YX = 0.25 R2
YX = 0.25
Coefficient of Determination
Sociology 3811 ~ 3/31/2015 10
How Well Does X Predict Y?
For the baseball example:
Runs Scored
Gam
es W
on
0.3832948.9671128.163
SSSS
RTOTAL
REGRESSION2YX ===
How Well Does X Predict Y?
An R2YX of 0.383 means that 38.3% of the variation in Y
(Wins) is “explained” by X (Runs Scored)
Runs Scored
Gam
es W
on
Caution I: Outliers?
ii
YX2
0.109X4.764Y
0.383R
+−=
=
Runs Scored
Gam
es W
on
Runs Scored
Gam
es W
on
One Outlier
ii
YX2
0.078X18.703Y
0.227R
+=
=
Coefficient of Determination
Sociology 3811 ~ 3/31/2015 10
How Well Does X Predict Y?
For the baseball example:
Runs Scored
Gam
es W
on
0.3832948.9671128.163
SSSS
RTOTAL
REGRESSION2YX ===
How Well Does X Predict Y?
An R2YX of 0.383 means that 38.3% of the variation in Y
(Wins) is “explained” by X (Runs Scored)
Runs Scored
Gam
es W
on
Caution I: Outliers?
ii
YX2
0.109X4.764Y
0.383R
+−=
=
Runs Scored
Gam
es W
on
Runs Scored
Gam
es W
on
One Outlier
ii
YX2
0.078X18.703Y
0.227R
+=
=
Coefficient of Determination
Example1:Calculateandinterpretforregressingchurcha.endanceonage:
SSREGRESSION=67,123
SSERROR=2,861,928
SSTOTAL=?
Example1:Calculateandinterpretforregressingsexfreqonage:
SSREGRESSION=1,511,622
SSERROR=?
SSTOTAL=10,502,532
Sociology 3811 ~ 3/31/2015 10
How Well Does X Predict Y?
For the baseball example:
Runs Scored
Gam
es W
on
0.3832948.9671128.163
SSSS
RTOTAL
REGRESSION2YX ===
How Well Does X Predict Y?
An R2YX of 0.383 means that 38.3% of the variation in Y
(Wins) is “explained” by X (Runs Scored)
Runs Scored
Gam
es W
on
Caution I: Outliers?
ii
YX2
0.109X4.764Y
0.383R
+−=
=
Runs Scored
Gam
es W
on
Runs Scored
Gam
es W
on
One Outlier
ii
YX2
0.078X18.703Y
0.227R
+=
=
Sociology 3811 ~ 3/31/2015 9
How Well Does X Predict Y?
If we square each side and then sum across cases we get:
SSTOTAL = SSREGRESSION + SSERROR
If there is no association between Y and X, then knowing X does not help predict Y
In this case, our best guess about Y-hat is Y-bar; thus SSREGRESSION equals zero and SSTOTAL=SSERROR
( ) ( ) ( )2n
1iii
2n
1ii
n
1i
2
i YYYYYY ∑ −+∑ −=∑ −===
TotalSum of Squares
RegressionSum of Squares
ErrorSum of Squares
How Well Does X Predict Y?
The coefficient of determination (R2YX) indicates the
proportion of the total variation in Y that is determined by its linear relationship with X
If R2YX=0, then SSREGRESSION=0, which suggests that there is no
association between Y and X
If R2YX=1, then SSREGRESSION=SSTOTAL, which suggests that there
is no error variation and that we can perfectly predict Y based on X
TOTAL
REGRESSION
TOTAL
ERRORTOTAL2YX SS
SSSS
SSSSR =−
=
How Well Does X Predict Y?R2
YX = 1.0 R2YX = 1.0
R2YX = 0.25 R2
YX = 0.25
Sociology 3811 ~ 3/31/2015 10
How Well Does X Predict Y?
For the baseball example:
Runs Scored
Gam
es W
on
0.3832948.9671128.163
SSSS
RTOTAL
REGRESSION2YX ===
How Well Does X Predict Y?
An R2YX of 0.383 means that 38.3% of the variation in Y
(Wins) is “explained” by X (Runs Scored)
Runs Scored
Gam
es W
on
Caution I: Outliers?
ii
YX2
0.109X4.764Y
0.383R
+−=
=
Runs Scored
Gam
es W
on
Runs Scored
Gam
es W
on
One Outlier
ii
YX2
0.078X18.703Y
0.227R
+=
=