Lecture 20: Simple Linear Regression
API-201Z
Maya Sen
Harvard Kennedy School
http://scholar.harvard.edu/msen
Announcements
- Midterms nearly graded
- Executive summaries now due on 11/29 (Thursday, as part of PS #10)
- We’ll set up an online poll for which groups will present on 12/4 (due date)
- Regular office hours resume post-TG – happy to chat with you at any point about final exercises!
Roadmap
- Introduce the concept of the Ordinary Least Squares (OLS) method of estimating linear regression
- Discuss the simplest application, Simple Linear Regression
  - Relationship between two continuous variables
- Hypothesis tests and CIs for regression parameters
- Sets us up to cover regression with more than one explanatory variable, and interpretation of regression tables
Last time
- We have covered several more advanced inference techniques
  - ANOVA: global test comparing means across groups
  - Chi-square test: test of independence of rows and columns in a frequency table
- But both suffer from a weakness →
  - If the null is rejected, then what can we say about the strength/direction of the association?
  - Can we predict anything?
- Linear regression allows us to assess (1) the strength and (2) the direction of the relationship between two variables
- Useful across many different applications and for prediction
- Along with difference in means, one of the most widely used statistical techniques; we’ll cover only the basics in this course
State Unemployment Example
- Motivate linear regression with a simple example:
- Suppose our policy area is labor unemployment – think unemployment is “sticky” and lags over time
- Is there a relationship between state-level unemployment rates in the U.S. in 1995 and in 2000?
- A random sample of 30 states was taken
- For each state, data was collected on:
  - Unemployment rate in 1995
  - Unemployment rate in 2000
State Unemployment Example

State        1995   2000
Alabama       5.3    4.0
Alaska        7.1    6.2
Arizona       5.4    4.1
Arkansas      4.8    4.1
California    8.0    5.0
Colorado      4.3    3.0
...           ...    ...
State Unemployment Example

[Scatterplot of state unemployment rates: 1995 rate (horizontal axis) vs. 2000 rate (vertical axis)]
State Unemployment Example
- The two variables are obviously correlated → could use correlation to examine the relationship
- Correlation: measures the strength of linear association between 2 variables
- The 2 variables are treated in a similar manner → variables are interchangeable (the correlation of x with y, or of y with x, is the same)
- The correlation coefficient r takes values between −1 and 1
- State unemployment rate example:
  - Strong positive correlation
  - Correlation coefficient r = 0.78
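The correlation coefficient can be computed directly from its definition. A minimal sketch, using only the six states shown in the table above (the lecture’s r = 0.78 comes from the full 30-state sample, so the value here differs); the function name `pearson_r` is ours, not from the lecture:

```python
import math

# Unemployment rates for the six states listed on the slide.
rate_1995 = [5.3, 7.1, 5.4, 4.8, 8.0, 4.3]
rate_2000 = [4.0, 6.2, 4.1, 4.1, 5.0, 3.0]

def pearson_r(x, y):
    """Correlation coefficient: r = sum((x-mx)(y-my)) / (sd_x * sd_y terms)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

r = pearson_r(rate_1995, rate_2000)
print(round(r, 2))

# The two variables are interchangeable: swapping the arguments
# gives exactly the same value, as noted on the slide.
```

Note that `pearson_r(rate_2000, rate_1995)` returns the same value, illustrating the symmetry point above.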
State Unemployment Example

However: we can put more structure on the relationship with regression
- Regression: each variable has a specific role
- x is the explanatory (or independent, or predictor) variable
  - Always represented on the horizontal (X) axis
  - Can be binary, categorical, or continuous (will discuss a bit in this class)
- y is the outcome (or dependent, or response) variable, the variable we are trying to predict
  - Always represented on the vertical (Y) axis
  - Here: continuous (expanded to include dichotomous, categorical outcomes next semester)
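With these roles fixed (x = 1995 rate predicts y = 2000 rate), the OLS line can be sketched from the textbook formulas for slope and intercept. This uses the same six-state subset as before, so the coefficients are illustrative only, not the lecture’s 30-state estimates:

```python
# OLS for simple linear regression, y = a + b*x.
x = [5.3, 7.1, 5.4, 4.8, 8.0, 4.3]  # 1995 rates (explanatory variable)
y = [4.0, 6.2, 4.1, 4.1, 5.0, 3.0]  # 2000 rates (outcome variable)

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# OLS slope: b = sum((x - mx)(y - my)) / sum((x - mx)^2)
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))

# Intercept: the OLS line always passes through the point of means (mx, my)
a = my - b * mx

# Predicted 2000 rate for a state with a 6.0% rate in 1995
print(round(a + b * 6.0, 2))
```

Unlike correlation, this fit is not symmetric: regressing x on y would give a different line, which is why assigning the explanatory and outcome roles matters.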
![Page 41: Lecture 20: Simple Linear Regression API-201Z · Announcements I Midterms nearly graded I Executive summaries now due on 11/29 (Thursday, as part of PS #10) I We’ll set up online](https://reader033.vdocuments.us/reader033/viewer/2022051910/5fff5b5ab2f04c43f97b4b47/html5/thumbnails/41.jpg)
State Unemployment Example
However: we can put more structure on relationship w/ regression
I Regression: Each variable has specific roleI x is explanatory (or independent or predictor) variable
I Always represented on horizontal (X ) axisI Can be binary, categorical, or continuous (will discuss a bit in
this class)
I y is outcome (or dependent or response) variable, the variablewe are trying to predict
I Always represented on vertical (Y ) axisI Here: Continuous (expanded to include dichotomous,
categorical outcomes next semester)
![Page 42: Lecture 20: Simple Linear Regression API-201Z · Announcements I Midterms nearly graded I Executive summaries now due on 11/29 (Thursday, as part of PS #10) I We’ll set up online](https://reader033.vdocuments.us/reader033/viewer/2022051910/5fff5b5ab2f04c43f97b4b47/html5/thumbnails/42.jpg)
State Unemployment Example
However: we can put more structure on relationship w/ regression
I Regression: Each variable has specific role
I x is explanatory (or independent or predictor) variableI Always represented on horizontal (X ) axisI Can be binary, categorical, or continuous (will discuss a bit in
this class)
I y is outcome (or dependent or response) variable, the variablewe are trying to predict
I Always represented on vertical (Y ) axisI Here: Continuous (expanded to include dichotomous,
categorical outcomes next semester)
![Page 43: Lecture 20: Simple Linear Regression API-201Z · Announcements I Midterms nearly graded I Executive summaries now due on 11/29 (Thursday, as part of PS #10) I We’ll set up online](https://reader033.vdocuments.us/reader033/viewer/2022051910/5fff5b5ab2f04c43f97b4b47/html5/thumbnails/43.jpg)
State Unemployment Example
However: we can put more structure on relationship w/ regression
I Regression: Each variable has specific roleI x is explanatory (or independent or predictor) variable
I Always represented on horizontal (X ) axisI Can be binary, categorical, or continuous (will discuss a bit in
this class)
I y is outcome (or dependent or response) variable, the variablewe are trying to predict
I Always represented on vertical (Y ) axisI Here: Continuous (expanded to include dichotomous,
categorical outcomes next semester)
State Unemployment Example

[Scatterplot: horizontal axis labeled "Predictor, Explanatory, or Independent Variable"; vertical axis labeled "Outcome, Response, or Dependent Variable"]
Correlation versus Regression

Regression offers key advantages:

1. Assess whether there is a statistically significant relationship between the two variables
2. Assess the magnitude of that relationship
3. Use the explanatory variable to generate predicted values of the outcome variable
4. Eventually will allow us to take other variables into account
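These advantages can be made concrete with a minimal sketch in Python. The unemployment numbers below are made up for illustration (not the actual course data): correlation gives direction and strength only, while the regression slope gives a magnitude in units of y per unit of x, and the fitted line gives predictions.

```python
import math

# Illustrative 1995 (x) and 2000 (y) state unemployment rates -- made up numbers
x = [5.6, 4.2, 5.1, 6.4, 4.9, 5.8]
y = [4.0, 3.5, 4.1, 4.9, 3.8, 4.4]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)   # correlation: direction and strength only
b1 = sxy / sxx                   # regression slope: change in y per unit change in x
b0 = y_bar - b1 * x_bar          # intercept

# Regression also yields predictions, which correlation alone cannot:
y_hat = b0 + b1 * 5.0            # predicted 2000 rate for a state at 5.0% in 1995

# The two quantities are linked: b1 = r * (sd_y / sd_x)
assert abs(b1 - r * math.sqrt(syy / sxx)) < 1e-12
```

The same slope would come out of any least-squares fit; writing it out by hand just makes the link between r and b1 visible.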
Simple Linear Regression

Let's explore with the simplest kind of regression:

- Simple: only one independent variable (so bivariate)
- Linear: straight-line relationship
- Regression: a method of fitting a (linear) model to data
- However: how do we find the line that best describes the dataset we have collected?
State Unemployment Example

[Scatterplot of state unemployment rates: 1995 on the horizontal axis, 2000 on the vertical axis]
Simple Linear Regression

- If we had a true linear relationship between 1995 unemployment (x) and 2000 unemployment (y), it would be expressed by:

  y = β0 + β1x

  - where y is the outcome
  - β0 is the intercept
  - β1 is the slope
  - and x is the explanatory variable

- Much of our interest is in the size and sign of β1, the slope
- The slope captures the linear relationship between x and y
Positive Relationship Between X and Y

[Scatterplot of Y against X with an upward-sloping fitted line: slope is positive]

Negative Relationship Between X and Y

[Scatterplot of Y against X with a downward-sloping fitted line: slope is negative]

No Relationship Between X and Y

[Scatterplot of Y against X with a flat fitted line: slope is 0]
Simple Linear Regression

- However: the simple line yi = β0 + β1xi assumes a perfectly deterministic relationship between x and y
- Maybe good for understanding, e.g., the relationship of Fahrenheit to Celsius, but not much else!
- More realistic: x and y are related linearly, but there is some noise around that, so it's not a single perfect line
- Thus, for a single observation (xi, yi):

  yi = β0 + β1xi + εi
  (β0: intercept; β1: slope; εi: error)

- where the εi are also known as random errors
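A quick simulation can make the error term concrete. This sketch uses arbitrary parameter values (β0 = 1.0, β1 = 0.6, errors with standard deviation 0.3 -- all assumptions for illustration): each observation deviates from the line by exactly its random error, so the points scatter around the line rather than falling on it.

```python
import random

random.seed(0)

# True (population) parameters -- arbitrary illustrative values
beta0, beta1 = 1.0, 0.6

x = [4.0 + 0.2 * i for i in range(25)]    # explanatory variable
eps = [random.gauss(0, 0.3) for _ in x]   # random errors, mean 0
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

# Each observation's deviation from the line equals its error term
deviations = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
```

Plotting y against x here would show a cloud of points around the line y = 1.0 + 0.6x, not a single perfect line.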
Simple Linear Regression

- This describes the "true" relationship between x and y:

  yi = β0 + β1xi + εi

- However: we can never observe β0 and β1 → these are population parameters!
- The best we can do is estimate them using our data
- Thus, we have an estimated linear relationship:

  yi = b0 + b1xi + ei

- Sometimes also denoted using "hat" notation as

  yi = β̂0 + β̂1xi + ε̂i

- Residuals (ei) represent estimates of the random errors, εi
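A sketch of the estimation step, again with made-up numbers rather than the actual course data: compute b1 and b0 by least squares, then the residuals ei as observed minus fitted values. Two defining properties of least-squares residuals fall out: they sum to zero, and they are uncorrelated with x.

```python
# Least-squares estimates b0, b1 and residuals ei = yi - (b0 + b1*xi).
# Illustrative numbers, not the actual state unemployment figures.
x = [5.6, 4.2, 5.1, 6.4, 4.9, 5.8]
y = [4.1, 3.4, 4.0, 5.0, 3.9, 4.3]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residuals: observed y minus the fitted value on the estimated line
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Least-squares residuals sum to zero and are uncorrelated with x
# (both up to floating-point error)
assert abs(sum(e)) < 1e-9
assert abs(sum(ei * xi for ei, xi in zip(e, x))) < 1e-9
```

Unlike the unobservable errors εi, the residuals ei are computable from the data, which is what makes them useful as estimates.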
![Page 95: Lecture 20: Simple Linear Regression API-201Z · Announcements I Midterms nearly graded I Executive summaries now due on 11/29 (Thursday, as part of PS #10) I We’ll set up online](https://reader033.vdocuments.us/reader033/viewer/2022051910/5fff5b5ab2f04c43f97b4b47/html5/thumbnails/95.jpg)
Simple Linear Regression

I Note: Important alternative way of thinking about linear regression is via expected values
I E[yi|xi] gives the expected (or mean) value of yi for a given value of the independent variable, xi
I Under the linear specification,

    E[yi|xi] = β0 + β1xi

I All predicted values fall exactly on the regression line
I Why no error term here? Because E[εi|xi] = 0
I (You’ll see violations of this in API 202)
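The claim that E[εi|xi] = 0 makes the error term drop out can be checked by simulation. This Python sketch uses hypothetical parameters (β0 = 2, β1 = 0.5): holding x fixed and averaging many draws of y recovers β0 + β1x, because the errors average out to zero.

```python
import random

random.seed(1)
beta0, beta1 = 2.0, 0.5  # hypothetical population parameters
x_fixed = 4.0

# Draw many y's at the same x; E[y|x] should equal beta0 + beta1*x
# because the mean-zero errors cancel out on average
draws = [beta0 + beta1 * x_fixed + random.gauss(0, 1) for _ in range(100_000)]
mean_y = sum(draws) / len(draws)
# mean_y is close to 2.0 + 0.5 * 4.0 = 4.0
```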
How to find the best estimated line?

Going back to our data: how to fit the best line?

We’ll take the line that minimizes the sum of squared residuals.
How to find the best estimated line?

I Specifically, we will choose the values of b0 and b1 that minimize:

    Σᵢ₌₁ⁿ (yi − ŷi)²

I Or:

    Σᵢ₌₁ⁿ (yi − b0 − b1xi)²

I Gives the Ordinary Least Squares (OLS) estimators (see appendix for proof)
I Could calculate other ways to fit a line, but OLS has very attractive properties
I Under the Gauss-Markov Theorem, the least squares line is “BLUE” (Best Linear Unbiased Estimator)
I For properties, see Wikipedia (Link)
I Video of proof at Khan Academy (Link)
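As a sanity check on the least-squares property, this Python sketch (made-up data, not from the lecture) computes the closed-form (b0, b1) and confirms that perturbing either coefficient only increases the sum of squared residuals:

```python
import random

random.seed(2)
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.8 * xi + random.gauss(0, 1) for xi in x]  # hypothetical data

def ssr(b0, b1):
    """Sum of squared residuals for the line y = b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Closed-form OLS solution
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# The SSR at the OLS solution is the minimum over all candidate lines
best = ssr(b0, b1)
```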
OLS Estimates for One Explanatory Variable

I Proof (in Appendix) gives us the equation for the slope estimate:

    b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

I and the equation for the intercept estimate:

    b0 = ȳ − b1x̄

I where x̄ is the average of the x values (explanatory variable)
I and ȳ is the average of the y values (outcome variable)
I Note that b1 = r · (sy/sx), where r is the sample correlation and sx, sy are the sample standard deviations
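The identity b1 = r · (sy/sx) can be verified numerically. A Python sketch on made-up data follows; the (n − 1) factors inside r, sx, and sy cancel, so the centered sums Sxy, Sxx, Syy suffice:

```python
import math
import random

random.seed(3)
x = [random.uniform(0, 10) for _ in range(100)]
y = [1.0 + 0.8 * xi + random.gauss(0, 1) for xi in x]  # hypothetical data

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Sxx = sum((xi - xbar) ** 2 for xi in x)
Syy = sum((yi - ybar) ** 2 for yi in y)

b1 = Sxy / Sxx                     # OLS slope formula
r = Sxy / math.sqrt(Sxx * Syy)     # sample correlation
b1_alt = r * math.sqrt(Syy / Sxx)  # r * (sy / sx); the (n-1) factors cancel
```

Both routes give the same slope, which is why the correlation coefficient and the regression slope always share a sign.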
State Unemployment Example

I Rare to calculate by hand except for the simplest cases
I In Stata:

. regress yr2000 yr1995
-----------------------------------------------------
     yr2000 |      Coef.   Std. Err.      t     P>|t|
-----------------------------------------------------
     yr1995 |   .5398317   .0818083     6.60    0.000
      _cons |   1.077917   .4571589     2.36    0.026
-----------------------------------------------------

I In R, use the lm (linear model) command: lm(yr2000 ~ yr1995)
I Statistical software will give you:
I   Intercept coefficient estimate (b0 or β̂0): 1.077917
I   Slope coefficient estimate (b1 or β̂1): 0.5398317
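The same computation takes only a few lines by hand. The numbers below are made-up rates for illustration, NOT the actual state data; run on the real yr1995/yr2000 series, the same code would reproduce the Stata coefficients:

```python
# Hypothetical unemployment-rate pairs (NOT the actual state data)
yr1995 = [5.6, 4.9, 5.1, 4.7, 7.8, 4.0, 5.5, 4.3, 5.5, 4.9]
yr2000 = [4.6, 6.6, 4.0, 4.4, 4.9, 2.7, 2.3, 4.0, 3.6, 3.7]

n = len(yr1995)
xbar = sum(yr1995) / n
ybar = sum(yr2000) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(yr1995, yr2000)) / \
    sum((x - xbar) ** 2 for x in yr1995)
b0 = ybar - b1 * xbar
# b0 and b1 play the roles of _cons and the yr1995 coefficient in the Stata output
```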
State Unemployment Example

I Gives us the estimated regression line:

    ŷ = 1.08 + 0.54x

I How to interpret?
I A one-unit increase in x is associated w/ a b1 increase/decrease in y
I Here: Based on our data, an increase of 1 percentage point in the 1995 unemployment rate is associated w/ an increase of 0.54 percentage points in the 2000 unemployment rate
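As a worked example of using the fitted line (the 5 percent input is hypothetical): a state with a 1995 unemployment rate of 5 percent has a predicted 2000 rate of 1.08 + 0.54 × 5 ≈ 3.78 percent.

```python
b0, b1 = 1.08, 0.54   # estimated coefficients from the slide
x_1995 = 5.0          # hypothetical 1995 unemployment rate (percent)
y_2000_hat = b0 + b1 * x_1995
# y_2000_hat is about 3.78 (percent)
```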
State Unemployment Example

[Figure]
OLS Assumptions

OLS relies on several key assumptions:

I (1) There is a linear relationship in the population between the independent variable x and the outcome y
I (2) Observations are independent (i.e., one observation on each state)
I (3) Errors are not correlated with one another
I → You’ll study violations of these assumptions in API 202
Using Regression for Prediction

I We can use information from the estimated regression line to predict relationships between x and y

I Ex) Suppose we are interested in predicting the 2000 unemployment rate for another state not included in the sample

I One state has an unemployment rate of 7.5% in 1995 → what is the predicted 2000 rate?

ŷ = 1.08 + 0.54x
  = 1.08 + 0.54(7.5) = 5.13

I Another state has an unemployment rate of 14% in 1995 → what is the predicted 2000 rate?

ŷ = 1.08 + 0.54x
  = 1.08 + 0.54(14.0) = 8.64

I These are called predicted values
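The predicted-value arithmetic above can be sketched in code (the course uses Stata/R; this Python version is just an equivalent illustration using the estimated coefficients b0 = 1.08 and b1 = 0.54):

```python
# Predicted values from the estimated regression line.
# b0 and b1 are the estimated intercept and slope from the slides.

def predict_2000_rate(rate_1995: float) -> float:
    """Predicted 2000 unemployment rate (%) given the 1995 rate (%)."""
    b0, b1 = 1.08, 0.54
    return b0 + b1 * rate_1995

print(round(predict_2000_rate(7.5), 2))   # 1.08 + 0.54(7.5)  = 5.13
print(round(predict_2000_rate(14.0), 2))  # 1.08 + 0.54(14.0) = 8.64
```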
State Unemployment Example

Some notes on prediction:

1. These are good predictions, but not necessarily correct!

2. The regression line is only good for predicting values in the range for which we have data

I Best not to extrapolate, i.e., predict values outside this range
I 1995 state unemployment in our data ranges from 3% to around 10%
I Should we use the regression equation to predict 2000 unemployment for a state w/ 40% 1995 unemployment?
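For illustration, the fitted line will mechanically produce a prediction even at 40%, far outside the 3%–10% range it was estimated on, which is exactly the danger. A sketch using the estimated coefficients:

```python
# The fitted line returns a number for any input, even far outside
# the 3%-10% range it was estimated on: that is the extrapolation risk.
b0, b1 = 1.08, 0.54
mechanical_prediction = b0 + b1 * 40.0
print(round(mechanical_prediction, 2))  # 22.68: mechanically valid, substantively dubious
```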
Using Regression for Hypothesis Tests of Slope

I Can also use OLS estimators in a hypothesis testing framework

I Remember that for OLS we estimate the slope via:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

I and the intercept via:

b0 = ȳ − b1x̄

I Both b1 and b0 are sums and means of random variables
I Means that the CLT kicks in!
I → b1 and b0 are normally distributed in large samples!
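A minimal Python sketch of these two formulas, using made-up toy data rather than the state unemployment data:

```python
# OLS slope and intercept computed directly from the slide formulas.
# Toy data chosen to lie exactly on y = 1 + 0.5x (not the real data).

def ols_fit(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # b1 = sum (xi - x_bar)(yi - y_bar) / sum (xi - x_bar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # intercept: b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = ols_fit([2.0, 4.0, 6.0, 8.0], [2.0, 3.0, 4.0, 5.0])
print(b0, b1)  # 1.0 0.5
```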
Using Regression for Hypothesis Tests of Slope

I Can use this fact to conduct hypothesis tests, usually two-tailed

I Specifically: if the slope β1 is zero, then there is no linear relationship between the two variables

I Null and alternative hypotheses:
I H0: β1 = 0
I Ha: β1 ≠ 0

I Test statistic given by

t(n−2) = (b1 − 0) / SE(b1)

I where we use a t distribution with n − 2 degrees of freedom, (usually) a two-tailed test, and

SE(b1) = √[Σ(yi − ŷi)² / (n − 2)] / √[Σ(xi − x̄)²]
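A Python sketch of the test statistic, computing SE(b1) from the residuals via the conventional residual standard error s = √(Σ(yi − ŷi)²/(n − 2)); the data here are toy values, not the state unemployment sample:

```python
import math

# t test for H0: beta1 = 0, per the slide formulas. SE(b1) uses the
# residual standard error s = sqrt(SSE / (n - 2)). Toy data only.

def slope_t_test(x, y):
    """Return (b1, se_b1, t) for the two-tailed test of beta1 = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1, se_b1, (b1 - 0) / se_b1  # compare t to a t(n-2) distribution

b1, se_b1, t = slope_t_test([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(round(b1, 3), round(t, 2))
```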
State Unemployment Example

I Stata and R report the results of the two-tailed hypothesis test. In Stata:

. regress yr2000 yr1995
-----------------------------------------------------
     yr2000 |     Coef.   Std. Err.      t     P>|t|
-----------------------------------------------------
     yr1995 |  .5398317    .0818083    6.60    0.000
      _cons |  1.077917    .4571589    2.36    0.026
-----------------------------------------------------

I Note: For β1, the hypothesis test yields a p-value < 0.001
I Note: The hypothesis test for β0 tests the null hypothesis that the intercept equals zero → i.e., that the expected value of y is zero when x is zero
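As a rough check of the reported output: the t statistic is just Coef./Std. Err., and a normal approximation to the t distribution (reasonable if df = n − 2 = 48, assuming all 50 states are in the sample, which is an assumption here) already puts the two-sided p-value far below 0.001:

```python
import math

# Sanity check of the reported output: t = Coef. / Std. Err., and a
# normal approximation to the t distribution gives the two-sided
# p-value; math.erfc(t / sqrt(2)) equals 2 * (1 - Phi(t)).
t_stat = 0.5398317 / 0.0818083
p_approx = math.erfc(t_stat / math.sqrt(2))
print(round(t_stat, 2), p_approx < 0.001)  # 6.6 True
```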
State Unemployment Example

I Statistical interpretation?
I Since the p-value < 0.001, we can reject the null hypothesis that β1 = 0 at the α = 0.05 level

I Substantive interpretation?
I Strong evidence against the slope being zero
I Implies that there appears to be some relationship between state unemployment rates in 1995 and in 2000
I In addition: the estimated slope suggests a positive association → a higher 1995 rate is linked w/ a higher 2000 rate
Using Regression for Confidence Intervals of Slope
I Just as we can conduct hypothesis tests, can also construct confidence intervals for true slope, β1
I Follows the same formula as before:
b1 ± t(n−2, α/2) × SE[b1]
I In our example (w/ 30 observations):
0.5398 ± t(28, α/2) × 0.0818 → (0.372, 0.707)
I Interpretation: In repeated sampling, expect 95 out of 100 confidence intervals to contain true slope
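The interval can be checked numerically. A sketch in Python, using scipy for the t critical value (estimate and standard error from the regression output):

```python
from scipy import stats

b1, se = 0.5398317, 0.0818083    # slope estimate and its standard error
df = 28                          # n - 2 = 30 - 2
t_crit = stats.t.ppf(0.975, df)  # two-tailed critical value for alpha = 0.05

lower = b1 - t_crit * se
upper = b1 + t_crit * se
print(round(lower, 3), round(upper, 3))  # (0.372, 0.707), matching the slide
```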
State Unemployment Example
STATA and R will also report 95% CIs
. regress yr2000 yr1995
Source | SS df MS Number of obs = 30
-------------+------------------------------ F( 1, 28) = 43.54
Model | 13.3338426 1 13.3338426 Prob > F = 0.0000
Residual | 8.57415592 28 .306219854 R-squared = 0.6086
-------------+------------------------------ Adj R-squared = 0.5947
Total | 21.9079986 29 .755448226 Root MSE = .55337
------------------------------------------------------------------------------
yr2000 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yr1995 | .5398317 .0818083 6.60 0.000 .372255 .7074084
_cons | 1.077917 .4571589 2.36 0.026 .1414697 2.014365
------------------------------------------------------------------------------
Model Fit of a Simple Linear Regression
I Model fit is a measure of how “well” the line fits the data
I In linear regression, R² most commonly used measure
I R²: Proportion of variance in y explained by variance in x
I With one explanatory variable (one x), correlation coefficient r is square root of R² (with the sign of the slope):
r = √R²
I Here: √0.6086 ≈ 0.780
I Substantive interpretation: High R² → two variables highly correlated, regression explaining a lot of the variance in the outcome
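To check the arithmetic, with R² = 0.6086 from the regression output and a positive estimated slope:

```python
import math

r_squared = 0.6086        # R-squared reported by Stata
r = math.sqrt(r_squared)  # sign of r follows the slope; slope is positive here
print(round(r, 3))        # 0.78
```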
Some Notes About Residuals
I Residuals represent estimates of the random errors, εi
I Empirically: Represent “left-over” distance from each observation to regression line after fitting
I Differences observed in our sample data between each point and regression line (vertically):
Residual = Observed y − Predicted y
I Least-squares line makes sum of the squared residuals as small as possible
I Any other strategy for drawing the line gives a bigger value for this sum
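A quick numerical check of that least-squares property, on made-up data (illustrative numbers, not the state unemployment data): the OLS line's sum of squared residuals is smaller than any other line's.

```python
import numpy as np

# illustrative data, not the actual unemployment figures
x = np.array([4.0, 5.0, 5.5, 6.0, 7.0, 8.0])
y = np.array([3.1, 3.9, 4.6, 4.4, 5.2, 5.9])

b1, b0 = np.polyfit(x, y, 1)                # least-squares slope and intercept
ssr_ols = np.sum((y - (b0 + b1 * x)) ** 2)  # sum of squared residuals

# perturb the line: any other slope/intercept pair gives a larger sum
ssr_other = np.sum((y - (b0 + 0.1 + (b1 - 0.05) * x)) ** 2)
print(ssr_ols < ssr_other)  # True
```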
Some Notes About Residuals
I Sum of residuals equals zero using least-squares regression
I → Plotting residuals against x values should result in a plot that looks random, i.e. no pattern present
I If pattern, a line might not be a good fit for the data
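The zero-sum property can be verified directly. A sketch on simulated data (hypothetical rates, chosen only to resemble the example's scale):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(3.0, 9.0, 30)                    # hypothetical 1995-style rates
y = 1.08 + 0.54 * x + rng.normal(0.0, 0.55, 30)  # linear signal plus noise

b1, b0 = np.polyfit(x, y, 1)  # least-squares fit with an intercept
res = y - (b0 + b1 * x)       # observed y minus predicted y

print(abs(res.sum()) < 1e-8)  # True: residuals sum to (numerically) zero
```

Plotting `res` against `x` for data like these should show no pattern; the zero-sum property holds for any least-squares fit that includes an intercept.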
Some Notes About Residuals
In Stata:
predict res, r
scatter res yr1995
Some Notes About Residuals
I Left hand side: Looks random
I Right hand side: Looks like errors get bigger with larger x values → heteroskedasticity
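The two patterns can be simulated. A sketch contrasting constant-spread residuals with residuals whose spread grows with x (all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 200)

res_homo = rng.normal(0.0, 1.0, 200)   # constant spread: plot looks random
res_hetero = rng.normal(0.0, 0.3 * x)  # spread grows with x: funnel shape

small_x, large_x = x < 5.5, x >= 5.5
spread_small = res_hetero[small_x].std()
spread_large = res_hetero[large_x].std()
print(spread_large > spread_small)  # True: errors bigger at larger x
```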
Outliers and Leverage Points
I Outlier: Observation that has an unusual y value, conditional on x
I Leverage point: Observation that has an unusual x value (far from the mean of X)
I An observation is influential if it substantially changes the regression line → that is, it is an outlier and has high leverage
I Outliers, leverage points, and influential observations raise interesting questions to examine further
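A toy illustration of influence: start with perfectly linear data, then add a single point that is both an outlier and high leverage, and watch the fitted slope move (an invented dataset, chosen to make the effect dramatic):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0  # perfectly linear data: slope 2, intercept 1

slope_before, _ = np.polyfit(x, y, 1)

# one influential point: unusual x (high leverage) AND unusual y (outlier)
x2 = np.append(x, 30.0)
y2 = np.append(y, 0.0)
slope_after, _ = np.polyfit(x2, y2, 1)

print(round(slope_before, 2), round(slope_after, 2))  # slope drops sharply
```

A point with unusual y but typical x, or unusual x but y on the line, would move the fit far less; it takes both to exert this much influence.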
Warning about Association versus Causation
I Linear regression allows us, under certain circumstances, to:
  I Make statements about whether a relationship between two variables exists
  I Make statements about the size of that relationship
  I Predict one variable using another
I However: At this point, it is not OK to say that one variable "causes" a change in another variable → that requires additional assumptions about the relationship between x and y
I You'll see the additional assumptions required to make causal statements in API 202
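One concrete way to see this warning is a simulated confounder. In the sketch below (my own illustration, not from the lecture; the variable names are invented), x has no causal effect on y at all, yet both are driven by a common cause z, so the regression of y on x still finds a strong association:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

z = rng.normal(0, 1, n)          # unobserved common cause (confounder)
x = z + rng.normal(0, 1, n)      # x is driven by z; x has NO effect on y
y = 2 * z + rng.normal(0, 1, n)  # y is driven only by z

# Simple-regression slope of y on x: Cov(x, y) / Var(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(b1)  # close to 1: a clear association, despite zero causal effect
```

Here the population slope is Cov(x, y)/Var(x) = 2/2 = 1, so the regression "finds" a relationship even though intervening on x would not change y — exactly the gap between association and causation.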
Next Time
I More on interpretation
I Multiple regression: regression with two or more explanatory variables
Appendix: Proof of Least Squares Coefficient Estimators

Taking the partial derivatives:

$$S(b_0, b_1) = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = \sum_{i=1}^{n} \left( Y_i^2 - 2 Y_i b_0 - 2 Y_i b_1 X_i + b_0^2 + 2 b_0 b_1 X_i + b_1^2 X_i^2 \right)$$

$$\frac{\partial S(b_0, b_1)}{\partial b_0} = \sum_{i=1}^{n} (-2 Y_i + 2 b_0 + 2 b_1 X_i) = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)$$

$$\frac{\partial S(b_0, b_1)}{\partial b_1} = \sum_{i=1}^{n} (-2 Y_i X_i + 2 b_0 X_i + 2 b_1 X_i^2) = -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i)$$
Appendix: Proof of Least Squares Coefficient Estimators

I One condition for β0 and β1 to minimize the sum of the squared residuals is that they must set the partial derivatives equal to 0
I Each of these conditions is called a first order condition
I The first order conditions are:

$$0 = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)$$

$$0 = -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i)$$
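As a quick numerical sanity check (my own sketch, not part of the slides), the least-squares estimates derived in the rest of this appendix should make both of these sums zero up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 100)
y = 1 + 2 * x + rng.normal(0, 1, 100)

# Closed-form least-squares estimates (derived in the rest of this appendix)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - b0 - b1 * x
print(np.sum(resid))      # first order condition for the intercept: ~0
print(np.sum(x * resid))  # first order condition for the slope: ~0
```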
Appendix: Proof of Least Squares Coefficient Estimators

I Let's solve for the estimator of the intercept first:

$$0 = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)$$

$$0 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)$$

$$0 = \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} \beta_0 - \sum_{i=1}^{n} \beta_1 X_i$$

$$\beta_0 n = \left( \sum_{i=1}^{n} Y_i \right) - \beta_1 \left( \sum_{i=1}^{n} X_i \right)$$

Dividing through by n:

$$\beta_0 = \bar{Y} - \beta_1 \bar{X}$$
Appendix: Proof of Least Squares Coefficient Estimators

I Now, we can plug this back in to get an estimate for the slope:

$$0 = -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i)$$

$$0 = \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i)$$

$$0 = \sum_{i=1}^{n} X_i \left( Y_i - (\bar{Y} - \beta_1 \bar{X}) - \beta_1 X_i \right)$$

$$0 = \sum_{i=1}^{n} X_i \left( Y_i - \bar{Y} - \beta_1 (X_i - \bar{X}) \right)$$

$$0 = \sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - \beta_1 \sum_{i=1}^{n} X_i (X_i - \bar{X})$$

$$\beta_1 \sum_{i=1}^{n} X_i (X_i - \bar{X}) = \sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - \bar{X} \sum_{i=1}^{n} (Y_i - \bar{Y})$$

(the last step subtracts $\bar{X} \sum_{i=1}^{n} (Y_i - \bar{Y})$, which equals 0, from the right-hand side)
Appendix: Proof of Least Squares Coefficient Estimators

$$\beta_1 \left[ \sum_{i=1}^{n} X_i (X_i - \bar{X}) - \bar{X} \sum_{i=1}^{n} (X_i - \bar{X}) \right] = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

$$\beta_1 \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

$$\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
Appendix: Proof of Least Squares Coefficient Estimators

I Note: We used a key fact about sums and means: $\sum_{i=1}^{n} (Y_i - \bar{Y}) = 0$
I Deviations from the mean sum to 0
I Intuitively this makes sense because the mean is just the sum of the observations divided by n
I This allows us to write $\sum_{i=1}^{n} X_i (Y_i - \bar{Y}) = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$
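Both the key fact and the final coefficient formulas are easy to verify numerically. The sketch below (my own, not from the slides) checks that deviations from the mean sum to 0 and that the closed-form coefficients agree with np.polyfit, which fits a degree-1 polynomial by least squares:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(5, 2, 200)
y = -1 + 0.5 * x + rng.normal(0, 1, 200)

# Key fact: deviations from the mean sum to 0 (up to floating-point error)
print(np.sum(y - y.mean()))

# Closed-form coefficients from the derivation above
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# np.polyfit(x, y, 1) returns the slope and intercept of the least-squares line
slope_fit, intercept_fit = np.polyfit(x, y, 1)
print(b1, slope_fit)      # should agree
print(b0, intercept_fit)  # should agree
```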