Regression Analysis in Theory and Practice. Don’t write the formulas ahead!!!


Upload: ginger-long

Post on 21-Jan-2016

TRANSCRIPT

Page 1: Regression Analysis in Theory and Practice. DON’T WRITE THE FORMULAS AHEAD!!!

Regression Analysis in Theory and Practice

Page 2:

DON’T WRITE THE FORMULAS AHEAD!!!

Page 3:

REGRESSION ANALYSIS

Formula for simple regression:

$\hat{Y} = a + bX$

where $\hat{Y}$ is the predicted value of Y on the regression line.

Do you remember y = mx + b?

Same thing!

Page 4:

The dependence of Y on X can be of two types: “deterministic” or “probabilistic”.

The classic case of a deterministic relationship is that between the Fahrenheit and Celsius measures of temperature:

$F^\circ = 32 + \tfrac{9}{5}C$

This has the form $\hat{Y} = a + bX$. Here a, the intercept, is 32°, so when C = 0, F = 32°. b (beta), the slope of the line, is 9/5, or 1.8. C, degrees Celsius, plays the role of X.

Page 5:

So for every one-degree change in degrees Celsius, Fahrenheit goes up by 1.8 degrees, starting at 32 degrees.

When C = 0°, F = 32° + (9/5)(0) = 32°

When C = 100°, F = 32° + (9/5)(100) = 212°

Note: 1.8 = 9/5
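The deterministic relationship above can be sketched in a few lines of Python. The function name celsius_to_fahrenheit is only an illustrative choice, not something from the slides; the intercept 32 and slope 9/5 are the values given above.

```python
def celsius_to_fahrenheit(c):
    """Deterministic line F = a + b*C with intercept a = 32 and slope b = 9/5."""
    a = 32       # intercept: the Fahrenheit value when C = 0
    b = 9 / 5    # slope: Fahrenheit degrees gained per one Celsius degree
    return a + b * c

# The same input always gives exactly the same output: no error term.
for c in (0, 1, 100):
    print(c, celsius_to_fahrenheit(c))
```

Because nothing is random here, every point falls exactly on the line, which is what separates this case from probabilistic regression.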

Page 6:

Probabilistic Regression

Not perfectly predictive.

On average, we expect a certain amount of change in Y for a certain change in X.

Page 7:

Regression Example

Judges are advised to give longer sentences to repeat offenders than to first-time offenders. Does it really happen?

Hypothesis: In comparing criminals, those who have been convicted before will receive longer prison sentences than those with no prior convictions.

We collect data for 10 convicted criminals.

Page 8:

Data and Formula:

X (prior convictions)   Y (sentence length)   X − X̄   Y − Ȳ
 0                      12                     -4      -14
 3                      13                     -1      -13
 1                      15                     -3      -11
 0                      19                     -4       -7
 6                      26                      2        0
 5                      27                      1        1
 3                      29                     -1        3
 4                      31                      0        5
10                      40                      6       14
 8                      48                      4       22

ΣX = 40                 ΣY = 260

X̄ = 40/10 = 4           Ȳ = 260/10 = 26

$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$

Page 9:

Continued:

X − X̄   Y − Ȳ   (X − X̄)(Y − Ȳ)   (X − X̄)²
-4      -14      56               16
-1      -13      13                1
-3      -11      33                9
-4       -7      28               16
 2        0       0                4
 1        1       1                1
-1        3      -3                1
 0        5       0                0
 6       14      84               36
 4       22      88               16

                 Σ = 300          Σ = 100

X̄ = 4, Ȳ = 26

$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{300}{100} = 3$
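The slope calculation can be checked with a short Python sketch. It uses the ten (X, Y) pairs from the data slide; the variable names (x_bar, sxy, sxx) are illustrative choices, not notation from the slides.

```python
# Prior convictions (X) and sentence length (Y) for the 10 criminals.
x = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
y = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

x_bar = sum(x) / len(x)   # mean of X
y_bar = sum(y) / len(y)   # mean of Y

# Numerator: sum of cross-products of the deviations from the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
# Denominator: sum of squared deviations of X from its mean.
sxx = sum((xi - x_bar) ** 2 for xi in x)

b = sxy / sxx
print(x_bar, y_bar, sxy, sxx, b)  # 4.0 26.0 300.0 100.0 3.0
```

Working from deviations, as the slides do, keeps each intermediate column easy to check by hand.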

Page 10:

Now Calculate “a”

$a = \bar{Y} - b\bar{X}$

a = 26 − (3)(4)

a = 26 − 12

a = 14

Y = 14 + 3*X
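As a minimal sketch, the intercept and the resulting prediction line can be written in Python as follows. The means (4 and 26) and slope (3) are the values from the slides; the helper name predict is hypothetical.

```python
x_bar, y_bar = 4, 26   # means of X and Y from the data slide
b = 3                  # slope computed from the deviations

# The fitted line passes through the point of means, so a = Y-bar - b * X-bar.
a = y_bar - b * x_bar

def predict(x):
    """Predicted Y for a given X on the fitted line."""
    return a + b * x

print(a)            # 14
print(predict(0))   # 14
print(predict(10))  # 44
```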

Page 11:

Interpret the Equation

Y = 14 + 3*X

Interpret 14

Interpret 3

Page 12:

Scatterplot

[Figure: scatterplot of var2 against var1 with the fitted values; the var1 axis runs from 0 to 10 and the var2 axis from 12 to 48.]
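The "fitted values" in the plot are the points on the line Y = 14 + 3*X at each observed X. A small sketch, assuming var1 is the number of prior convictions and var2 is sentence length (the plot itself does not label them that way), computes them along with the residuals:

```python
# Assumed mapping: var1 = prior convictions (X), var2 = sentence length (Y).
x = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
y = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]
a, b = 14, 3  # intercept and slope from the worked example

fitted = [a + b * xi for xi in x]                   # points on the regression line
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # vertical distance from the line

for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(xi, yi, fi, ri)

# The residuals of a least-squares line with an intercept sum to zero.
print(sum(residuals))  # 0
```

The nonzero residuals are what makes this relationship probabilistic rather than deterministic: two criminals with the same number of prior convictions can receive different sentences.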

Page 16:

Multiple Regression - 1

The mathematics of how the computer calculates regression coefficients in multiple regression is very complicated. Fortunately, there is an intuitive process that generates the correct answers and is much easier to understand. Let’s see how the computer obtained the value of -.644 for the impact of senator conservatism on the degree to which a senator voted for tax changes primarily benefitting households at, or below, the median income.

Page 17:

Multiple Regression - 2

Our “main equation” is:

$Y = a_1 + b_1X_1 + b_2X_2 + b_3X_3 + e_1$

Y = percentage support for tax changes benefitting households with incomes at, or below, the median

X_1 = senator conservatism

X_2 = senator party affiliation

X_3 = state median household income

Our goal is to estimate b_1.

Page 18:

Multiple Regression - 3

$X_1 = a_2 + b_4X_2 + b_5X_3 + e_2$

In the above equation, e_2 represents that portion of a senator’s conservatism that CANNOT be explained by either their party affiliation or the median family income in their state.

Page 19:

Multiple Regression - 4

$Y = a_3 + b_6X_2 + b_7X_3 + e_3$

In the above equation, e_3 represents that portion of a senator’s degree of support for tax changes favorable to households with incomes at, or below, the median that CANNOT be explained by either their party affiliation or the median family income in their state.

Page 20:

Multiple Regression - 5

$e_3 = a_4 + b_8e_2 + e_4$

In the above equation, b_8 represents the impact of that portion of a senator’s conservatism that CANNOT be explained by party and state median income on that portion of the percentage of times the senator voted in favor of tax changes primarily benefitting households at, or below, the median income that CANNOT be explained by either their party affiliation or the median income in their state.

Thus, b_8 in the above equation = b_1 in the “main equation” (i.e., -.644).
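The residual-on-residual process described on the Multiple Regression slides can be tried out with a short Python sketch. The senator data are not included in the slides, so the numbers below are synthetic stand-ins generated at random, and the result will not be -.644; the helpers ols and residuals are hypothetical names written for this illustration. The point is only that the coefficient on X_1 from the main equation matches b_8 from regressing e_3 on e_2.

```python
import random

def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def residuals(columns, y):
    """The part of y that the given columns (plus an intercept) cannot explain."""
    coef = ols(columns, y)
    return [yi - coef[0] - sum(b * col[i] for b, col in zip(coef[1:], columns))
            for i, yi in enumerate(y)]

# Synthetic stand-ins (NOT the senator data from the slides).
random.seed(0)
n = 100
x2 = [float(random.randint(0, 1)) for _ in range(n)]                  # "party"
x3 = [random.gauss(50, 10) for _ in range(n)]                         # "state median income"
x1 = [20 * p + 0.3 * m + random.gauss(0, 5) for p, m in zip(x2, x3)]  # "conservatism"
y = [80 - 0.6 * c - 10 * p + 0.2 * m + random.gauss(0, 5)
     for c, p, m in zip(x1, x2, x3)]

b1 = ols([x1, x2, x3], y)[1]   # main equation: coefficient on X1
e2 = residuals([x2, x3], x1)   # X1 with X2 and X3 partialled out
e3 = residuals([x2, x3], y)    # Y with X2 and X3 partialled out
b8 = ols([e2], e3)[1]          # slope of e3 on e2

print(b1, b8)  # the two estimates agree up to floating-point rounding
```

In practice a statistics package solves the main equation directly; the three-step version is shown only because it mirrors the intuition on the slides.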

Page 21:

Maximum Likelihood Estimation