regression

Course Name More Description About the Course The World of Linear Regression

Upload: mandrewmartin

Post on 31-Oct-2014

Page 1: Regression

What is regression analysis?

Regression analysis is a technique for measuring the relationship between two interval- or ratio-level variables.

The regression framework is at the heart of empirical social and political science research.

Regression analysis can act as a statistical surrogate for controlled experiments and, under strong assumptions (such as no omitted confounding variables), can be used to make causal inferences.

Regression models

Researchers translate verbal theories, hypotheses, even hunches into models.

A model shows how and under what conditions two (or more) variables are related.

A regression model with a dependent variable and one independent variable is known as a bivariate regression model.

A regression model with a dependent variable and two or more independent variables and/or control variables is known as a multivariate (or multiple) regression model.

Scatterplots

A scatterplot graphs the sample observations by plotting each one as a point on the X and Y axes.

The X axis generally represents the values of the independent variable, and the Y axis usually represents the values of the dependent variable.

X is the horizontal axis; Y is the vertical axis.

Scatterplots

Scatterplots allow you to study the pattern of the dots, that is, the relationship between the two variables.

Scatterplots allow political scientists to identify:

-- positive or negative relationships

-- monotonic or linear relationships

[Figure: Scatterplot]

Regression Equation

The linear equation is specified as follows:

Y = a + bX

Where:

Y = dependent variable
X = independent variable
a = constant (value of Y when X = 0)
b = slope of the regression line
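As a quick arithmetic check of these definitions, the sketch below evaluates Y = a + bX in Python. The values a = 2 and b = 3 are hypothetical, chosen only for illustration, and `linear` is an illustrative helper name.

```python
def linear(x: float, a: float, b: float) -> float:
    """Evaluate the linear equation Y = a + bX."""
    return a + b * x


# Hypothetical constant and slope, for illustration only.
a, b = 2.0, 3.0

# The constant a is the value of Y when X = 0.
print(linear(0, a, b))  # 2.0

# The slope b is the change in Y when X increases by one unit.
print(linear(5, a, b) - linear(4, a, b))  # 3.0
```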

Regression Equation

Y = a + bX

a can be positive or negative. In high school algebra, you may have referred to a as the intercept, because a is the value of Y at the point where the line crosses the Y axis (where X = 0).

b (the slope coefficient) can be positive or negative. A positive coefficient denotes a positive relationship and a negative coefficient denotes a negative relationship.

The substantive interpretation of the slope coefficient depends on the variables involved, how they are coded, and their scale. Because the size of a coefficient depends on the units in which X and Y are measured, a larger coefficient does not necessarily indicate a stronger relationship.
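To make the point about scale concrete, here is a minimal sketch using made-up data (not from the slides): the same X values are expressed once in thousands of dollars and once in dollars. The least-squares slope shrinks by a factor of 1,000 even though the relationship itself is unchanged. The helper `ols_slope` is a hypothetical name implementing the standard least-squares slope formula.

```python
def ols_slope(x, y):
    """Least-squares slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical toy data: X in thousands of dollars, Y an arbitrary outcome.
x_thousands = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

# The same X values re-expressed in dollars.
x_dollars = [xi * 1000 for xi in x_thousands]

b_thousands = ols_slope(x_thousands, y)
b_dollars = ols_slope(x_dollars, y)

print(round(b_thousands, 5))  # 1.97
print(round(b_dollars, 5))    # 0.00197
```

Same points, same fit quality; only the units of X changed, so comparing raw coefficient sizes across differently scaled variables can mislead.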

The Regression Model

The goal of regression analysis is to find the equation that “best fits” the data.

In regression, the equation is chosen so that its graph is the line that minimizes the sum of the squared vertical distances between the data points and the line.
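The slides do not show the formulas, but for a single X the least-squares line has a well-known closed form: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the constant is the mean of Y minus the slope times the mean of X. A minimal sketch in plain Python, on hypothetical toy data; `ols_fit` is an illustrative helper name.

```python
def ols_fit(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = ols_fit(x, y)
print(f"Y = {a:.2f} + {b:.2f}X")  # Y = 0.09 + 1.97X
```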

d1 and d2 represent the vertical distances of observed data points from an estimated regression line.

Regression analysis uses a mathematical procedure that finds the single line that minimizes the sum of these squared distances.
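One way to see that the least-squares line really is the minimum is to compute the sum of squared vertical distances (the d values) for the fitted line and for slightly shifted lines. The sketch below, on hypothetical toy data, nudges the constant and slope by plus or minus 0.1; every nudged line has a larger sum of squares. `ols_fit` and `sse` are illustrative helper names.

```python
import itertools


def ols_fit(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def sse(a, b, x, y):
    """Sum of squared vertical distances between the points and Y = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

a_hat, b_hat = ols_fit(x, y)
best = sse(a_hat, b_hat, x, y)

# Shift the constant and/or slope slightly in every direction.
nudged = [
    sse(a_hat + da, b_hat + db, x, y)
    for da, db in itertools.product([-0.1, 0.0, 0.1], repeat=2)
    if (da, db) != (0.0, 0.0)
]
print(all(s > best for s in nudged))  # True
```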

Regression Equation

The standard regression equation is the same as the linear equation with one addition: the error term.

Y = α + βX + ε

Where:

Y = dependent variable
α = constant term
β = slope or regression coefficient
X = independent variable
ε = error term
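To see the role of ε, the sketch below simulates data from Y = α + βX + ε with hypothetical values α = 1 and β = 2 and normally distributed errors, then fits the line by least squares. Because of the error term the points scatter around the true line, yet the estimates land close to the true values. The parameter values, sample size, seed, and the helper name `ols_fit` are assumptions for illustration only.

```python
import random


def ols_fit(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical "true" parameters, for illustration only.
ALPHA, BETA = 1.0, 2.0

rng = random.Random(42)  # fixed seed so the run is reproducible
x = [rng.uniform(0, 10) for _ in range(1000)]
# Each observed Y = alpha + beta * X + epsilon, with epsilon ~ Normal(0, 1).
y = [ALPHA + BETA * xi + rng.gauss(0, 1) for xi in x]

a_hat, b_hat = ols_fit(x, y)
print(f"estimated alpha = {a_hat:.2f}, estimated beta = {b_hat:.2f}")
```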

Regression Equation

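This regression procedure, which minimizes the sum of squared errors, is known as ordinary least squares (OLS).

α (the constant term) is interpreted the same way as a: the predicted value of Y when X = 0.

β (the regression coefficient) tells how much Y is predicted to change when X increases by one unit.

The sign of the regression coefficient indicates the direction of the relationship between the two quantitative variables, and its size indicates how much Y changes, in the units of the variables involved.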

Regression Equation

The error (ε) reflects the fact that observed data do not follow a neat pattern that can be summarized with a straight line.

An observation's score on Y can be broken into two parts:

α + βX is due to the independent variable

ε is due to error

Observed value = Predicted value (α + βX) + error (ε)

Regression Equation

The error is the difference between the observed value of Y and the predicted value of Y (observed minus predicted).

In a fitted regression, this difference is known as the residual.
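The decomposition observed = predicted + error can be checked directly once a line has been fitted. The sketch below, on hypothetical toy data, computes each residual as observed minus predicted; with an OLS fit that includes a constant, the residuals sum to zero up to floating-point rounding. `ols_fit` is an illustrative helper name.

```python
def ols_fit(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = ols_fit(x, y)
predicted = [a + b * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

# Observed value = predicted value + residual, for every observation.
for yi, yhat, e in zip(y, predicted, residuals):
    assert abs(yi - (yhat + e)) < 1e-12

print([round(e, 2) for e in residuals])  # [-0.06, 0.07, -0.1, 0.23, -0.14]
print(abs(sum(residuals)) < 1e-9)        # True
```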

Regression Interpretation

For the data on the scatterplot:

Y (depvar) = telephone lines per 1,000 people
X (indvar) = infant mortality (deaths per 1,000 live births)

We can use regression analysis to examine the relationship between communication capacity (measured here as telephone lines per capita) and infant mortality.

Regression Interpretation

In this analysis, the intercept and regression coefficient are as follows:

α (or constant) = 121

Means that when X (infant mortality) is 0 deaths, the predicted number of phone lines is 121 per 1,000 population.

β = -1.25

Means that when X (infant mortality) increases by 1 death per 1,000 live births, there is a predicted or estimated decrease of 1.25 phone lines per 1,000 population.

Regression Interpretation

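These calculations are useful because they allow you to make predictions from the model. The predicted value is 121 - 1.25(1) = 119.75 lines when X = 1 and 121 - 1.25(10) = 108.5 lines when X = 10, so an increase from 1 to 10 deaths per 1,000 live births is associated with a decline of 119.75 - 108.5 = 11.25 telephone lines per 1,000 people.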
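The prediction arithmetic can be reproduced in a few lines. The constant and slope are the values reported on the slides; `predicted_phone_lines` is an illustrative helper name.

```python
ALPHA = 121.0  # constant reported in the slides
BETA = -1.25   # slope reported in the slides


def predicted_phone_lines(infant_mortality: float) -> float:
    """Predicted telephone lines per 1,000 people for a given infant mortality rate."""
    return ALPHA + BETA * infant_mortality


at_1 = predicted_phone_lines(1)
at_10 = predicted_phone_lines(10)
print(at_1, at_10, at_1 - at_10)  # 119.75 108.5 11.25
```

The drop of 11.25 is simply 9 one-unit steps of 1.25 each, which is why a linear model's predicted change depends only on how far X moves, not on where it starts.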

Interpreting the meaning of a coefficient can be tricky. What does a coefficient of -1.25 mean?

-- It indicates a negative relationship between infant mortality and phone lines.

-- It means that for every additional infant death per 1,000 live births, the predicted number of phone lines per 1,000 people decreases by 1.25.

This information is useful, but is there a measure that tells us how well the model predicts the observed values?

[Figure: Scatterplot]

R-squared

Yes, one such measure is R-squared (R²).

As stated earlier, each observation's value of Y can be split into a predicted part and an error. In the same way, the total deviation of Y from its mean, measured across all observations as the total sum of squares (Total SS), has two components:

-- The difference between the predicted value of Y and the mean of Y. Squared and summed, this is the explained part of the deviation, the Regression Sum of Squares.

-- The difference between the observed value of Y and its predicted value. Squared and summed, this is the Residual Sum of Squares, which measures prediction errors. This is the unexplained part of the deviation.

R-squared

Total SS = Regression SS + Residual SS

In other words, the total sum of squares is the sum of the regression sum of squares and the residual sum of squares.

R² = Regression SS / Total SS

The more of the variance in Y the regression model explains, the higher the R². For an OLS model with a constant term, R² ranges from 0 to 1.
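A minimal sketch of these sums of squares on hypothetical toy data (not the telephone-line data, which the slides do not list): it fits the least-squares line, checks that Total SS = Regression SS + Residual SS, and computes R² = Regression SS / Total SS. `ols_fit` is an illustrative helper name.

```python
def ols_fit(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = ols_fit(x, y)
y_bar = sum(y) / len(y)
predicted = [a + b * xi for xi in x]

total_ss = sum((yi - y_bar) ** 2 for yi in y)
regression_ss = sum((yhat - y_bar) ** 2 for yhat in predicted)
residual_ss = sum((yi - yhat) ** 2 for yi, yhat in zip(y, predicted))

# The decomposition holds up to floating-point rounding.
assert abs(total_ss - (regression_ss + residual_ss)) < 1e-9

r_squared = regression_ss / total_ss
print(f"R-squared = {r_squared:.3f}")  # R-squared = 0.998
```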
