Regression Correlation Background Defines relationship between two variables X and Y R ranges from -1 (perfect negative correlation) 0 (No correlation) +1 (perfect positive correlation) R=.689

Upload: bruce-newton

Post on 03-Jan-2016


TRANSCRIPT

Page 1: Regression Correlation Background Defines relationship between two variables X and Y R ranges from -1 (perfect negative correlation) 0 (No correlation)

Regression

Correlation Background

Defines the relationship between two variables, X and Y

R ranges from
-1 (perfect negative correlation)
0 (no correlation)
+1 (perfect positive correlation)

R = .689

Page 2:

Regression

Correlation Background

R² indicates the reduction in error when X is known and used to predict Y

R² ranges from
0 (no reduction in error)
1 (complete reduction in error)

R² = .474
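One way to read R² as a "reduction in error" is to compare the squared error of always guessing the mean of Y with the squared error left over after fitting a least squares line. The sketch below is illustrative only: it borrows the ten (X, Y) pairs from the "How good is our prediction?" table later in this transcript, which is not the data behind the R = .689 quoted on these slides, and all variable names are assumptions.

```python
# Ten (X, Y) pairs copied from the prediction table later in this transcript.
# They are NOT the data behind R = .689; they only illustrate the idea.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [5, 7, 8, 11, 8, 15, 17, 22, 18, 25]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of cross-products and squares around the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Pearson correlation and its square
r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2

# Error when the guess is always mean Y, versus error after using the fitted line
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_total = syy
ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
reduction_in_error = 1 - ss_residual / ss_total

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
print(f"proportional reduction in error = {reduction_in_error:.3f}")
```

For a least squares line the two printed quantities agree, which is why R² can be described as the proportion of prediction error removed by knowing X.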

Page 3:

Regression

Examples

Predicting G.P.A. from height
R² = 0 (knowing height does not help predict G.P.A.; the best guess is always the mean G.P.A.)

Predicting height in inches from height in centimeters
R² = 1 (knowing height in cm completely predicts height in inches)

Page 4:

Regression

Real-world examples are somewhere in between

Predicting weight from height
R² = .36 (knowing height somewhat helps predict weight)

Page 5:

Regression

But how do we figure out HOW to make that prediction given one of the variables?

Page 6:

Regression

Need background concept of slope

How much does Y change for a given change in X?

[Figure: the lines Y=X, Y=2X, and Y=X/2 plotted for X from 0 to 9 (Y axis from 0 to 20). All lines have R=1.]

Page 7:

Regression

[Figure: the lines Y=-X, Y=-2X, and Y=-X/2 plotted for X from 0 to 9 (Y axis from -20 to 20). All lines have R=-1.]
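The point of these two slides is that the steepness of a line does not change the correlation, only its sign does. A minimal sketch, assuming X runs from 0 to 9 as on the plot axes; `pearson_r` is a helper written for this illustration, not something from the slides.

```python
def pearson_r(xs, ys):
    """Pearson correlation computed from deviations around the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


xs = list(range(10))  # X values 0..9, as on the plot axes

# The six lines named on the two slope slides
lines = {
    "Y=X": lambda x: x,
    "Y=2X": lambda x: 2 * x,
    "Y=X/2": lambda x: x / 2,
    "Y=-X": lambda x: -x,
    "Y=-2X": lambda x: -2 * x,
    "Y=-X/2": lambda x: -x / 2,
}

results = {}
for label, line in lines.items():
    ys = [line(x) for x in xs]
    results[label] = pearson_r(xs, ys)
    print(f"{label:7s} r = {results[label]:+.3f}")
```

Every positive-slope line gives r = +1 and every negative-slope line gives r = -1, regardless of how steep it is.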

Page 8:

Regression

Need background concept of INTERCEPT

What is Y when X=0?

[Figure: the lines Y=2X, Y=2X+5, and Y=2X-3 plotted for X from 0 to 9 (Y axis from -5 to 25). All lines have the same slope but different intercepts.]

Page 9:

Regression

A unique line is defined by its slope and Y-intercept

Y=bX+a

b = slope
a = Y-intercept

[Figure: three lines plotted for X from 0 to 9 (Y axis from -7 to 20), each labeled Y=?x+?.]

Page 10:

Regression

Predicting depression from loneliness

Y = BDI depression score
X = loneliness

Y=2X+2

[Figure: plot for X from 0 to 9 (Y axis from -7 to 20); the extracted line labels read Y=?x+?.]

Page 11:

Regression

Predicted vs. actual: R=1, R²=1, no error

Never happens like this in the real world

[Figure: actual depression scores and predicted depression scores plotted for X from 0 to 9 (Y axis from -1 to 20).]

Page 12:

Actual scores don't fit on a line perfectly

[Figure: scatter plot of actual scores, X axis from 1 to 10, Y axis from 0 to 27.]

Page 13:

Some possible solutions?

Error is the sum of (Predicted Y - Actual Y)²

Y=2X+4 (Error=50)
Y=1.5X+6 (Error=85.25)

[Figure: the two candidate lines drawn over the actual scores, X axis from 0 to 9, Y axis from 0 to 27.]
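The error figures quoted for the two candidate lines can be reproduced directly. This sketch assumes the actual scores are the ten (X, Y) pairs listed in the "How good is our prediction?" table later in this transcript; the function name is chosen for illustration.

```python
# Actual scores, taken from the prediction table later in this transcript
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [5, 7, 8, 11, 8, 15, 17, 22, 18, 25]


def sum_squared_error(slope, intercept):
    """Sum of (predicted Y - actual Y)^2 for the line Y = slope*X + intercept."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


error_line_1 = sum_squared_error(2, 4)     # candidate Y = 2X + 4
error_line_2 = sum_squared_error(1.5, 6)   # candidate Y = 1.5X + 6

print(error_line_1)  # 50
print(error_line_2)  # 85.25
```

Of the two guesses, Y=2X+4 has the smaller error, but nothing so far says it is the best possible line.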

Page 14:

Where is the line with the smallest error?

Least Squares Regression Line

[Figure: scatter plot of actual scores, X axis from 1 to 10, Y axis from 0 to 27.]

Page 15:

Where is the line with the smallest error?

Least Squares Regression Line

Calculate the slope b:

$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})(X - \bar{X})}$$

b = 2.13 with this data

Page 16:

Where is the line with the smallest error?

Least Squares Regression Line

Calculate the Y-intercept a:

$$a = \bar{Y} - b\bar{X}$$

a = 4 with this data

So the least squares regression line is Y=2.13X+4
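The slope and intercept calculations on these two slides can be checked with a few lines of code. A minimal sketch, assuming "this data" means the ten (X, Y) pairs in the "How good is our prediction?" table that follows; variable names are illustrative.

```python
# Actual scores from the prediction table in this transcript
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [5, 7, 8, 11, 8, 15, 17, 22, 18, 25]

n = len(xs)
mean_x = sum(xs) / n   # 4.5
mean_y = sum(ys) / n   # 13.6

# Slope: sum of cross-products of deviations over sum of squared X deviations
numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
denominator = sum((x - mean_x) * (x - mean_x) for x in xs)
b = numerator / denominator

# Intercept: mean of Y minus slope times mean of X
a = mean_y - b * mean_x

print(f"b = {b:.3f}")  # about 2.133
print(f"a = {a:.3f}")  # about 4.000
```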

Page 17:

Where is the line with the smallest error?

Least Squares Regression Line

[Figure: the line Y=2.133X+4 drawn over the actual scores, X axis from 1 to 10, Y axis from 0 to 27.]

Page 18:

How good is our prediction?

Sum of (Predicted Y - Actual Y)²

X score   Actual Y score   Predicted Y score   Squared error
0         5                4.00                1.00
1         7                6.13                0.75
2         8                8.27                0.07
3         11               10.40               0.36
4         8                12.53               20.55
5         15               14.67               0.11
6         17               16.80               0.04
7         22               18.93               9.40
8         18               21.07               9.40
9         25               23.20               3.24

Mean X = 4.5, mean Y = 13.6, sum of squared errors = 44.93
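The table can be regenerated from the fitted line. This sketch assumes the unrounded slope 176/82.5 (about 2.133) and intercept 4 from the least squares slides; the names `rows` and `total_squared_error` are illustrative.

```python
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [5, 7, 8, 11, 8, 15, 17, 22, 18, 25]

slope = 176 / 82.5   # unrounded least squares slope, about 2.133
intercept = 4.0

rows = []
for x, y in zip(xs, ys):
    predicted = slope * x + intercept
    squared_error = (predicted - y) ** 2
    rows.append((x, y, predicted, squared_error))

print("X  Actual  Predicted  SqError")
for x, y, predicted, squared_error in rows:
    print(f"{x:<2d} {y:<7d} {predicted:<10.2f} {squared_error:.2f}")

total_squared_error = sum(row[3] for row in rows)
print(f"Sum of squared errors = {total_squared_error:.2f}")
```

The total is smaller than the error of either hand-picked candidate line (50 and 85.25), as expected for the least squares fit.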

Page 19:

Can we standardize this for an average error?

Yes: the standard error of the estimate

Like a standard deviation

Gives the average prediction error per score

Standard error of the estimate = SQRT(SSresidual / (Npairs - 2))

In this example = SQRT(44.9 / (10 - 2)) = SQRT(44.9 / 8) = 2.37
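A short sketch of the standard error calculation, assuming the ten (X, Y) pairs and the least squares line from the preceding slides. Note that the division must use the whole quantity (Npairs - 2) as the denominator; written without parentheses, most languages would evaluate 44.9/10-2 as (44.9/10) - 2.

```python
import math

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [5, 7, 8, 11, 8, 15, 17, 22, 18, 25]

slope = 176 / 82.5   # least squares slope for this data, about 2.133
intercept = 4.0

# Residual sum of squares around the fitted line
ss_residual = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
n_pairs = len(xs)

# Two degrees of freedom are used up by the slope and the intercept
standard_error = math.sqrt(ss_residual / (n_pairs - 2))

print(f"SS residual = {ss_residual:.2f}")
print(f"Standard error of the estimate = {standard_error:.2f}")
```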

Page 20:

Chi-square (χ²)

Nonparametric statistical tests

Used for
nominal data (categories)
ordinal data (ordered categories)
non-normal interval/ratio data

Page 21:

Goodness of fit χ²

Used with nominal data
Tests a DISTRIBUTION (not a mean)
Sees if observed data FITS an expected distribution

H0 = the true frequency distribution is the expected one

H1 = the true frequency distribution has some other form

Page 22:

VEGAS BABY!!!

Rolling dice at the Mirage
Lots of snake eyes coming up
Are the dice fixed?
Test with goodness of fit
Does our distribution FIT the expected distribution?

Page 23:

VEGAS BABY!!!

Expected distribution for 120 rolls if fair:

Each face of a die has a 1/6 chance

1/6 × 120 = 20 of each face

Expected distribution = [20,20,20,20,20,20]

Page 24:

VEGAS BABY!!!

Actual distribution for 120 rolls is:

[28,16,23,23,17,13]

Are these dice fair?

Use goodness of fit χ²

Page 25:

VEGAS BABY!!!

Determine the critical χ² value:

df = number of categories - 1 = 6 - 1 = 5

Critical χ² for df=5 is 11.07 from the table (α = .05)

Page 30:

Cat   Oi    Ei    (Oi-Ei)   (Oi-Ei)²   (Oi-Ei)²/Ei
1     28    20    8         64         3.2
2     16    20    -4        16         0.8
3     23    20    3         9          0.45
4     23    20    3         9          0.45
5     17    20    -3        9          0.45
6     13    20    -7        49         2.45
Σ     120   120   0                    7.8

Observed χ² = 7.8 is below the critical value of 11.07, so we cannot reject H0: FAIR!!!
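The goodness-of-fit calculation in this table can be sketched as follows. The observed counts and the critical value 11.07 come from the slides; the variable names are illustrative, and the critical value is hard-coded from the table rather than computed.

```python
observed = [28, 16, 23, 23, 17, 13]   # counts for faces 1..6 over 120 rolls

total_rolls = sum(observed)
# A fair die gives each face the same expected count
expected = [total_rolls / len(observed)] * len(observed)

# Chi-square: sum over categories of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

degrees_of_freedom = len(observed) - 1
critical_value = 11.07   # from the slide's table for df = 5

reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
print("reject H0 (dice look unfair)" if reject_h0 else "cannot reject H0 (consistent with fair dice)")
```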

Page 31:

Cat   Oi    Ei    (Oi-Ei)   (Oi-Ei)²   (Oi-Ei)²/Ei
1     56    40    16        256        6.4
2     32    40    -8        64         1.6
3     46    40    6         36         0.9
4     46    40    6         36         0.9
5     34    40    -6        36         0.9
6     26    40    -14       196        4.9
Σ     240   240   0                    15.6

With 240 rolls in the same proportions, observed χ² = 15.6 exceeds the critical value of 11.07, so we reject H0: CHEAT!!!
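Comparing the 120-roll and 240-roll tables shows how sample size drives the statistic: the proportions are identical, but doubling every count doubles χ². A minimal sketch with an illustrative helper function, using the counts and critical value from the slides:

```python
def chi_square_uniform(observed):
    """Goodness-of-fit chi-square against equal expected counts in every category."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


rolls_120 = [28, 16, 23, 23, 17, 13]
rolls_240 = [2 * count for count in rolls_120]   # same proportions, twice the rolls

chi_120 = chi_square_uniform(rolls_120)
chi_240 = chi_square_uniform(rolls_240)
critical_value = 11.07   # df = 5, from the slide's table

print(f"120 rolls: chi-square = {chi_120:.1f}, reject H0: {chi_120 > critical_value}")
print(f"240 rolls: chi-square = {chi_240:.1f}, reject H0: {chi_240 > critical_value}")
```

The same departure from a uniform distribution is not significant with 120 rolls but is significant with 240.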

Page 32:

Test of independence χ²

Used with nominal data
Tests whether DISTRIBUTION 1 is dependent upon DISTRIBUTION 2

H0 = Distribution 1 is independent of Distribution 2

H1 = Distribution 1 is related to Distribution 2

Page 33:

Example: Are men more likely to have supported the war in Iraq?

100 subjects (50 male, 50 female)
Asked a yes-or-no question about supporting the war in Iraq

H0 = Gender does not affect likelihood of supporting the war

H1 = Gender does affect likelihood of supporting the war

Page 34:

Determine critical value

df = (R-1) × (C-1)

df = (number of categories in variable 1 - 1) × (number of categories in variable 2 - 1)

= (2-1) × (2-1) = 1 × 1 = 1

Critical value from A-3 is 3.84

Page 35:

Set up data

                  Males   Females   Total
Support war       32      21        53
Not support war   18      29        47
Total             50      50        100

Page 36:

Set up data (expected counts in parentheses)

Expected count = row total × column total / grand total

                  Males       Females     Total
Support war       32 (26.5)   21 (26.5)   53
Not support war   18 (23.5)   29 (23.5)   47
Total             50          50          100
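The expected counts shown in parentheses come from the table margins: each is the row total times the column total, divided by the grand total. A minimal sketch using the observed counts from the slide; the variable names are illustrative.

```python
# Rows: support war, not support war. Columns: males, females.
observed = [
    [32, 21],
    [18, 29],
]

row_totals = [sum(row) for row in observed]             # [53, 47]
column_totals = [sum(col) for col in zip(*observed)]    # [50, 50]
grand_total = sum(row_totals)                           # 100

# Expected count under independence for every cell
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

for row in expected:
    print(row)
```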

Page 37:

Calculate observed χ²

Category   Oi    Ei     (Oi-Ei)   (Oi-Ei)²   (Oi-Ei)²/Ei
M/S        32    26.5   5.5       30.25      1.14
M/N        18    23.5   -5.5      30.25      1.29
F/S        21    26.5   -5.5      30.25      1.14
F/N        29    23.5   5.5       30.25      1.29
Σ          100   100    0                    4.86
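The observed χ² for the test of independence can be reproduced from the four cells. This is a sketch using the plain (uncorrected) formula that the slides use, with the counts, expected values, and critical value taken from the slides; variable names are illustrative.

```python
# Observed and expected counts in the order M/S, M/N, F/S, F/N
observed = [32, 18, 21, 29]
expected = [26.5, 23.5, 26.5, 23.5]

# One (O - E)^2 / E contribution per cell
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# df = (rows - 1) * (columns - 1) for a 2 x 2 table
degrees_of_freedom = (2 - 1) * (2 - 1)
critical_value = 3.84   # from the slides, df = 1

reject_h0 = chi_square > critical_value

print([round(c, 2) for c in contributions])
print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, reject H0: {reject_h0}")
```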

Page 38:

Test observed against critical

Observed χ² = 4.86, critical χ² = 3.84

So we reject the idea that gender does not affect support of the war and conclude that gender DOES affect support of the war

Page 39:

McNemar test for significance of change

Used with nominal data
Tests whether DISTRIBUTION 1 is dependent upon DISTRIBUTION 2

Same as the test of independence, but uses the SAME people to test nominal data before and after some event

Page 40:

Example: Did support for the Pledge of Allegiance change after the terrorist attacks?

100 subjects
Asked "Do you favor the Pledge of Allegiance?" before and after the terrorist attacks

H0 = the proportion of individuals supporting the pledge before the attacks is the same as after the attacks

H1 = the proportion of individuals supporting the pledge before the attacks is different from the proportion after the attacks

Page 41:

Determine critical value

df = 1 for all McNemar tests
Critical value is 3.84

Page 42:

Set up data

                      Before attacks
                      Yes   No    Total
After attacks   Yes   33    20    53
                No    9     38    47
Total                 42    58    100

Page 43:

Set up data (expected counts in parentheses)

Only the people whose answer changed are used. Expected count for each of those cells = (20 + 9) / 2 = 14.5

                      Before attacks
                      Yes        No          Total
After attacks   Yes   33         20 (14.5)   53
                No    9 (14.5)   38          47
Total                 42         58          100

Page 44:

Calculate observed χ²

Category   Oi   Ei     (Oi-Ei)   (Oi-Ei)²   (Oi-Ei)²/Ei
1          9    14.5   -5.5      30.25      2.09
2          20   14.5   5.5       30.25      2.09
Σ          29   29     0                    4.17
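A minimal sketch of the McNemar calculation, using only the two cells where the answer changed (20 people went from No to Yes, 9 from Yes to No). The counts and critical value are from the slides; the variable names are illustrative, and the shortcut formula (b - c)² / (b + c) is the standard uncorrected form of the same statistic.

```python
changed_no_to_yes = 20   # "No" before the attacks, "Yes" after
changed_yes_to_no = 9    # "Yes" before the attacks, "No" after

# Under H0 the changers split evenly between the two directions
total_changers = changed_no_to_yes + changed_yes_to_no
expected = total_changers / 2   # 14.5

# Table method: sum of (O - E)^2 / E over the two cells
chi_square_table = sum(
    (observed - expected) ** 2 / expected
    for observed in (changed_yes_to_no, changed_no_to_yes)
)

# Equivalent shortcut formula
chi_square_shortcut = (changed_no_to_yes - changed_yes_to_no) ** 2 / total_changers

critical_value = 3.84   # df = 1, from the slides
reject_h0 = chi_square_table > critical_value

print(f"table method:  {chi_square_table:.2f}")
print(f"shortcut form: {chi_square_shortcut:.2f}")
print(f"reject H0: {reject_h0}")
```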

Page 45:

Test observed against critical

Observed χ² = 4.17, critical χ² = 3.84

So we reject the idea that the proportions are the same

Conclusion: the attacks did change the proportion who support the Pledge of Allegiance