10/03/2003 Gerard.Golding@ul.ie Correlation Scatter Plots Correlation Coefficients Significance Test


Upload: sabrina-mcbride

Post on 25-Dec-2015


Page 1:

Correlation

Scatter Plots

Correlation Coefficients

Significance Test

Page 2:

Introduction

We are often asked to describe the relationship between two or more variables.

Is there a relationship between points in the Leaving Cert and QCA?

Is there a relationship between parents' IQ and children's IQ?

Page 3:

What are Scatter Plots?

Two dimensional plot showing the (X,Y) value for each observation

Used to determine whether there is any pronounced relationship and, if so, whether the relationship may be treated as approximately linear

Y is usually the response (dependent) variable

X is usually the explanatory (independent) variable

The response variable is the variable whose variation we wish to explain

An explanatory variable is a variable used to explain variation in the response variable

Page 4:

Positive Linear Relationship

Page 5:

Negative Linear Relationship

Page 6:

No Linear Relationship

Page 7:

No Relationship

Page 8:

Example 1

Two sets of exam results for 11 students: Maths and Physics

Are they related?

Does a good performance in Maths go with a good performance in Physics?

Let the Maths mark be X

Let the Physics mark be Y

Page 9:

Table of Results

X 41 37 38 39 49 47 42 34 36 48 29

Y 36 20 31 24 37 35 42 26 27 29 23

• X total is 440

• X mean is 40

• Y total is 330

• Y mean is 30

Page 10:

Maths Vs Physics

Page 11:

What does the Graph tell us?

The means divide the graph into four quadrants

Most of the data lies in the bottom left or top right quadrants

Only two points fall outside these quadrants

This indicates a probable relationship between X and Y for a particular student
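The quadrant reading of the graph can be checked numerically. The following is a minimal sketch, assuming the marks from the Table of Results; the function name `quadrant_counts` is hypothetical and not part of the original slides.

```python
# Example 1 marks as given in the Table of Results.
maths = [41, 37, 38, 39, 49, 47, 42, 34, 36, 48, 29]    # X
physics = [36, 20, 31, 24, 37, 35, 42, 26, 27, 29, 23]  # Y

x_mean = sum(maths) / len(maths)      # 440 / 11 = 40
y_mean = sum(physics) / len(physics)  # 330 / 11 = 30


def quadrant_counts(xs, ys, x_bar, y_bar):
    """Count points that agree or disagree with a positive trend.

    A point agrees when it lies above both means or below both means
    (top right or bottom left quadrant), i.e. when
    (x - x_bar) * (y - y_bar) is positive. Points lying exactly on a
    mean line are counted separately.
    """
    agree = disagree = on_line = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            agree += 1
        elif product < 0:
            disagree += 1
        else:
            on_line += 1
    return agree, disagree, on_line


agree, disagree, on_line = quadrant_counts(maths, physics, x_mean, y_mean)
print(agree, disagree, on_line)  # 9 2 0
```

Nine of the eleven students fall in the bottom left or top right quadrants, matching the two exceptions noted on the slide.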

Page 12:

Correlation Coefficient

From a diagram we get a general idea of the relationship

For precision we need a numerical measure

We need to measure the strength of the relationship

The most common measure is the Pearson Product Moment Correlation Coefficient

Usually known as the Correlation Coefficient

We will usually be dealing with samples drawn from a population

The sample correlation coefficient is called r

Page 13:

Properties of r

r can take values from -1 to +1

r = +1 or r = -1 represents a perfect linear correlation, or a perfect linear relationship between the variables

r = 0 indicates little or no linear relationship, i.e. as X increases there is no definite tendency for the values of Y to increase or decrease in a straight line

r close to +1 indicates a large positive correlation, i.e. Y tends to increase as X increases

r close to -1 indicates a large negative correlation, i.e. Y tends to decrease as X increases

The further r is from 0, the stronger the relationship

The sign of r indicates the direction of the relationship

Page 14:

Examples of various r values

r = +1 r = -1 r = -0.54

r = 0.70 r = 0 r = 0

Page 15:

The formula for Calculating r

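$$
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}
  = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
         {\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]
                \left[\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2\right]}}
$$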
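The formula for r translates directly into code. The following is a minimal sketch, assuming the definitions r = Sxy / sqrt(Sxx Syy) with Sxx, Syy and Sxy computed from the raw column totals; the function names `corrected_sums` and `pearson_r` and the three-point data sets are hypothetical, chosen only for illustration.

```python
from math import sqrt


def corrected_sums(xs, ys):
    """Return (Sxx, Syy, Sxy) computed from the raw column totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy).

    Raises ZeroDivisionError if either variable is constant,
    because Sxx or Syy is then zero and r is undefined.
    """
    sxx, syy, sxy = corrected_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


# Illustrative data only: a perfectly increasing and a perfectly
# decreasing straight-line relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0
```

Working from the column totals mirrors the table-based hand calculation; for real data, a library routine is usually the safer choice.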

Page 16:

Example 2

Find the correlation coefficient r between Y and X

Subject A B C D E F G

X 1 3 5 7 9 11 13

Y 7 4 13 16 10 22 19

Page 17:

Create a table

Subject Xi Yi XiYi Xi squared Yi squared

A 1 7 7 1 49

B 3 4 12 9 16

C 5 13 65 25 169

D 7 16 112 49 256

E 9 10 90 81 100

F 11 22 242 121 484

G 13 19 247 169 361

Total 49 91 775 455 1435

Page 18:

Calculating Sxx

$$
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = 455 - \frac{49^2}{7} = 112
$$

Page 19:

Calculating Syy

$$
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 = 1435 - \frac{91^2}{7} = 252
$$

Page 20:

Calculating Sxy

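$$
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right) = 775 - \frac{49 \times 91}{7} = 138
$$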

Page 21:

Calculating r

$$
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{138}{\sqrt{112 \times 252}} = 0.82
$$
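The arithmetic for Example 2 can be reproduced from the column totals in the table (n = 7, totals 49, 91, 775, 455 and 1435). This is a minimal sketch of that hand calculation; the variable names are assumptions made for illustration.

```python
from math import sqrt

# Column totals from the Example 2 table.
n = 7
sum_x, sum_y = 49, 91
sum_xy, sum_x2, sum_y2 = 775, 455, 1435

sxx = sum_x2 - sum_x ** 2 / n     # 455 - 343 = 112
syy = sum_y2 - sum_y ** 2 / n     # 1435 - 1183 = 252
sxy = sum_xy - sum_x * sum_y / n  # 775 - 637 = 138
r = sxy / sqrt(sxx * syy)         # 138 / 168

print(sxx, syy, sxy, round(r, 2))  # 112.0 252.0 138.0 0.82
```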

Page 22:

Significance Test

Ho: No linear relationship exists (correlation equal to 0)

HA: There is a linear relationship (correlation not equal to 0)

Choose a confidence level, say 90%, 95%, 99% etc.

This means alpha = 0.1, 0.05, 0.01 etc.

Use Table 10: Percentage Points of the Correlation Coefficient

In the left hand column choose v = n-2 (n = sample size)

Find the critical value

If |r| > critical value then reject Ho
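The decision rule can be sketched in code. The slides read the critical value of r from a printed table; as an alternative shown here only for illustration, the same value can be recovered from a two-tailed Student t critical value through r_crit = t / sqrt(t^2 + v), with v = n - 2. The function names are hypothetical, and the t value 2.5706 (alpha = 0.05, v = 5) is an input assumed from a standard t table, not something computed here.

```python
from math import sqrt


def critical_r(t_crit, df):
    """Convert a two-tailed t critical value into the matching
    critical value for r, where df = n - 2."""
    return t_crit / sqrt(t_crit ** 2 + df)


def reject_h0(r, r_crit):
    """Two-sided rule: reject Ho when |r| exceeds the critical value."""
    return abs(r) > r_crit


# Assumed table value: t for alpha = 0.05 (two-tailed), 5 degrees of freedom.
T_CRIT_DF5 = 2.5706

r_crit = critical_r(T_CRIT_DF5, df=5)
print(round(r_crit, 4))          # 0.7545
print(reject_h0(0.82, r_crit))   # True
print(reject_h0(-0.82, r_crit))  # True
```

Comparing the absolute value of r with the critical value means that a strong negative correlation is also detected.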

Page 23:

Conclusion

r = 0.82

Let alpha = 0.05

v = n-2, giving v = 5

From tables the critical point is 0.7545

0.82 > 0.7545

We reject Ho and conclude:

We are 95% confident that there is a linear relationship between X and Y

Page 24:

Example 3

Is there an obvious relationship between X and Y?

Y = X+2

This is a perfect relationship

What will r be?

r will be equal to 1

X 3 4 5 6 7 8

Y 5 6 7 8 9 10

Page 25:

Set up the data table

Subject Y X XY X squared Y squared

A 5 3 15 9 25

B 6 4 24 16 36

C 7 5 35 25 49

D 8 6 48 36 64

E 9 7 63 49 81

F 10 8 80 64 100

Total 45 33 265 199 355

Page 26:

Calculate Sxx

$$
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = 199 - \frac{33^2}{6} = 17.5
$$

Page 27:

Calculate Syy

$$
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 = 355 - \frac{45^2}{6} = 17.5
$$

Page 28:

Calculate Sxy

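$$
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right) = 265 - \frac{33 \times 45}{6} = 17.5
$$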

Page 29:

Calculate r

$$
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{17.5}{\sqrt{17.5 \times 17.5}} = \frac{17.5}{17.5} = 1
$$

Perfect Positive Linear Relationship
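Why a relationship of the form Y = X + c always gives r = 1 can be sketched as follows. This derivation is an addition rather than part of the original slides. It uses the deviation form S_xx = Σ(x_i − x̄)², which is algebraically the same as the computational formula, and c stands for any constant (c = 2 in this example).

```latex
\begin{align*}
y_i = x_i + c \;&\Rightarrow\; \bar{y} = \bar{x} + c
  \;\Rightarrow\; y_i - \bar{y} = x_i - \bar{x} \\
S_{yy} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 = S_{xx} \\
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = S_{xx} \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{S_{xx}}{\sqrt{S_{xx}\,S_{xx}}} = 1
\end{align*}
```

This agrees with the three values of 17.5 found for Example 3.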

Page 30:

Back to Example 1

In our original example with the student results we drew a scatter plot

From the diagram it looked as if there was a probable positive linear relationship

To be sure we need to calculate r

Using a significance level of alpha = 0.05 we will test the claim that there is no linear correlation between Maths results and Physics results

Page 31:

Create a data table

Student X Y XY X squared Y squared

A 41 36 1476 1681 1296

B 37 20 740 1369 400

C 38 31 1178 1444 961

D 39 24 936 1521 576

E 49 37 1813 2401 1369

F 47 35 1645 2209 1225

G 42 42 1764 1764 1764

H 34 26 884 1156 676

I 36 27 972 1296 729

J 48 29 1392 2304 841

K 29 23 667 841 529

Total 440 330 13467 17986 10366

Page 32:

Apply the formulae

$$
S_{xx} = 17986 - \frac{440^2}{11} = 386
$$

$$
S_{yy} = 10366 - \frac{330^2}{11} = 466
$$

$$
S_{xy} = 13467 - \frac{440 \times 330}{11} = 267
$$

Page 33:

Correlation Coefficient is

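$$
r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{267}{\sqrt{386 \times 466}} = \frac{267}{424.12} = 0.63
$$

$$
\alpha = 0.05, \qquad v = n - 2 = 9
$$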

Page 34:

Conclusion

From the tables the critical point is 0.6021

r = 0.63

0.63 > 0.6021

We reject the claim and conclude that there is a positive linear relationship between results in Maths and results in Physics
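The whole of Example 1, from the raw marks to the decision, can be checked in a few lines. This is a minimal sketch, assuming the marks from the Table of Results and taking the critical value 0.6021 (v = 9, alpha = 0.05) as quoted from the tables rather than computing it; the variable names are illustrative.

```python
from math import sqrt

maths = [41, 37, 38, 39, 49, 47, 42, 34, 36, 48, 29]    # X
physics = [36, 20, 31, 24, 37, 35, 42, 26, 27, 29, 23]  # Y
n = len(maths)

sxx = sum(x * x for x in maths) - sum(maths) ** 2 / n        # 386
syy = sum(y * y for y in physics) - sum(physics) ** 2 / n    # 466
sxy = (sum(x * y for x, y in zip(maths, physics))
       - sum(maths) * sum(physics) / n)                      # 267
r = sxy / sqrt(sxx * syy)

# Critical value quoted from the tables for v = n - 2 = 9, alpha = 0.05.
R_CRIT = 0.6021

print(round(r, 2), abs(r) > R_CRIT)  # 0.63 True
```

The margin is narrow (r is about 0.63 against 0.6021), so the test is significant at the 5% level but would not be at a much stricter level.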

Page 35:

Regression

Least Squares

Predicting Y using X

Page 36:

What is Regression?

Regression Analysis is used for prediction

It allows us to predict the value of one variable given the value of another variable

It gives us an equation that uses one variable to help explain variation in another

In this course we deal with Simple Linear Regression

Page 37:

Simple Linear Regression

First step in determining a relationship was drawing a scatter plot

If a possible relationship was shown we found the strength of the relationship by calculating the correlation coefficient r

The next stage is to calculate an equation which best describes the relationship between the two variables

This line is called the Regression Line

Page 38:

What is the ‘best fit’ line?

Example 1

Page 39:

‘Least Squares’ best fit line

We can have several lines of the form

$$
y_i = a + b x_i
$$

We want the ‘best’ line: the one with the least sum of squared residuals

$$
\hat{y}_i = \hat{a} + \hat{b} x_i
$$
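What "least residuals" means can be illustrated by scoring a few candidate lines on the Example 2 data. This is a minimal sketch: the function name and the two alternative lines are hypothetical, chosen only for comparison, and the least squares line uses the slope Sxy / Sxx and intercept y-bar minus slope times x-bar that the slides go on to derive.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum of (y_i - (a + b * x_i))**2 over all observations."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Example 2 data from the slides.
xs = [1, 3, 5, 7, 9, 11, 13]
ys = [7, 4, 13, 16, 10, 22, 19]

# Two hypothetical candidate lines, for illustration only.
flat = sum_squared_residuals(xs, ys, a=13.0, b=0.0)   # horizontal line at the mean of Y
guess = sum_squared_residuals(xs, ys, a=3.0, b=1.5)   # an eyeballed line

# Least squares line: slope Sxy / Sxx, intercept y_bar - slope * x_bar.
b_hat = 138 / 112
a_hat = 91 / 7 - b_hat * 49 / 7
best = sum_squared_residuals(xs, ys, a_hat, b_hat)

print(best < guess < flat)  # True
```

The horizontal line scores 252 (which equals Syy), the eyeballed line 91.75, and the least squares line about 81.96; no other straight line gives a smaller total.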

Page 40:

Least Squares estimates

$\hat{a}, \hat{b}$ are the least squares estimates of $a, b$

$$
\hat{b} = \frac{S_{xy}}{S_{xx}}
$$

Closely related to r

$$
\hat{a} = \bar{y} - \hat{b}\,\bar{x}
$$
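The two estimates can be computed together. This is a minimal sketch, assuming slope = Sxy / Sxx and intercept = y-bar minus slope times x-bar as on this slide; the function name `least_squares_line` is hypothetical. The data used to exercise it is Example 3 from the slides, where Y = X + 2 exactly.

```python
def least_squares_line(xs, ys):
    """Return (a_hat, b_hat) for the fitted line y_hat = a_hat + b_hat * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b_hat = sxy / sxx               # slope
    a_hat = y_bar - b_hat * x_bar   # intercept
    return a_hat, b_hat


# Example 3 data from the slides.
a_hat, b_hat = least_squares_line([3, 4, 5, 6, 7, 8], [5, 6, 7, 8, 9, 10])
print(a_hat, b_hat)  # 2.0 1.0
```

The slope is "closely related to r" in the sense that it equals r multiplied by sqrt(Syy / Sxx), so the slope and r always share the sign of Sxy.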

Page 41:

Example 2

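$$
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right) = 775 - \frac{49 \times 91}{7} = 138
$$

$$
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = 455 - \frac{49^2}{7} = 112
$$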

Page 42:

Combining we get

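$$
\hat{b} = \frac{S_{xy}}{S_{xx}} = \frac{138}{112} = 1.23
$$

$$
\hat{a} = \bar{y} - \hat{b}\,\bar{x} = \frac{91}{7} - 1.23 \times \frac{49}{7} = 4.39
$$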

Page 43:

Regression line is

$$
\hat{y}_i = 4.39 + 1.23\,x_i
$$

Page 44:

Example 3

X 3 4 5 6 7 8

Y 5 6 7 8 9 10

We know Y=X+2

Page 45:

Verifying the equation is correct

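$$
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right) = 265 - \frac{33 \times 45}{6} = 17.5
$$

$$
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = 199 - \frac{33^2}{6} = 17.5
$$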

Page 46:

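Giving

$$
\hat{b} = \frac{S_{xy}}{S_{xx}} = \frac{17.5}{17.5} = 1
$$

$$
\hat{a} = \bar{y} - \hat{b}\,\bar{x} = \frac{45}{6} - 1 \times \frac{33}{6} = 2
$$

$$
\hat{y}_i = 2 + x_i
$$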

Page 47:

Example 1

$$
S_{xy} = 13467 - \frac{440 \times 330}{11} = 267
$$

$$
S_{xx} = 17986 - \frac{440^2}{11} = 386
$$

Page 48:

Regression line

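$$
\hat{b} = \frac{S_{xy}}{S_{xx}} = \frac{267}{386} = 0.692
$$

$$
\hat{a} = \bar{y} - \hat{b}\,\bar{x} = \frac{330}{11} - 0.692 \times \frac{440}{11} = 2.32
$$

$$
\hat{y}_i = 2.32 + 0.692\,x_i
$$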

Page 49:

Example 1 continued

If a student received a grade of 53 in Maths, what would the expected grade be in Physics?

We use the Regression line in order to predict the Physics result

$$
\hat{y}_i = 2.32 + 0.692\,x_i
$$

$$
\hat{y} = 2.32 + 0.692 \times 53 = 38.996
$$
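The prediction step is a single substitution into the regression line. This is a minimal sketch using the rounded Example 1 estimates from the slides (intercept 2.32, slope 0.692); the function name `predict` is hypothetical.

```python
def predict(x, a_hat, b_hat):
    """Predicted response on the fitted line y_hat = a_hat + b_hat * x."""
    return a_hat + b_hat * x


# Rounded Example 1 estimates from the slides.
A_HAT, B_HAT = 2.32, 0.692
print(round(predict(53, A_HAT, B_HAT), 3))  # 38.996

# The same prediction with unrounded estimates (slope 267/386).
b_exact = 267 / 386
a_exact = 330 / 11 - b_exact * 440 / 11
print(round(predict(53, a_exact, b_exact), 2))  # 38.99
```

A Maths mark of 53 lies above the highest observed mark of 49, so this prediction extends the line slightly beyond the data it was fitted to and should be read with that in mind.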

Page 50:

Graphing The Regression Line