10/03/2003 Gerard.Golding@ul.ie

Correlation

Scatter Plots, Correlation Coefficients, Significance Test

Introduction

We are often asked to describe the relationship between two or more variables

Is there a relationship between points in the Leaving Cert and QCA?

Is there a relationship between parents' IQ and children's IQ?

What are Scatter Plots

Two-dimensional plot showing the (X, Y) values for each observation.

Used to determine whether there is any pronounced relationship and if so whether the relationship may be treated as approximately linear.

Y is usually the response (dependent) variable.

X is usually the explanatory (independent) variable.

The response variable is the variable whose variation we wish to explain.

An explanatory variable is a variable used to explain variation in the response variable.

Positive Linear Relationship

Negative Linear Relationship

No Linear Relationship

No Relationship

Example 1

Two sets of exam results for 11 students: Maths and Physics. Are they related? Does a good performance in Maths go with a good performance in Physics?

Let the Maths mark be X. Let the Physics mark be Y.

Table of Results

Student      A  B  C  D  E  F  G  H  I  J  K

X (Maths)    41 37 38 39 49 47 42 34 36 48 29

Y (Physics)  36 20 31 24 37 35 42 26 27 29 23

• X total is 440

• X mean is 40

• Y total is 330

• Y mean is 30

Maths Vs Physics

What does the graph tell us?

The means divide the graph into four quadrants.

Most of the data lies in the bottom-left or top-right quadrants.

Only two points fall outside these quadrants. This indicates a probable relationship between a student's Maths mark X and Physics mark Y.
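The quadrant check can be reproduced with a short Python sketch for the Example 1 marks. The function name `quadrant_counts` and the quadrant labels are our own illustrative choices; a point lying exactly on a mean line is counted separately rather than assigned to a quadrant (no Example 1 point does).

```python
from collections import Counter

# Example 1 marks for students A..K (X = Maths, Y = Physics)
maths = [41, 37, 38, 39, 49, 47, 42, 34, 36, 48, 29]
physics = [36, 20, 31, 24, 37, 35, 42, 26, 27, 29, 23]


def quadrant_counts(xs, ys):
    """Count how many (x, y) points fall in each quadrant formed by the two means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    counts = Counter()
    for x, y in zip(xs, ys):
        if x == x_bar or y == y_bar:
            counts["on a mean line"] += 1
        elif x > x_bar and y > y_bar:
            counts["top right"] += 1
        elif x < x_bar and y < y_bar:
            counts["bottom left"] += 1
        elif x < x_bar:  # here y > y_bar
            counts["top left"] += 1
        else:  # here x > x_bar and y < y_bar
            counts["bottom right"] += 1
    return counts


counts = quadrant_counts(maths, physics)
print(counts)
```

For these marks it reports 4 points top right, 5 bottom left, and one each top left (student C: 38, 31) and bottom right (student J: 48, 29), matching the two points outside the main quadrants.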

Correlation Coefficient

From a diagram we get a general idea of the relationship. For precision we need a numerical measure of the strength of the relationship.

The most common measure is the Pearson product-moment correlation coefficient, usually known simply as the correlation coefficient.

We will usually be dealing with samples from a population. The sample correlation coefficient is called r.

Properties of r

r can take values from -1 to +1.

r = +1 or r = -1 represents a perfect linear correlation, i.e. a perfect linear relationship between the variables.

r = 0 (or r close to 0) indicates little or no linear relationship, i.e. as X increases there is no definite tendency for the values of Y to increase or decrease in a straight line.

r close to +1 indicates a strong positive correlation, i.e. Y tends to increase as X increases.

r close to -1 indicates a strong negative correlation, i.e. Y tends to decrease as X increases.

The further r is from 0, the stronger the linear relationship. The sign of r indicates the direction of the relationship.

Examples of various r values

[Figure: six example scatter plots, labelled r = +1, r = -1, r = -0.54, r = 0.70, r = 0 and r = 0]

The formula for Calculating r

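$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$

where

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$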
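The formula translates directly into code. This is a minimal Python sketch; the name `pearson_r` is our own. It uses the same "sum of products minus correction term" form as the slides, which is convenient by hand but can lose precision when values are large and their spread is small, so library routines may use a different, deviation-based algorithm.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy),
    using the computational formulas for Sxx, Syy and Sxy."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)


# Example 2 data
x = [1, 3, 5, 7, 9, 11, 13]
y = [7, 4, 13, 16, 10, 22, 19]
r = pearson_r(x, y)
print(round(r, 2))
```

On the Example 2 data this gives r = 138/168, about 0.82, matching the hand calculation for that example.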

Example 2

Find the correlation coefficient r between Y and X

Subject A B C D E F G

X 1 3 5 7 9 11 13

Y 7 4 13 16 10 22 19

Create a table

Subject Xi Yi XiYi Xi squared Yi squared

A 1 7 7 1 49

B 3 4 12 9 16

C 5 13 65 25 169

D 7 16 112 49 256

E 9 10 90 81 100

F 11 22 242 121 484

G 13 19 247 169 361

Total 49 91 775 455 1435
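Building this table is mechanical, so it is easy to check by computer. A minimal Python sketch (the variable names are illustrative) that forms the XiYi, Xi squared and Yi squared columns and their totals for the Example 2 data:

```python
# Example 2 data: subjects A..G
subjects = "ABCDEFG"
x = [1, 3, 5, 7, 9, 11, 13]
y = [7, 4, 13, 16, 10, 22, 19]

# One row per subject: (subject, Xi, Yi, XiYi, Xi^2, Yi^2)
rows = [(s, xi, yi, xi * yi, xi ** 2, yi ** 2) for s, xi, yi in zip(subjects, x, y)]

# Column totals of the numeric columns (everything after the subject label)
totals = tuple(sum(col) for col in zip(*(row[1:] for row in rows)))

for row in rows:
    print(*row, sep="\t")
print("Total", *totals, sep="\t")
```

The totals come out as 49, 91, 775, 455 and 1435, as in the table.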

Calculating Sxx

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 455 - \frac{49^2}{7} = 455 - 343 = 112$$

Calculating Syy

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 1435 - \frac{91^2}{7} = 1435 - 1183 = 252$$

Calculating Sxy

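$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 775 - \frac{49 \times 91}{7} = 775 - 637 = 138$$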

Calculating r

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{138}{\sqrt{112 \times 252}} = \frac{138}{168} = 0.82$$

Significance Test

H0: No linear relationship exists (the population correlation coefficient equals 0).

HA: There is a linear relationship (the population correlation coefficient is not 0).

Choose a significance level, say α = 0.1, 0.05 or 0.01 (corresponding to 90%, 95% or 99% confidence).

Use Table 10: Percentage Points of the Correlation Coefficient. In the left-hand column choose ν = n - 2 (n = sample size) and find the critical value.

If |r| > critical value, reject H0.
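The tabulated critical values of r are linked to Student's t distribution: with ν = n - 2 degrees of freedom, the critical value of r is t/√(t² + ν), where t is the two-tailed critical value of t at the same α. The Python sketch below hard-codes two two-tailed 5% t values from standard t tables (an assumption on our part, since Table 10 itself is not reproduced here), converts them to critical r values, and applies the rule "reject H0 if |r| exceeds the critical value". The helper names `critical_r` and `reject_h0` are hypothetical.

```python
from math import sqrt

# Two-tailed 5% critical values of Student's t, taken from standard t tables
# (assumed here; the course's Table 10 lists critical r values directly).
T_CRIT_5PCT = {5: 2.5706, 9: 2.2622}


def critical_r(t_crit, nu):
    """Convert a critical t value with nu = n - 2 degrees of freedom into a critical r."""
    return t_crit / sqrt(t_crit ** 2 + nu)


def reject_h0(r, r_crit):
    """Two-sided test of 'no linear relationship': reject when |r| > critical value."""
    return abs(r) > r_crit


r_crit_5 = critical_r(T_CRIT_5PCT[5], 5)  # compare with the tabulated 0.7545
r_crit_9 = critical_r(T_CRIT_5PCT[9], 9)  # compare with the tabulated 0.6021
print(round(r_crit_5, 4), round(r_crit_9, 4))
print(reject_h0(0.82, r_crit_5))
```

The absolute value matters because the alternative is two-sided: a strongly negative r is just as much evidence against H0 as a strongly positive one.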

Conclusion

r = 0.82. Let α = 0.05, so ν = n - 2 = 7 - 2 = 5. From the tables the critical value is 0.7545. Since 0.82 > 0.7545, we reject H0 and conclude, at the 5% significance level, that there is a linear relationship between X and Y.

Example 3

Is there an obvious relationship between X and Y?

Y = X + 2. This is a perfect linear relationship. What will r be? r will be equal to 1.

X 3 4 5 6 7 8

Y 5 6 7 8 9 10

Set up the data table

Subject Y X XY X squared Y squared

A 5 3 15 9 25

B 6 4 24 16 36

C 7 5 35 25 49

D 8 6 48 36 64

E 9 7 63 49 81

F 10 8 80 64 100

Total 45 33 265 199 355

Calculate Sxx

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 199 - \frac{33^2}{6} = 199 - 181.5 = 17.5$$

Calculate Syy

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 355 - \frac{45^2}{6} = 355 - 337.5 = 17.5$$

Calculate Sxy

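$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 265 - \frac{33 \times 45}{6} = 265 - 247.5 = 17.5$$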

Calculate r

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{17.5}{\sqrt{17.5 \times 17.5}} = \frac{17.5}{17.5} = 1$$

Perfect Positive Linear Relationship
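For a perfectly linear data set the arithmetic can be done exactly. This Python sketch uses `fractions.Fraction` (our choice, to avoid floating-point rounding) to reproduce S_xx = S_yy = S_xy = 17.5 for Example 3. It forms r² = S_xy²/(S_xx S_yy) exactly and takes the sign of r from S_xy.

```python
from fractions import Fraction

# Example 3 data: Y = X + 2 exactly
x = [3, 4, 5, 6, 7, 8]
y = [xi + 2 for xi in x]
n = len(x)

# Exact rational arithmetic, so 33**2 / 6 and similar terms are not rounded
s_xx = sum(Fraction(xi * xi) for xi in x) - Fraction(sum(x) ** 2, n)
s_yy = sum(Fraction(yi * yi) for yi in y) - Fraction(sum(y) ** 2, n)
s_xy = sum(Fraction(xi * yi) for xi, yi in zip(x, y)) - Fraction(sum(x) * sum(y), n)

# r^2 = Sxy^2 / (Sxx * Syy) is exact; the sign of r is the sign of Sxy
r_squared = s_xy ** 2 / (s_xx * s_yy)
print(s_xx, s_yy, s_xy, r_squared)
```

It prints `35/2 35/2 35/2 1`, i.e. r² = 1 with S_xy > 0, so r = +1.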

Back to Example 1

In our original example with the student results, we drew a scatter plot.

From the diagram it looked as if there was a probable positive linear relationship.

To be sure, we need to calculate r. Using a significance level of α = 0.05, we will test the claim that there is no linear correlation between Maths results and Physics results.

Create a data table

Student X Y XY X squared Y squared

A 41 36 1476 1681 1296

B 37 20 740 1369 400

C 38 31 1178 1444 961

D 39 24 936 1521 576

E 49 37 1813 2401 1369

F 47 35 1645 2209 1225

G 42 42 1764 1764 1764

H 34 26 884 1156 676

I 36 27 972 1296 729

J 48 29 1392 2304 841

K 29 23 667 841 529

Total 440 330 13467 17986 10366

Apply the formulae

$$S_{xy} = 13467 - \frac{440 \times 330}{11} = 267$$

$$S_{yy} = 10366 - \frac{330^2}{11} = 466$$

$$S_{xx} = 17986 - \frac{440^2}{11} = 386$$

Correlation Coefficient is

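$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{267}{\sqrt{386 \times 466}} = \frac{267}{424.12} = 0.63$$

$$\alpha = 0.05, \qquad \nu = n - 2 = 11 - 2 = 9$$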
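The slide computes S_xx, S_yy and S_xy with the shortcut formulas. An algebraically equivalent form is S_xx = Σ(x_i - x̄)², S_yy = Σ(y_i - ȳ)² and S_xy = Σ(x_i - x̄)(y_i - ȳ), and because the Example 1 means happen to be whole numbers (40 and 30), the same values are easy to check from deviations about the means. A minimal Python sketch:

```python
from math import sqrt

# Example 1: X = Maths, Y = Physics for students A..K
maths = [41, 37, 38, 39, 49, 47, 42, 34, 36, 48, 29]
physics = [36, 20, 31, 24, 37, 35, 42, 26, 27, 29, 23]

x_bar = sum(maths) / len(maths)      # 440 / 11 = 40
y_bar = sum(physics) / len(physics)  # 330 / 11 = 30

dx = [x - x_bar for x in maths]    # deviations from the Maths mean
dy = [y - y_bar for y in physics]  # deviations from the Physics mean

s_xx = sum(d * d for d in dx)              # sum of (x - x_bar)^2
s_yy = sum(d * d for d in dy)              # sum of (y - y_bar)^2
s_xy = sum(a * b for a, b in zip(dx, dy))  # sum of (x - x_bar)(y - y_bar)
r = s_xy / sqrt(s_xx * s_yy)
print(s_xx, s_yy, s_xy, round(r, 2))
```

Both routes give S_xx = 386, S_yy = 466, S_xy = 267 and r of about 0.63.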

Conclusion

From the tables (ν = 9, α = 0.05) the critical value is 0.6021. r = 0.63, and 0.63 > 0.6021, so we reject the claim (H0) and conclude that there is a positive linear relationship between results in Maths and results in Physics.

Regression

Least Squares: Predicting Y using X

What is Regression?

Regression analysis is used for prediction. It allows us to predict the value of one variable given the value of another variable.

It gives us an equation that uses one variable to help explain variation in another.

In this course we deal with simple linear regression.

Simple Linear Regression

The first step in determining a relationship was drawing a scatter plot.

If a possible relationship was shown, we found the strength of the relationship by calculating the correlation coefficient r.

The next stage is to calculate an equation which best describes the relationship between the two variables.

The line given by this equation is called the regression line.

What is the ‘best fit’ line? (Example 1)

‘Least Squares’ best fit line

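We can have several lines of the form

$$y_i = a + b\,x_i$$

We want the ‘best’ line, the one with the least sum of squared residuals $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$:

$$\hat{y}_i = \hat{a} + \hat{b}\,x_i$$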

Least Squares estimates

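$\hat{a}, \hat{b}$ are the least squares estimates of $a, b$:

$$\hat{b} = \frac{S_{xy}}{S_{xx}}$$

This is closely related to r, since $\hat{b} = r\sqrt{S_{yy}/S_{xx}}$.

$$\hat{a} = \bar{y} - \hat{b}\,\bar{x}$$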
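The two estimates can be computed exactly with rational arithmetic. This Python sketch (the function name `least_squares` is hypothetical) returns b̂ = S_xy/S_xx and â = ȳ - b̂x̄ as `fractions.Fraction` values, and applies them to the Example 2 and Example 3 data. For Example 2 the exact intercept is 35/8 = 4.375; the slides round b̂ to 1.23 before computing â, which gives 4.39.

```python
from fractions import Fraction


def least_squares(xs, ys):
    """Return (a_hat, b_hat) for the line y = a + b x as exact fractions,
    with b_hat = Sxy / Sxx and a_hat = y_bar - b_hat * x_bar."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(Fraction(x * x) for x in xs) - Fraction(sum_x ** 2, n)
    s_xy = sum(Fraction(x * y) for x, y in zip(xs, ys)) - Fraction(sum_x * sum_y, n)
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b_hat = s_xy / s_xx
    a_hat = Fraction(sum_y, n) - b_hat * Fraction(sum_x, n)
    return a_hat, b_hat


a2, b2 = least_squares([1, 3, 5, 7, 9, 11, 13], [7, 4, 13, 16, 10, 22, 19])
a3, b3 = least_squares([3, 4, 5, 6, 7, 8], [5, 6, 7, 8, 9, 10])
print(float(a2), float(b2))  # Example 2
print(a3, b3)                # Example 3
```

Example 3 recovers the known line exactly: â = 2, b̂ = 1.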

Example 2

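$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 775 - \frac{49 \times 91}{7} = 138$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 455 - \frac{49^2}{7} = 112$$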

Combining we get

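$$\hat{b} = \frac{S_{xy}}{S_{xx}} = \frac{138}{112} = 1.23$$

$$\hat{a} = \bar{y} - \hat{b}\,\bar{x} = \frac{91}{7} - 1.23 \times \frac{49}{7} = 4.39$$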

Regression line is

$$\hat{y}_i = 4.39 + 1.23\,x_i$$

Example 3

X 3 4 5 6 7 8

Y 5 6 7 8 9 10

We know Y=X+2

Verifying the equation is correct

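$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 265 - \frac{33 \times 45}{6} = 17.5$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 199 - \frac{33^2}{6} = 17.5$$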

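Giving

$$\hat{b} = \frac{S_{xy}}{S_{xx}} = \frac{17.5}{17.5} = 1$$

$$\hat{a} = \bar{y} - \hat{b}\,\bar{x} = \frac{45}{6} - 1 \times \frac{33}{6} = 2$$

$$\hat{y}_i = 2 + x_i$$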

Example 1

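$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 13467 - \frac{440 \times 330}{11} = 267$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 17986 - \frac{440^2}{11} = 386$$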

Regression line

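$$\hat{b} = \frac{S_{xy}}{S_{xx}} = \frac{267}{386} = 0.692$$

$$\hat{a} = \bar{y} - \hat{b}\,\bar{x} = \frac{330}{11} - 0.692 \times \frac{440}{11} = 2.32$$

$$\hat{y}_i = 2.32 + 0.692\,x_i$$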

Example 1 continued

If a student received a grade of 53 in Maths, what would the expected grade be in Physics?

We use the regression line to predict the Physics result:

$$\hat{y}_i = 2.32 + 0.692\,x_i$$

$$\hat{y} = 2.32 + 0.692 \times 53 = 38.996$$
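The prediction step can be sketched in Python using exact, unrounded coefficients for Example 1 (the helper name `predict` is hypothetical). Keeping full precision gives ŷ of about 38.99 instead of 38.996; the difference comes only from rounding b̂ and â, and both round to 39. Note also that a Maths mark of 53 is above the largest mark in the sample (49), so this prediction extrapolates beyond the observed data.

```python
from fractions import Fraction

# Example 1: X = Maths, Y = Physics
maths = [41, 37, 38, 39, 49, 47, 42, 34, 36, 48, 29]
physics = [36, 20, 31, 24, 37, 35, 42, 26, 27, 29, 23]
n = len(maths)

# Exact Sxx (= 386) and Sxy (= 267) via the computational formulas
s_xx = sum(Fraction(x * x) for x in maths) - Fraction(sum(maths) ** 2, n)
s_xy = sum(Fraction(x * y) for x, y in zip(maths, physics)) - Fraction(sum(maths) * sum(physics), n)

b_hat = s_xy / s_xx                                          # 267/386, about 0.692
a_hat = Fraction(sum(physics), n) - b_hat * Fraction(sum(maths), n)  # about 2.33


def predict(x):
    """Predicted Physics mark for a Maths mark x from the fitted line."""
    return a_hat + b_hat * x


print(round(float(predict(53)), 3))
```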

Graphing The Regression Line
