correlation. bivariate distribution observations are taken on two variables two characteristics are...

CORRELATION

Upload: chester-dawson

Post on 16-Jan-2016

Page 1

CORRELATION

Page 2

Bivariate Distribution

Observations are taken on two variables

Two characteristics are measured on n individuals

e.g.: the height (x) and weight (y) of 10 students

A single characteristic is measured on two groups of individuals

e.g.: the height of 10 males (x) and 10 females (y)

The n observed pairs are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

Page 3

Height  Self-esteem
68      4.1
71      4.6
62      3.8
75      4.4
58      3.2
60      3.1
67      3.8
68      4.1
71      4.3
69      3.7
68      3.5
67      3.2
63      3.7
62      3.3
60      3.4
63      4
65      4.1
67      3.8
63      3.4
61      3.6

Page 4

Definition

Correlation is used to measure and describe the relationship, or association, between two variables.

The correlation coefficient is a single number that describes the relationship between X and Y. It is denoted by r for a sample and ρ for a population.

Page 5

Scatter Diagram

[Figure: scatter plot of Self Esteem (y-axis, 0 to 5) against Height (x-axis, 50 to 80)]

Page 6

Education Level and Lifetime Earnings

[Figure: scatter plot of Lifetime Earnings (Criterion Variable, y-axis, 0 to 5) against Education (Predictor Variable, x-axis, 0 to 10)]

X (Education)  Y (Income)
8              3.4
7              4.4
6              2.5
5              2.1
4              1.6
3              1.5
2              1.2
1              1

What is the relationship between level of education and lifetime earnings?

Page 7

Direction of Relationship

A scatter plot shows the direction of the relationship at a glance. A positive correlation indicates that Y tends to increase as X increases, so the points rise from left to right.

Page 8

Direction of Relationship

A negative correlation indicates that Y tends to decrease as X increases, so the points fall from left to right.

Page 9

No Correlation

When there is no correlation between two variables, the points are scattered irregularly across the plot, with no upward or downward trend.
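The three patterns can also be checked numerically. The sketch below assumes three small made-up datasets (not taken from the slides) and a hypothetical helper `pearson_r`; the sign of r follows the direction of the scatter:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [1, 3, 2, 5, 4]    # points drift upward: positive correlation
falling = [5, 3, 4, 1, 2]   # points drift downward: negative correlation
no_trend = [4, 1, 3, 5, 2]  # irregular scatter: no linear trend

r_pos = pearson_r(x, rising)
r_neg = pearson_r(x, falling)
r_none = pearson_r(x, no_trend)
print(round(r_pos, 2), round(r_neg, 2), round(r_none, 2))  # 0.8 -0.8 0.0
```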

Page 10

Correlation Coefficient

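The correlation coefficient describes three characteristics of the relationship between X and Y:

1. The direction of the relationship (positive or negative, given by the sign of r).
2. The form of the relationship (Pearson's r measures linear association).
3. The degree, or strength, of the relationship (given by the magnitude of r, from 0 for no linear relationship to 1 for a perfect one).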

Page 11

Karl Pearson Correlation
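The formula for this slide did not survive extraction. For reference, the standard Karl Pearson (product-moment) coefficient for n pairs $(x_i, y_i)$ can be written in its definitional form and in the equivalent computational form that works directly from column totals; in both, $-1 \le r \le 1$:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}
         {\sqrt{\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]
                \left[\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right]}}
```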

Page 12

Calculation

Calculate the Karl Pearson correlation coefficient for the height and self-esteem data on slide 3.

Answer: r = 0.73.

Interpretation: the data exhibit a strong positive correlation; taller individuals in this sample tend to have higher self-esteem.
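The answer can be reproduced with a short script. This is a sketch that transcribes the 20 height and self-esteem pairs from slide 3 and applies the definitional form of Pearson's r; the helper name `pearson_r` is hypothetical:

```python
import math

# Height and self-esteem pairs transcribed from the table on slide 3
height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
self_esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
               3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def pearson_r(xs, ys):
    """Karl Pearson correlation using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(height, self_esteem)
print(f"r = {r:.2f}")  # r = 0.73
```

On Python 3.10 and later, the standard library's `statistics.correlation(height, self_esteem)` computes the same Pearson coefficient.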

Page 13

X (Education)  Y (Income)  XY     X²    Y²
8              3.4         27.2   64    11.56
7              4.4         30.8   49    19.36
6              2.5         15     36    6.25
5              2.1         10.5   25    4.41
4              1.6         6.4    16    2.56
3              1.5         4.5    9     2.25
2              1.2         2.4    4     1.44
1              1           1      1     1
36             17.7        97.8   204   48.83   (column totals)

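With n = 8:

$$r = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}} = \frac{97.8 - \frac{36 \times 17.7}{8}}{\sqrt{\left(204 - \frac{36^2}{8}\right)\left(48.83 - \frac{17.7^2}{8}\right)}} = \frac{18.15}{\sqrt{42 \times 9.669}} \approx 0.90$$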
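The same computational formula can be sketched in code, building the column totals from the raw education and income pairs and then substituting them exactly as in the table above. Variable names are illustrative assumptions:

```python
import math

# Education (X) and income (Y) pairs from the table on slide 13
education = [8, 7, 6, 5, 4, 3, 2, 1]
income = [3.4, 4.4, 2.5, 2.1, 1.6, 1.5, 1.2, 1.0]

n = len(education)
# Column totals: 36, 17.7, 97.8, 204, 48.83 (up to floating-point rounding)
sum_x = sum(education)
sum_y = sum(income)
sum_xy = sum(x * y for x, y in zip(education, income))
sum_x2 = sum(x * x for x in education)
sum_y2 = sum(y * y for y in income)

numerator = sum_xy - sum_x * sum_y / n
denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator
print(f"r = {r:.2f}")  # r = 0.90
```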

Page 14

The data show a high positive correlation between education and income.

Page 15

Drawbacks

Pearson's r can be misleading when there are outliers or when the scatter plot of x and y values is nonlinear. The next slide shows scatter plots for 7 different datasets that all have the same correlation, r = 0.70. Is the use of r justified in each case?
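Both drawbacks can be seen with a few made-up points (these are not the seven datasets from the slide): a relationship that is perfectly determined but nonlinear can give r = 0, and a single outlier can turn a trendless cloud into an apparently strong correlation. `pearson_r` is a hypothetical helper written for this sketch:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Nonlinear: y = x**2 is completely determined by x, yet r is 0
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_nonlinear = pearson_r(xs, ys)

# Outlier: five points with no linear trend, then one extreme point added
base_x = [1, 2, 3, 4, 5]
base_y = [4, 1, 3, 5, 2]
r_base = pearson_r(base_x, base_y)
r_outlier = pearson_r(base_x + [20], base_y + [20])

print(round(r_nonlinear, 2), round(r_base, 2), round(r_outlier, 2))  # 0.0 0.0 0.96
```

This is why the slides recommend looking at the scatter plot before relying on r.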

Page 16

Page 17

Rank Correlation

Age (mths)  Stopping distance  Age rank  Stopping rank  d     d²
9           28.4               1         1              0     0
15          29.3               2         2              0     0
24          37.6               3         7              4     16
30          36.2               4         4.5            0.5   0.25
38          36.5               5         6              1     1
46          35.3               6         3              -3    9
53          36.2               7         4.5            -2.5  6.25
60          44.1               8         8              0     0
64          44.8               9         9              0     0
76          47.2               10        10             0     0
                                                        Σd² = 32.5

Here d = stopping rank − age rank, and the two tied stopping distances (36.2) share the average rank 4.5.

Page 18

Scatter Plot

       

Page 19

Calculations

Number in sample: n = 10

$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 32.5}{10 \times 99} = 1 - \frac{195}{990} = 1 - 0.197 = 0.803$$
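A sketch of the whole rank-correlation procedure follows, assuming ranks run from 1 for the smallest value (as in the table) and that tied values share the average of their ranks. The helper `average_ranks` is hypothetical. Note that when ties are present, the shortcut formula is an approximation to the Pearson correlation of the ranks, not an exact equality:

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

age = [9, 15, 24, 30, 38, 46, 53, 60, 64, 76]
stopping = [28.4, 29.3, 37.6, 36.2, 36.5, 35.3, 36.2, 44.1, 44.8, 47.2]

age_rank = average_ranks(age)
stop_rank = average_ranks(stopping)
sum_d2 = sum((s - a) ** 2 for a, s in zip(age_rank, stop_rank))
n = len(age)
r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r, 3))  # 32.5 0.803
```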

Page 20

Probable Error

$$\text{P.E.}(r) = 0.6745 \, \frac{1 - r^2}{\sqrt{n}}$$

If |r| > 6 P.E., the correlation is regarded as highly significant, that is, likely to be present in the population; otherwise it is regarded as not significant.
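As a worked sketch, assuming the rounded r = 0.73 and n = 20 from the height and self-esteem example, the probable error and the 6 P.E. rule can be computed as follows; `probable_error` is a hypothetical helper name:

```python
import math

def probable_error(r, n):
    """Probable error of a correlation coefficient r computed from n pairs."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

r, n = 0.73, 20  # height and self-esteem example from slide 3
pe = probable_error(r, n)
significant = abs(r) > 6 * pe  # the 6 P.E. rule stated above
print(round(pe, 4), significant)  # 0.0704 True
```

Here 6 P.E. is about 0.42, well below 0.73, so by this rule the correlation is highly significant.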

Page 21

Caution

Correlation does not imply causation. Example: the average temperature in a month (x) and the number of ice cream vendors (y) give r = 0.92, a highly positive correlation.