Correlation and Regression

Upload: clement-chambers

Post on 31-Dec-2015


Page 1: Correlation and Regression SCATTER DIAGRAM The simplest method to assess relationship between two quantitative variables is to draw a scatter diagram

SCATTER DIAGRAM

The simplest method to assess the relationship between two quantitative variables is to draw a scatter diagram.

From a scatter diagram of BP against age, we notice that as age increases there is a general tendency for BP to increase. But this does not give us a quantitative estimate of the degree of the relationship.

CORRELATION COEFFICIENT

The correlation coefficient is an index of the degree of association between two variables. It can also be used for comparing the degree of association in different groups.

For example, we may be interested in knowing whether the degree of association between age and systolic BP is the same (or different) in males and females.

The correlation coefficient is denoted by the symbol ‘r’.

‘r’ ranges from -1 to +1.

High values of one variable tend to occur with high values of the other (and low with low). In such situations, we say that there is a positive correlation (r > 0).

When high values of one variable occur with low values of the other (and vice versa), we say that there is a negative correlation (r < 0).
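A minimal Python sketch of how the sign of r follows the direction of the association. The data sets are small hypothetical examples (not from the slides), and the helper `pearson_r` is introduced here only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired observations x, y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not taken from the slides)
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]    # high x tends to go with high y
y_down = [6, 5, 3, 4, 1]  # high x tends to go with low y

r_pos = pearson_r(x, y_up)
r_neg = pearson_r(x, y_down)
print(f"positive association: r = {r_pos:.2f}")
print(f"negative association: r = {r_neg:.2f}")
```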

A NOTE OF CAUTION

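The correlation coefficient is purely a measure of the degree of association and does not provide any evidence of a cause-effect relationship.

It is valid only in the range of values studied.

Extrapolation of the association may not always be valid.

E.g.: Age & grip strength. Grip strength increases with age through childhood and early adulthood but declines in old age, so an association observed in one age range cannot be extended to another.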

r measures the degree of linear relationship

r = 0 does not necessarily mean that there is no relationship between the two characteristics under study; the relationship could be curvilinear
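A short Python sketch of the curvilinear case, using hypothetical data where y is completely determined by x (y = x²), yet r is zero because the relationship is not linear. `pearson_r` is an illustrative helper, not part of the source.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired observations x, y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with an exact U-shaped (curvilinear) relationship
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # prints r = 0.000, although y depends entirely on x
```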

Spurious correlation:

The production of steel in the UK and the population of India over the last 25 years may be highly correlated, although there is no meaningful connection between them; both simply change steadily over time.
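A Python sketch of how a shared time trend alone can produce a high r. Both series are hypothetical (they are not real steel or population figures): each is a steady trend plus small, unrelated year-to-year wiggles. `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired observations x, y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical yearly series over 25 years with no causal link between them
years = range(25)
series_a = [100 + 2 * t + (1 if t % 2 else -1) for t in years]
series_b = [500 + 15 * t + (3 if t % 3 == 0 else -2) for t in years]

r = pearson_r(series_a, series_b)
print(f"r = {r:.3f}")  # close to 1, driven only by the common trend over time
```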

r does not give the rate of change in one variable for changes in the other variable.

E.g.: Age & systolic BP. Males: r = 0.7; Females: r = 0.5

From this one should not conclude that systolic BP increases at a higher rate among males than among females; the rate of change is given by the regression coefficient, not by r.
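A Python sketch showing that a higher r does not imply a steeper rate of change. The two groups are hypothetical (not the male and female data behind the slide): group A has a shallow slope with little scatter, group B a steep slope with more scatter. The helpers `pearson_r` and `slope` are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired observations x, y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope of y on x (change in y per unit change in x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical ages and BP values for two groups
age = [40, 45, 50, 55, 60, 65]
wiggle = [1, -1, 1, -1, 1, -1]
bp_a = [120 + 0.5 * (a - 40) + 1 * w for a, w in zip(age, wiggle)]   # shallow, tight
bp_b = [120 + 2.0 * (a - 40) + 10 * w for a, w in zip(age, wiggle)]  # steep, scattered

for name, bp in [("group A", bp_a), ("group B", bp_b)]:
    print(f"{name}: r = {pearson_r(age, bp):.2f}, "
          f"slope = {slope(age, bp):.2f} mm Hg per year")
# Group A has the higher r but the smaller slope.
```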

PROPERTY OF CORRELATION COEFFICIENT

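The correlation coefficient is unaffected by adding or subtracting a constant (not necessarily the same one for X and Y) to all the values of X and Y, or by multiplying or dividing them by a positive constant. (Multiplying by a negative constant reverses the sign of r but leaves its magnitude unchanged.)

Corr. coeff. between X & Y = 0.7

Corr. coeff. between X + 10 & Y - 6 = 0.7

Corr. coeff. between 5X & 2Y = 0.7

If the correlation coefficient between height in inches and weight in pounds is, say, 0.6, the correlation coefficient between height in cm and weight in kg will also be 0.6.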
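A Python sketch that checks this property numerically on the X and Y values from the worked computation example in this section. `pearson_r` is an illustrative helper; the negated case is included to show the sign flip for a negative multiplier.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired observations x, y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [8, 3, 4, 10, 6, 7, 11]
y = [12, 9, 10, 15, 11, 12, 15]

r = pearson_r(x, y)
r_shifted = pearson_r([xi + 10 for xi in x], [yi - 6 for yi in y])  # X + 10, Y - 6
r_scaled = pearson_r([5 * xi for xi in x], [2 * yi for yi in y])    # 5X, 2Y
r_negated = pearson_r([-xi for xi in x], y)                         # -X: sign flips

print(round(r, 4), round(r_shifted, 4), round(r_scaled, 4), round(r_negated, 4))
```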

COMPUTATION OF THE CORRELATION COEFFICIENT

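Covariance (XY)

X      Y      (X - X̄)    (Y - Ȳ)    (X - X̄)(Y - Ȳ)
8      12      1          0          0
3       9     -4         -3         12
4      10     -3         -2          6
10     15      3          3          9
6      11     -1         -1          1
7      12      0          0          0
11     15      4          3         12
Sum:   49     84          0          0         40

n = 7

X̄ = ΣX / n = 49 / 7 = 7        Ȳ = ΣY / n = 84 / 7 = 12

$$\mathrm{Cov}(X,Y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} = \frac{40}{6} = 6.67$$

$$r = \frac{\mathrm{Cov}(X,Y)}{\mathrm{S.D.}(X)\,\mathrm{S.D.}(Y)} = \frac{6.67}{2.94 \times 2.31} = 0.98$$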
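The same computation can be reproduced step by step in Python. This sketch uses only the standard library and the seven (X, Y) pairs from the table; standard deviations use the n - 1 divisor, matching the covariance.

```python
import math

# (X, Y) pairs from the worked example, n = 7
x = [8, 3, 4, 10, 6, 7, 11]
y = [12, 9, 10, 15, 11, 12, 15]
n = len(x)

mean_x = sum(x) / n  # 7
mean_y = sum(y) / n  # 12

# Sum of products of deviations (40), then covariance with divisor n - 1
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
cov_xy = sxy / (n - 1)

# Sample standard deviations (divisor n - 1)
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

r = cov_xy / (sd_x * sd_y)
print(f"Cov = {cov_xy:.2f}, S.D.(X) = {sd_x:.2f}, S.D.(Y) = {sd_y:.2f}, r = {r:.2f}")
# Cov = 6.67, S.D.(X) = 2.94, S.D.(Y) = 2.31, r = 0.98
```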

UNIVARIATE REGRESSION

Regression: a method of describing the relationship between two variables

Use: to predict the value of one variable given the value of the other

SAMPLE DATA SET

Patient No.    Age (X), years    Sys BP (Y), mm Hg
1              45                150
2              48                153
3              46                148
4              45                150
5              46                147
6              48                153
7              46                149
8              55                159
9              51                157
10             56                160
11             53                158
12             60                165
13             53                157
14             54                158
15             49                154

BP = Response (dependent) variable; Age = Predictor (independent) variable

REGRESSION MODEL

We can perform a “regression of BP on age” to derive a straight line that gives an estimated value of BP for any given age.

The general equation of a linear regression line is

Y = a + bX + e

where a = intercept

b = regression coefficient

e = statistical (random) error

CALCULATIONS

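a and b are estimated from the observed values of age (X) and BP (Y) by the least squares method.

b gives the change in Y for a unit change in X.

a is the value of Y when X = 0, which may not always be meaningful.

$$\hat{b} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2} = \frac{\mathrm{Covariance}(X,Y)}{\mathrm{Variance}(X)}$$

$$\hat{a} = \bar{Y} - \hat{b}\,\bar{X}$$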
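A minimal Python sketch of the least squares estimates applied to the 15-patient sample data set. The function name `least_squares` is illustrative. Run on the data as listed, it gives b̂ ≈ 1.08, matching the results table; the intercept comes out near 100.36, slightly different from the 100.34 printed in the results.

```python
# Sample data set: age (years) and systolic BP (mm Hg) for 15 patients
age = [45, 48, 46, 45, 46, 48, 46, 55, 51, 56, 53, 60, 53, 54, 49]
sbp = [150, 153, 148, 150, 147, 153, 149, 159, 157, 160, 158, 165, 157, 158, 154]

def least_squares(x, y):
    """Return (a_hat, b_hat) for the line Y = a + bX fitted by least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b_hat = sxy / sxx                # = Covariance(X, Y) / Variance(X)
    a_hat = mean_y - b_hat * mean_x  # the line passes through (mean_x, mean_y)
    return a_hat, b_hat

a_hat, b_hat = least_squares(age, sbp)
print(f"Systolic BP = {a_hat:.2f} + {b_hat:.2f} Age")
```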

TEST OF SIGNIFICANCE FOR b

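Null hypothesis: H0: b = 0

Test statistic:

$$t = \frac{\hat{b}}{SE(\hat{b})} \qquad (1)$$

where

$$SE(\hat{b}) = \sqrt{\frac{\sum (Y-\bar{Y})^2 - \hat{b}^2 \sum (X-\bar{X})^2}{(n-2)\sum (X-\bar{X})^2}}$$

Under the null hypothesis, the value given under (1) follows a t-distribution with (n - 2) df.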
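A Python sketch of this test on the 15-patient sample data set, using the SE formula above. It also computes R² from the same sums. It reports t and its degrees of freedom but does not compute a P-value, which would need the t-distribution (for example from a statistics table or library).

```python
import math

# Sample data set: age (years) and systolic BP (mm Hg) for 15 patients
age = [45, 48, 46, 45, 46, 48, 46, 55, 51, 56, 53, 60, 53, 54, 49]
sbp = [150, 153, 148, 150, 147, 153, 149, 159, 157, 160, 158, 165, 157, 158, 154]
n = len(age)

mean_x, mean_y = sum(age) / n, sum(sbp) / n
sxx = sum((x - mean_x) ** 2 for x in age)   # sum of (X - X_bar)^2
syy = sum((y - mean_y) ** 2 for y in sbp)   # sum of (Y - Y_bar)^2
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, sbp))

b_hat = sxy / sxx
residual_ss = syy - b_hat ** 2 * sxx        # numerator of the SE formula
se_b = math.sqrt(residual_ss / ((n - 2) * sxx))
t_stat = b_hat / se_b                       # compare with t on n - 2 df
r_squared = 1 - residual_ss / syy           # proportion of variation explained

print(f"b = {b_hat:.2f}, SE(b) = {se_b:.2f}, t = {t_stat:.2f} on {n - 2} df, "
      f"R^2 = {r_squared:.3f}")
```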

ASSUMPTIONS

1. The relation between the two variables should be linear

2. The residuals should follow a Normal distribution with

zero mean and constant variance

PRECAUTIONS

1. Adequate sample size should be ensured.

2. Predictions should be made only within the range of the observed values. No extrapolation should be attempted.

3. The equation Y = a + bX should not be used to predict X for a given Y (that requires a separate regression of X on Y).

4. Model adequacy should be verified (e.g., by checking the assumptions on the residuals).

RESULTS OF REGRESSION ANALYSIS

--------------------------------------------------------------
Ind. variable    Reg. coeff.    SE      t        P-value
--------------------------------------------------------------
Age              1.08           0.08    14.16    < 0.0001
Constant         100.34
--------------------------------------------------------------

R² = 93.9% (≈ 94%)

Systolic BP = 100.34 + 1.08 Age

95% CI for b = b̂ ± 1.96 SE(b̂) = 1.08 ± 1.96 × 0.08 = (0.92, 1.24)

(1.96 is the large-sample normal value. With n - 2 = 13 df, the t critical value is 2.16, which gives a slightly wider interval of about (0.91, 1.25).)

INTERPRETATIONS

1. On average, an increase of one year in age is associated with an increase of 1.08 mm Hg in systolic BP.

2. When age = 0, BP = 100.34, which is absurd: age 0 lies far outside the observed range (45 to 60 years), so the intercept has no practical meaning here.

3. The predicted BP of a 50-year-old individual is 100.34 + 1.08 × 50 = 154.34 ≈ 154 mm Hg.

4. 94% of the variation in BP is explained by age alone.

(b̂ = 1.08, â = 100.34, R² = 94%)
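Interpretation 3 is a prediction, and the precautions say predictions should stay within the observed range. A minimal Python sketch that applies the fitted equation Systolic BP = 100.34 + 1.08 Age and refuses to extrapolate. The age limits 45 and 60 are taken from the sample data set; the function and constant names are illustrative.

```python
# Coefficients from the regression results
A_HAT = 100.34            # intercept
B_HAT = 1.08              # regression coefficient for age
AGE_MIN, AGE_MAX = 45, 60 # range of ages in the sample data set

def predict_sbp(age):
    """Predicted systolic BP (mm Hg) for a given age (years).

    Raises ValueError instead of extrapolating beyond the observed ages.
    """
    if not AGE_MIN <= age <= AGE_MAX:
        raise ValueError(f"age {age} is outside the observed range {AGE_MIN}-{AGE_MAX}")
    return A_HAT + B_HAT * age

print(round(predict_sbp(50), 2))  # 154.34
```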

MULTIPLE LINEAR REGRESSION

The response variable is expressed as a combination of several predictor variables.

E.g.: PEmax = 47.35 + 0.147 ht. + 1.024 wt.

0.147 & 1.024 are the regression coefficients for ht. and wt. They indicate the increase in PEmax for an increase of 1 cm in ht. and 1 kg in wt., respectively (the other variable being held constant).
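A Python sketch of how each coefficient in a multiple regression gives the change in the response for a one-unit change in that predictor while the other is held fixed. The coefficients are those of the PEmax example; treat the intercept value 47.35 as an assumption read from the example equation. The 150 cm / 40 kg inputs are hypothetical.

```python
# Coefficients from the PEmax example (intercept value is an assumption)
INTERCEPT = 47.35
COEF_HT = 0.147  # per cm of height
COEF_WT = 1.024  # per kg of weight

def predict_pemax(ht_cm, wt_kg):
    """Predicted PEmax from height (cm) and weight (kg) using the example equation."""
    return INTERCEPT + COEF_HT * ht_cm + COEF_WT * wt_kg

# Hypothetical individual: 150 cm, 40 kg
base = predict_pemax(150, 40)
print(round(predict_pemax(151, 40) - base, 3))  # 0.147: +1 cm, weight held fixed
print(round(predict_pemax(150, 41) - base, 3))  # 1.024: +1 kg, height held fixed
```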

LOGISTIC REGRESSION

Response variable: presence or absence of some condition

We predict a transformation of the response variable, the logit of the probability p that the condition is present, logit(p) = ln[p / (1 - p)], instead of the actual value of the variable.

Data: Hypertension, Smoking (X1), Obesity (X2) & Snoring (X3)

Which of the factors are predictors of hypertension?

$$\mathrm{logit}(p) = -2.378 - 0.068\,X_1 + 0.695\,X_2 + 0.872\,X_3$$

The probability can be estimated for any combination of the three variables.

Also, we can compare the predicted probability for different groups, e.g., smokers and non-smokers.
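A Python sketch that converts the fitted logit into a predicted probability of hypertension and compares smokers with non-smokers at each combination of the other two factors. It assumes each predictor is coded 1 = present, 0 = absent; the source does not state the coding, so this is an assumption. The function name is illustrative.

```python
import math

# Coefficients from the fitted logistic model
B0, B_SMOKING, B_OBESITY, B_SNORING = -2.378, -0.068, 0.695, 0.872

def predicted_probability(smoking, obesity, snoring):
    """Estimated probability of hypertension.

    Assumes each predictor is coded 1 = present, 0 = absent (an assumption,
    not stated in the source).
    """
    logit = B0 + B_SMOKING * smoking + B_OBESITY * obesity + B_SNORING * snoring
    return 1 / (1 + math.exp(-logit))  # inverse of logit(p) = ln(p / (1 - p))

# Smokers vs non-smokers, holding obesity and snoring fixed
for obese in (0, 1):
    for snores in (0, 1):
        p_non = predicted_probability(0, obese, snores)
        p_smk = predicted_probability(1, obese, snores)
        print(f"obesity={obese}, snoring={snores}: "
              f"non-smoker {p_non:.3f}, smoker {p_smk:.3f}")
```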