
Page 1: Regression and correlation analysis - Weeblyhishamezzat.weebly.com/uploads/9/0/6/0/9060375/51_-63.pdf · Correlation analysis o correlation analysis refers to the methods used to


Chapter 4

Regression and correlation analysis

Regression

o The term regression analysis refers to the methods used to estimate the values of one variable from knowledge of the values of another variable.

Correlation analysis

o Correlation analysis refers to the methods used to measure the strength of the association (correlation) between these variables.

Our study here will concentrate on the relationship between two variables only.

4.1 Linear regression analysis:

o Regression analysis is classified into several types according to the type of relationship between the two variables, such as linear, exponential, logarithmic and power regression. The following paragraphs concentrate on linear regression analysis, which is our objective.


Regression and correlation analysis

Dr Hisham E Abdellatef Page 52

o The term linear means that the equation of a straight line is used to describe the relationship between the two variables, i.e. the relationship is represented by a straight line, usually called the regression line.

Scatter diagram:

o In studying the relationship between two variables, it is advisable to plot the data on a graph as a first step. This allows visual examination of the extent of association between the variables.

o The chart used for this purpose is known as a scatter diagram: a graph on which each plotted point represents an observed pair of values of the dependent (Y) and independent (X) variables.


Mathematics and statistics for pharmacists

Dr Hisham E Abdellatef Page 53

Regression Equation:

o The relationship between the two variables X and Y is represented by a straight line, the regression line. The regression line (or its equation) is the line that best fits the points of the scatter diagram, in the least-squares sense (it minimizes the sum of the squared vertical distances of the points from the line), and it describes a possible correlation between the two variables.

o The equation of the linear regression line is:

Y = aX + b

o Where Y is the dependent variable, X is the independent variable, a is the slope of the regression line and b is its intercept with the Y-axis.

o The values of a and b can be calculated from:

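$$a = \frac{\sum X \cdot \sum Y - N\sum XY}{(\sum X)^2 - N\sum X^2}$$

$$b = \frac{\sum X \cdot \sum XY - \sum X^2 \cdot \sum Y}{(\sum X)^2 - N\sum X^2}$$

where N is the number of pairs of observations.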
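A minimal Python sketch of the two summation formulas above, assuming the data are given as two equal-length lists. The function name `regression_line` and the toy data are illustrative assumptions, not part of the original notes; the code simply accumulates ΣX, ΣY, ΣX² and ΣXY and applies the formulas for a and b.

```python
def regression_line(xs, ys):
    """Return (a, b) for Y = aX + b using the summation formulas for a and b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be equal-length lists with at least 2 points")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = sum_x ** 2 - n * sum_x2  # (ΣX)² - NΣX², shared by both formulas
    if denom == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    a = (sum_x * sum_y - n * sum_xy) / denom
    b = (sum_x * sum_xy - sum_x2 * sum_y) / denom
    return a, b


if __name__ == "__main__":
    # Toy points lying exactly on Y = 2X + 1, chosen for illustration only
    print(regression_line([1, 2, 3, 4], [3, 5, 7, 9]))  # prints (2.0, 1.0)
```

Because the toy points lie exactly on Y = 2X + 1, the sketch returns a = 2 and b = 1, a quick sanity check before applying it to real data.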

The relationship between the toxic dose of a drug (mg/kg) and the number of dead animals induced by its toxic effects was plotted diagrammatically, using the X-axis for the dose of the drug and the Y-axis for the number of dead animals. The following relationship was obtained:

Dose (mg/kg) (X)              3   4   5   6   7   8
Number of dead animals (Y)    0   1   2   4   5   6

Calculate the equation of the linear regression line.

Solution:

X    Y    X²    XY
3    0     9     0
4    1    16     4
5    2    25    10
6    4    36    24
7    5    49    35
8    6    64    48
ΣX = 33   ΣY = 18   ΣX² = 199   ΣXY = 121

$$a = \frac{\sum X \cdot \sum Y - N\sum XY}{(\sum X)^2 - N\sum X^2} = \frac{33 \times 18 - 6 \times 121}{(33)^2 - 6 \times 199} = \frac{-132}{-105} = 1.257$$

$$b = \frac{\sum X \cdot \sum XY - \sum X^2 \cdot \sum Y}{(\sum X)^2 - N\sum X^2} = \frac{33 \times 121 - 199 \times 18}{(33)^2 - 6 \times 199} = \frac{411}{-105} = -3.914$$

The equation is Y = 1.257X - 3.914
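The arithmetic of this example can be cross-checked with the equivalent mean-centred (deviation) form of least squares, a = Σ(dx·dy)/Σdx² and b = Ȳ - aX̄, where dx = X - X̄ and dy = Y - Ȳ. This form is swapped in here only as a check and is not the method used in the notes; the function name `regression_by_deviations` is hypothetical. With X̄ = 5.5 and Ȳ = 3 it gives a = 22/17.5 ≈ 1.257 and b ≈ -3.914, the same line as the summation formulas.

```python
from statistics import mean


def regression_by_deviations(xs, ys):
    """Return (a, b) for Y = aX + b using deviations from the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))  # Σ(dx·dy)
    sum_dx2 = sum(p * p for p in dx)               # Σdx²
    a = sum_dxdy / sum_dx2
    b = y_bar - a * x_bar  # the least-squares line passes through (X̄, Ȳ)
    return a, b


if __name__ == "__main__":
    dose = [3, 4, 5, 6, 7, 8]   # X, mg/kg
    dead = [0, 1, 2, 4, 5, 6]   # Y, number of dead animals
    a, b = regression_by_deviations(dose, dead)
    print(f"Y = {a:.3f}X {b:+.3f}")  # prints: Y = 1.257X -3.914
```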


4.2 The correlation coefficient:

o When two series of observations are made, it is often found that the observations in one series (dependent series) vary correspondingly with those in the other (independent series). The correlation between the two series of observations can be tested by determining the correlation coefficient (r).

o The correlation coefficient is measured on a scale that runs from +1 through zero to -1. A value of +1 or -1 expresses complete correlation between the variables. When one variable increases as the other increases, the correlation is positive; when one variable decreases as the other increases, the correlation is negative. Complete absence of correlation is represented by r = 0.


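The correlation coefficient equation:

$$r = \frac{\sum[(X - \bar{X})(Y - \bar{Y})]}{\sqrt{\sum(X - \bar{X})^2 \cdot \sum(Y - \bar{Y})^2}} = \frac{\sum(dx \cdot dy)}{\sqrt{\sum dx^2 \cdot \sum dy^2}}$$

where dx = X - X̄ and dy = Y - Ȳ are the deviations of each value from its mean.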
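A minimal Python sketch of the deviation form of r, assuming two equal-length lists of observations. The function name `pearson_r` and the three-point data sets are illustrative assumptions; they show the two ends of the scale, +1 for values rising together and -1 for one variable falling as the other rises.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r = Σ(dx·dy) / sqrt(Σdx² · Σdy²)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be equal-length lists with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


if __name__ == "__main__":
    print(round(pearson_r([1, 2, 3], [2, 4, 6]), 6))  # prints 1.0 (complete positive)
    print(round(pearson_r([1, 2, 3], [6, 4, 2]), 6))  # prints -1.0 (complete negative)
```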

The significance of the correlation between two variables can be tested by comparing the value of t calculated according to the following formula with the tabulated value of t at n - 2 degrees of freedom:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
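A sketch of the t statistic for r, assuming r and the number of pairs n are already known; the function name `t_for_correlation` is hypothetical. It only computes t: the tabulated (critical) value at n - 2 degrees of freedom must still be taken from a t table. Rounding matters slightly here: with r rounded to 0.994 the formula gives t ≈ 18.2, while carrying r = 22/√490 at full precision gives t = 44/√6 ≈ 17.96. Either way the conclusion is unchanged.

```python
import math


def t_for_correlation(r, n):
    """t = r·sqrt(n - 2) / sqrt(1 - r²), to be compared with tabulated t at n - 2 D.F."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1 for this formula")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    r, n = 0.994, 6  # rounded r from the dose example
    t = t_for_correlation(r, n)
    print(f"t = {t:.2f} with {n - 2} degrees of freedom")
```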

Using the data of the previous regression example, calculate the correlation coefficient and test the significance of the correlation (tabulated t = 2.78 at P < 0.05 and D.F. = n - 2 = 4).

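Solution:

X̄ = 33/6 = 5.5 and Ȳ = 18/6 = 3, so dx = X - 5.5 and dy = Y - 3.

X    dx     dx²    Y    dy    dy²   dx·dy
3   -2.5   6.25    0    -3     9    +7.5
4   -1.5   2.25    1    -2     4    +3.0
5   -0.5   0.25    2    -1     1    +0.5
6   +0.5   0.25    4    +1     1    +0.5
7   +1.5   2.25    5    +2     4    +3.0
8   +2.5   6.25    6    +3     9    +7.5
Total      17.5                28    22

$$r = \frac{\sum(dx \cdot dy)}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{22}{\sqrt{17.5 \times 28}} = \frac{22}{22.14} = 0.994$$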


$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.994\sqrt{6-2}}{\sqrt{1-(0.994)^2}} = \frac{1.988}{0.1094} \approx 18.2$$

The calculated value of t is higher than the tabulated t; therefore, there is a significant positive correlation between the dose of the drug and its toxic effect.

Another solution using Instat® software:


Zagazig University Faculty of Pharmacy Clinical Pharmacy Program

31 November 2011 Time allowed: 30 min Total marks: 5 Marks

Mathematics and Statistics (MS 101) Periodic Exam.

Answer the following: A new drug was tested for its antidiabetic effect by giving it to diabetic patients for one week. The fasting blood glucose level was determined before and after drug administration. The following are the results of blood glucose levels (mg/dl): Before treatment: 200, 250, 300 and 250. After treatment: 100, 150, 150 and 200.

A) What is the difference between the two treatments and what is its significance? (t = 2.45 at D.F. 6 and p < 0.05, t = 3.18 at D.F. 3 and p < 0.05) (3.5 Marks)

B) If you are using Instat® software, what will be your choice: paired or unpaired, equal or different SDs, one- or two-tailed test? (1.5 Marks)


Zagazig University Faculty of Pharmacy Clinical Pharmacy Program Level I

Time allowed: 2 hours Total marks: 90 Marks 3 January 2012

Mathematics and Statistics (MS 101) Final Exam

ALL QUESTIONS ARE TO BE ATTEMPTED:

Question I: Answer the following questions (30 marks):

1. Find the straight line (y = a + bx) that agrees with these data:

x: 0 2 4 6 8 10

y: 4 5 11 16 22 28

2. Find the coef. of the limit has x 6 in (3𝑥

3+

6𝑦

𝑥2)18

3. Find the solutions for these equations.

3x + 5y + z = - 4

2 x + 4 y +5 z =-9

x + 2 y +2 z = -3

Question II: Write short notes on each of the following (10 marks):

A) Type II errors

B) Quantitative Variables

C) Histogram

D) Mode

E) Pie sector chart.

Question III: (25 marks) A new substance was tested for its effect on blood glucose level. It was given to a group of diabetic animals for one week (treated group), and its effect on blood glucose level was compared with a group of untreated animals (control group). The following are the results of fasting blood glucose levels (mg/dl): Treated: 80, 90, 100, 90, 80, 100. Control: 130, 140, 150, 140, 130, 150.

A) Calculate the coefficient of variation for each group. (5 Marks)

B) Does the new substance have a significant antidiabetic effect? (t = 2.57 at D.F. 5 and p < 0.05, t = 2.23 at D.F. 10 and p < 0.05) (15 Marks)

C) If you are using Instat® software, what will be your choice: paired or unpaired, equal or different SDs, one- or two-tailed test? (5 Marks)

Question IV: (20 marks) Three pharmaceutical preparations were used for weight reduction in obese individuals. The decreases in body weight (in grams) in individuals receiving the preparations were as follows:

Preparation A: 10, 8, 9, 8, 10 and 11
Preparation B: 7, 8, 6, 5, 6 and 7
Preparation C: 4, 6, 5, 4, 5 and 4

A) Calculate the standard deviation for each group. B) Is there any significant difference between these three preparations? (15 marks)

Question V (5 marks): Instat® software was used for linear regression analysis of the dose of an antihypertensive drug (X) and the decrease in blood pressure (Y). How can you write the equation of the regression line from the following results?

Linear Regression
Number of points = 6

Parameter      Best-fit value   Standard error   95% confidence interval (from / to)
Slope          0.2545           0.02199          0.1938 / 0.3159
Y-intercept    -1.590           1.350            -5.337 / 2.156
X-intercept    6.241

Correlation coefficient (r) = 0.9854
r squared = 0.9721
Standard deviation of residuals from line (Sy.x) = 1.636
Test: Is the slope significantly different from zero? The P value is 0.0003, considered extremely significant.


Zagazig University Faculty of Pharmacy Clinical Pharmacy Program Level 1 Final Exam.

27 August 2012 (Summer) Time allowed: 2 hours Total marks: 90 marks Mathematics and Statistics (MS101)

Part A: Answer the following questions: 30 marks

1) Using the Gauss method, find the solution for these equations (12 marks):
3x + 5y + z + 4 = 0
2x + 4y + 5z - 9 = 0
x + 2y + 2z - 3 = 0

2) (18 marks) a. Find the linear equation (y = a + bx) that fits these data:

x: 0 1 2 3 4 5
y: 2 4 5 8 11 12

b. Find the limit free from x for 72

( 4 )xx

Part B: All questions are to be attempted (60 marks)

Question I: Write on the following (10 marks): 1- Measures of central tendency (mean, median, mode, midrange) 2- Errors 3- Chi-square (χ²) test 4- Graphic presentation of data

Question II (10 marks): Calculate the average deviation of the following distribution:

Class       xi     fi
[10, 15)    12.5   3
[15, 20)    17.5   5
[20, 25)    22.5   7
[25, 30)    27.5   4
[30, 35)    32.5   2

Question III: (20 marks) In a trial to adopt a new combination therapy for the treatment of hyperlipidemia accompanied by diabetes, Complamin and its combination with glibenclamide were given orally to hyperlipidemic diabetic rats. The serum levels of VLDL (mg/dl) were determined, and the results are presented in the following table:

Control                       53  51  48  44
Complamin alone               37  39  41  37
Complamin + glibenclamide     45  46  47  44

A) Calculate the standard deviation for each group

B) Does the average serum level of VLDL differ significantly from group to group?

Critical values of F for the 0.05 significance level

Question IV (20 marks)

The relationship between the toxic dose of a drug in mg/kg and the number of dead animals induced by its toxic effects was plotted diagrammatically, using the X-axis for the dose of the drug and the Y-axis for the number of dead animals. The following relationship was obtained:

Dose (mg/kg) (X)              3   4   5   6   7   8
Number of dead animals (Y)    0   1   2   4   5   6

A) Calculate the equation of the linear regression line.
B) Calculate the correlation coefficient and test the significance of the correlation. (t = 2.57 at P < 0.05 & D.F. = 5)