Chapter 4
Regression and correlation analysis
Regression
o The term regression analysis refers to the methods used to
estimate the values of one variable from knowledge of the values
of another variable.
Correlation analysis
o Correlation analysis refers to the methods used to measure the
strength of the association (correlation) between these variables.
Our study here will concentrate on the relationship between two
variables only.
4.1 Linear regression analysis:
o Regression analysis is classified into many types according to the
type of relationship between the two variables, such as linear,
exponential, logarithmic and power regression analysis. The next
paragraphs concentrate on linear regression analysis, which
is our objective.
Regression and correlation analysis
Dr Hisham E Abdellatef Page 52
o The term linear means that the equation of a straight line is used
to describe the relationship between the two variables, i.e. the
relationship between the two variables is represented by a straight
line, which is usually called the regression line.
Scatter diagram:
o In studying the relationship between two variables it is advisable
to plot the data on a graph as a first step. This allows visual
examination of the extent of association between the variables.
o The chart used for this purpose is known as a scatter diagram,
which is a graph on which each plotted point represents an
observed pair of values of the dependent (Y) and independent (X)
variables.
Mathematics and statistics for pharmacists
Dr Hisham E Abdellatef Page 53
Regression Equation:
o The relationship between the two variables X and Y is
represented by a straight line, the regression line. The
regression line is the best-fitting (ideal) straight line through
the points of the scatter diagram, and its equation describes a
possible correlation between the two variables.
o The equation of the linear regression line is: Y = aX + b
o Where Y is the dependent variable, X is the independent variable,
a is the slope of the linear regression line and b is the intercept of
the regression line with the Y-axis.
o The values of a and b in the equation of the straight line can be
calculated from:
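$$a = \frac{\sum X \cdot \sum Y - N \sum XY}{\left(\sum X\right)^2 - N \sum X^2}$$

$$b = \frac{\sum X \cdot \sum XY - \sum X^2 \cdot \sum Y}{\left(\sum X\right)^2 - N \sum X^2}$$

Here N is the number of observed (X, Y) pairs and each sum runs over all pairs.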
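The two formulas above can be sketched in Python as follows. This is only an illustration: the function name `fit_line` and the small test data set are assumptions made for the example, not part of the original text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line Y = aX + b using the sum formulas.

    a = (SX*SY - N*SXY) / (SX^2 - N*SX2)
    b = (SX*SXY - SX2*SY) / (SX^2 - N*SX2)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    denom = sum_x ** 2 - n * sum_x2
    if denom == 0:
        raise ValueError("all X values are equal; the slope is undefined")

    a = (sum_x * sum_y - n * sum_xy) / denom
    b = (sum_x * sum_xy - sum_x2 * sum_y) / denom
    return a, b


# Hypothetical check data: points lying exactly on Y = 2X + 1,
# so the fit should give back a = 2 and b = 1.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```

Feeding the function points that lie exactly on a known line is a quick way to confirm that the signs in both formulas are right.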
The relationship between the toxic dose of a drug in mg/kg and the
number of dead animals induced by its toxic effects was plotted
diagrammatically using the X-axis for the dose of the drug and the
Y-axis for the number of dead animals. The following relationship was
obtained:

Dose (mg/kg) (X):            3   4   5   6   7   8
Number of dead animals (Y):  0   1   2   4   5   6

Calculate the equation of the linear regression line.
Solution:

X     Y     X²    XY
3     0     9     0
4     1     16    4
5     2     25    10
6     4     36    24
7     5     49    35
8     6     64    48
ΣX = 33    ΣY = 18    ΣX² = 199    ΣXY = 121

With N = 6:

$$a = \frac{\sum X \cdot \sum Y - N \sum XY}{\left(\sum X\right)^2 - N \sum X^2} = \frac{33 \times 18 - 6 \times 121}{(33)^2 - 6 \times 199} = \frac{-132}{-105} = 1.257$$

$$b = \frac{\sum X \cdot \sum XY - \sum X^2 \cdot \sum Y}{\left(\sum X\right)^2 - N \sum X^2} = \frac{33 \times 121 - 199 \times 18}{(33)^2 - 6 \times 199} = \frac{411}{-105} = -3.914$$

The equation is Y = 1.257 X - 3.914
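As a cross-check on the arithmetic, the same line can be obtained from deviations about the means (slope = Σdx.dy / Σdx², intercept = mean of Y minus slope times mean of X), which is algebraically equivalent to the sum formulas. The short Python sketch below applies this to the dose and mortality data of the example; the variable names are illustrative assumptions.

```python
# Data from the worked example.
dose = [3, 4, 5, 6, 7, 8]   # X, dose in mg/kg
dead = [0, 1, 2, 4, 5, 6]   # Y, number of dead animals

n = len(dose)
mean_x = sum(dose) / n      # 5.5
mean_y = sum(dead) / n      # 3.0

# Sums of products and squares of the deviations from the means.
sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dose, dead))
sum_dx2 = sum((x - mean_x) ** 2 for x in dose)

slope = sum_dxdy / sum_dx2              # 22 / 17.5
intercept = mean_y - slope * mean_x

print(f"Y = {slope:.3f} X {intercept:+.3f}")  # Y = 1.257 X -3.914
```

The deviation form reuses Σdx.dy and Σdx², the same quantities needed for the correlation coefficient, so one table of deviations serves both calculations.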
4.2 The correlation coefficient:
o When two series of observations are made, it is often found that
the observations in one series (dependent series) vary
correspondingly with those in the other (independent series). The
correlation between the two series of observations can be tested
by determination of the correlation coefficient (r).
o The correlation coefficient is measured on a scale that varies from
+1 through zero to -1. A value of +1 or -1 expresses complete
correlation between the variables. When one variable increases with
the increase of the other, the correlation is positive; when one
variable decreases with the increase of the other, the correlation
is negative. Complete absence of correlation is represented by
r = 0.
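The correlation coefficient equation:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \cdot \sum (Y - \bar{Y})^2}} = \frac{\sum (d_x \cdot d_y)}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}}$$

where dx = X - X̄ and dy = Y - Ȳ are the deviations of each observation from its mean.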
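A minimal Python sketch of this formula is given below. The function name `pearson_r` and the three small data sets are assumptions chosen to illustrate the two ends and the middle of the scale (+1, -1 and 0).

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r from deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]

    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    sum_dx2 = sum(p * p for p in dx)
    sum_dy2 = sum(q * q for q in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when one variable does not vary")

    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


# Hypothetical data sets illustrating the scale of r.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # Y rises with X:  r = +1
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])    # Y falls with X:  r = -1
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])  # no linear trend: r = 0
print(r_up, r_down, r_none)
```

Note that r = 0 only rules out a linear association; the third data set has a clear curved pattern even though its correlation coefficient is zero.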
The significance of the correlation between two variables can be
tested by comparing the tabulated value of t with the value of t
calculated according to the following formula, where n is the number
of pairs of observations:

$$t = r \sqrt{\frac{n - 2}{1 - r^2}}$$
For the previous example given under the regression line, calculate
the correlation coefficient and test the significance of the
correlation (t = 2.57 at P < 0.05 and D.F. = 5).
Solution:
X     dx      dx²     Y     dy    dy²    dx.dy
3     -2.5    6.25    0     -3    9      +7.5
4     -1.5    2.25    1     -2    4      +3.0
5     -0.5    0.25    2     -1    1      +0.5
6     +0.5    0.25    4     +1    1      +0.5
7     +1.5    2.25    5     +2    4      +3.0
8     +2.5    6.25    6     +3    9      +7.5
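From the table, Σdx.dy = 22, Σdx² = 17.5 and Σdy² = 28, so:

$$r = \frac{\sum (d_x \cdot d_y)}{\sqrt{\sum d_x^2 \cdot \sum d_y^2}} = \frac{22}{\sqrt{17.5 \times 28}} = 0.994$$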
$$t = r \sqrt{\frac{n - 2}{1 - r^2}} = 0.994 \sqrt{\frac{6 - 2}{1 - (0.994)^2}} \approx 18.2$$
The calculated value of t is higher than the tabulated value of t
(2.57); therefore there is a significant positive correlation
between the dose of the drug and its toxic effect.
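The whole calculation for this example can be reproduced with a few lines of Python. This is a sketch under the assumptions of the example: the tabulated value 2.57 is simply the one quoted in the question, and the variable names are illustrative.

```python
import math

# Data from the worked example.
dose = [3, 4, 5, 6, 7, 8]   # X, dose in mg/kg
dead = [0, 1, 2, 4, 5, 6]   # Y, number of dead animals

n = len(dose)
mean_x = sum(dose) / n
mean_y = sum(dead) / n

sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dose, dead))
sum_dx2 = sum((x - mean_x) ** 2 for x in dose)
sum_dy2 = sum((y - mean_y) ** 2 for y in dead)

# Correlation coefficient and its t statistic.
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)
t = r * math.sqrt((n - 2) / (1 - r ** 2))

t_tabulated = 2.57          # value quoted in the example
significant = t > t_tabulated

print(f"r = {r:.3f}, t = {t:.2f}")  # r = 0.994, t = 17.96
print("significant" if significant else "not significant")
```

Carrying r at full precision gives t of about 17.96. Rounding r to 0.994 before substituting, as in the hand calculation, pushes the value slightly above 18 because 1 - r² is very sensitive to r when r is close to 1. The conclusion is the same in both cases.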
Another solution using Instat® software:
Zagazig University Faculty of Pharmacy Clinical Pharmacy Program
31 November 2011 Time allowed: 30 min Total marks: 5 Marks
Mathematics and Statistics (MS 101) Periodic Exam
Answer the following:
A new drug was tested for its antidiabetic effect by giving it to diabetic patients for one week. The fasting blood glucose level was determined before and after drug administration. The following are the results of blood glucose levels (mg/dl):
Before treatment: 200, 250, 300 and 250
After treatment: 100, 150, 150 and 200
A) What is the difference between the two treatments and what is its significance? (t = 2.45 at D.F. 6 and p < 0.05, t = 3.18 at D.F. 3 and p < 0.05) (3.5 Marks)
B) If you are using Instat® software, what will be your choice: paired or unpaired, equal or different SDs, one- or two-tailed test? (1.5 Marks)
Zagazig University Faculty of Pharmacy Clinical Pharmacy Program Level I
Time allowed: 2 hours Total marks: 90 Marks 3 January 2012
Mathematics and Statistics (MS 101) Final Exam
ALL QUESTIONS ARE TO BE ATTEMPTED:
Question I: Answer the following questions (30 marks):
1. Find the straight line (y = a + bx) that fits these data:
x: 0 2 4 6 8 10
y: 4 5 11 16 22 28
2. Find the coefficient of the term containing $x^6$ in the expansion of $\left(\frac{3x}{3} + \frac{6y}{x^2}\right)^{18}$
3. Find the solutions for these equations.
3x + 5y + z = - 4
2 x + 4 y +5 z =-9
x + 2 y +2 z = -3
Question II: Write short notes on each of the following (10 marks):
A) Type II errors
B) Quantitative Variables
C) Histogram
D) Mode
E) Pie sector chart.
Question III: (25 marks)
A new substance was tested for its effect on blood glucose level. It was given to a group of diabetic animals for one week (treated group) and its effect on blood glucose level was compared with a group of untreated animals (control group). The following are the results of fasting blood glucose levels (mg/dl):
Treated: 80, 90, 100, 90, 80, 100
Control: 130, 140, 150, 140, 130, 150
A) Calculate the coefficient of variation for each group (5 Marks)
B) Does the new substance have a significant antidiabetic effect? (t = 2.57 at D.F. 5 and p < 0.05, t = 2.23 at D.F. 10 and p < 0.05) (15 Marks)
C) If you are using Instat® software, what will be your choice: paired or unpaired, equal or different SDs, one- or two-tailed test? (5 Marks)
Question IV: (20 marks)
Three pharmaceutical preparations were used for weight reduction in obese individuals. The decreases in body weight in grams in individuals receiving the preparations were as follows:
Preparation A: 10, 8, 9, 8, 10 and 11
Preparation B: 7, 8, 6, 5, 6 and 7
Preparation C: 4, 6, 5, 4, 5 and 4
A) Calculate the standard deviation for each group.
B) Is there any significant difference between these three preparations? (15 marks)
Question V: (5 marks)
Instat® software was used for linear regression analysis of the dose of an antihypertensive drug (X) and the decrease in blood pressure (Y). How can you write the equation of the regression line from the following results?
Linear Regression
Number of points = 6
Parameter      Best-fit value   Standard error   95% confidence interval (from, to)
Slope          0.2545           0.02199          0.1938    0.3159
Y-intercept    -1.590           1.350            -5.337    2.156
X-intercept    6.241
Correlation coefficient (r) = 0.9854
r squared = 0.9721
Standard deviation of residuals from line (Sy.x) = 1.636
Test: Is the slope significantly different from zero?
The P value is 0.0003, considered extremely significant.
Zagazig University Faculty of Pharmacy Clinical Pharmacy Program Level 1 Final Exam.
27 August 2012 (Summer) Time allowed: 2 hours Total marks: 90 marks Mathematics and Statistics (MS 101)
Part A: Answer the following questions: 30 marks
1) Using the Gauss method, find the solution of these equations: (12 marks)
3x + 5y + z + 4 = 0
2x + 4y + 5z - 9 = 0
x + 2y + 2z - 3 = 0
2) (18 marks)
a. Find the linear equation (y = a + bx) that fits these data:
x: 0 1 2 3 4 5
y: 2 4 5 8 11 12
b. Find the limit free from x for 72
( 4 )xx
Part B: All questions are to be attempted (60 marks)
Question I: Write on the following (10 marks)
1- Measures of central tendency (mean, median, mode, midrange)
2- Errors
3- Chi-square (χ²) test
4- Graphic presentation of data
Question II: (10 marks)
Calculate the average deviation of the following distribution:
Class       xi      fi
[10, 15)    12.5    3
[15, 20)    17.5    5
[20, 25)    22.5    7
[25, 30)    27.5    4
[30, 35)    32.5    2
Question III: (20 marks)
In a trial to adapt a new combination therapy for treatment of hyperlipidemia accompanied with diabetes, complamin and its combination with glibenclamide were given orally to hyperlipidemic diabetic rats. The serum levels of VLDL (mg/dl) were determined and the results are presented in the following table:
Control:                     53  51  48  44
Complamin alone:             37  39  41  37
Complamin + glibenclamide:   45  46  47  44
A) Calculate the standard deviation for each group
B) Do the average serum levels of VLDL differ significantly from group to group?
Critical values of F for the 0.05 significance level
Question IV (20 marks)
The relationship between the toxic dose of a drug in mg/kg and the number of dead animals induced by its toxic effects was plotted diagrammatically using the X-axis for the dose of the drug and the Y-axis for the number of dead animals. The following relationship was obtained:
Dose (mg/kg) (X):            3   4   5   6   7   8
Number of dead animals (Y):  0   1   2   4   5   6
A) Calculate the equation of the linear regression line.
B) Calculate the correlation coefficient and test the significance of the correlation (t = 2.57 at P < 0.05 and D.F. = 5).