module 8
TRANSCRIPT
Unit 8: REGRESSION ANALYSIS
INTRODUCTION
So far we have studied correlation analysis, which measures the direction and strength of the relationship between two variables. Once a correlation between two variables has been established, one may want to estimate the value of one variable from the value of the other. The statistical method by which we estimate or predict the unknown value of one variable from the known value of another variable is called regression.
Regression follows correlation: once the relationship between the two variables is established, regression analysis proceeds to the estimation of probable values.
Sir Francis Galton, a British biometrician, introduced the concept of regression in 1877 while studying the correlation between the heights of sons and their fathers. He concluded: "Tall fathers tend to have tall sons and short fathers short sons. The average height of the sons of a group of tall fathers is less than that of the fathers, while the average height of the sons of a group of short fathers is greater than that of the fathers."
In other words, the children of tall or short parents tend to step back towards the average height of the population. Nowadays statisticians prefer to use the term regression in the sense of estimation, and it is an important statistical tool in economics and business.
Meaning
Regression means returning or stepping back to the average value. In statistics, the term regression simply means the average relationship between variables. With a regression technique we can predict or estimate the value of the dependent variable from given values of the related independent variable.
Regression analysis studies the nature of the relationship in order to estimate the most probable values. It establishes a functional relationship between the independent and dependent variables.
Definition
According to Blair, "Regression is the measure of the average relationship between two or more variables in terms of the original units of the data."
According to Taro Yamane, "One of the most frequently used techniques in economics and business research, to find a relation between two or more variables that are related causally, is regression analysis."
According to Wallis and Roberts, "It is often more important to find out what the relation actually is, in order to estimate or predict one variable, and the statistical technique appropriate in such a case is called regression analysis."
USES OF REGRESSION ANALYSIS
Regression analysis is of great practical use, even more than correlation analysis. Some of its uses are the following:
1. Regression analysis helps in establishing a functional relationship between two or more variables. Once this is established, it can be used for various advanced analytical purposes.
2. With electronic machines and computers, the tedium of calculating regression equations, particularly those expressing multiple and non-linear relationships, has been greatly reduced.
3. Since most problems of economic analysis are based on cause-and-effect relationships, regression analysis is a highly valuable tool in economic and business research.
4. Regression analysis is very useful for prediction. Once a functional relationship is known, the value of the dependent variable can be predicted from a given value of the independent variable.
CORRELATION AND REGRESSION
Both techniques are directed towards the common purpose of establishing the degree and direction of the relationship between two or more variables, but the methods of doing so differ, and the choice of one or the other depends on the purpose. Despite certain similarities, there are some basic differences between the two approaches, summarized below:

CORRELATION
1. Correlation literally means related or sympathetic movement between variables.
2. There is a sort of interdependence, which is mutual.
3. It does not establish a cause-and-effect relationship. It only shows the existence of some association in the movement of the variables.
4. The correlation may be spurious if the sympathetic movement is due to the influence of an outside variable that has no relevance.
5. It is a relative measure showing association between variables.
6. It is used only for testing and verifying the relationship. It gives only limited information.
7. It is not very useful for further mathematical treatment.

REGRESSION
1. Regression literally means return to the normal, that is, to the average relationship.
2. It establishes a functional, mathematical relationship showing the dependence of one variable on the other.
3. It may have a cause-and-effect relationship.
4. It is a mathematical relationship, which should be interpreted suitably.
5. It is an absolute measure of relationship.
6. Besides verification, it can also be used for estimation and prediction. It gives more comprehensive information.
7. It is very useful for further mathematical treatment.
METHODS OF REGRESSION ANALYSIS
There are two methods:
1. Graphic method (not included in the syllabus)
2. Algebraic method
The algebraic method for simple linear regression can be broadly divided into the following:
A. Regression lines
B. Regression equations
C. Regression coefficients
A. REGRESSION LINES
In graphical terms, a regression line is a straight line fitted to the data by the method of least squares. It gives the most probable mean value of one variable corresponding to a given value of the other. Since a regression line is the line of best fit for estimating one particular variable, it cannot be used conversely; therefore there are always two regression lines for the relationship between two variables x and y. One shows the regression of x upon y and the other the regression of y upon x.
The regression line of x on y gives the most probable values of x for any given value of y. In the same manner, the regression line of y on x gives the most probable values of y for any given value of x.
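The point that a regression line cannot be used conversely can be checked numerically. The following is a minimal sketch in Python, assuming a small made-up data set (the values in `xs` and `ys` and the helper name `slopes` are hypothetical, not from this unit). It fits both least-squares lines and shows that the slope of x on y is not simply the reciprocal of the slope of y on x.

```python
def slopes(xs, ys):
    """Return (byx, bxy): least-squares slopes of y on x and of x on y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the actual means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


xs = [1, 2, 3, 4, 5]  # hypothetical data, for illustration only
ys = [2, 1, 4, 3, 5]
byx, bxy = slopes(xs, ys)
print("slope of y on x:", byx)
print("slope of x on y:", bxy)
print("reciprocal of slope of y on x:", 1 / byx)
```

Unless the correlation is perfect, the last two printed values differ, which is why a separate line is needed for each direction of estimation.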
B. REGRESSION EQUATIONS
A regression equation is the algebraic expression of a regression line. The algebraic method is studied under regression equations and regression coefficients. As there are two regression lines, there are two regression equations for the two variables x and y: the regression equation of x on y and the regression equation of y on x.
I. Regression equation of X on Y
$X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$
II. Regression equation of Y on X
$Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$
Here $\bar{X}$ and $\bar{Y}$ are the means, $\sigma_x$ and $\sigma_y$ the standard deviations of the two series, and $r$ the coefficient of correlation.
Application of Regression Equations when all required values are given

ILLUSTRATION 01
From the following results, obtain the two regression equations and estimate the yield of crops when the rainfall is 29 cm, and the rainfall when the yield is 600 kg.

          Y (yield in kg)   X (rainfall in cm)
Mean      508.4             26.7
S.D.      36.8              4.6

Coefficient of correlation between yield and rainfall = 0.52

SOLUTION
To estimate the yield of crops, we use the regression equation of Y on X.
$Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$
$Y - 508.4 = 0.52 \times \frac{36.8}{4.6} (X - 26.7)$
$Y - 508.4 = 4.16 (X - 26.7)$
$Y - 508.4 = 4.16X - 111.072$
$Y = 4.16X + 397.328$ (regression line)
When X = 29:
$Y = 4.16 \times 29 + 397.328 = 120.64 + 397.328 = 517.968$ kg

Similarly, to estimate the rainfall, we use the regression equation of X on Y.
$X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$
$X - 26.7 = 0.52 \times \frac{4.6}{36.8} (Y - 508.4)$
$X - 26.7 = 0.065 (Y - 508.4)$
$X - 26.7 = 0.065Y - 33.046$
$X = 0.065Y - 6.346$ (regression line)
When Y = 600 kg:
$X = 0.065 \times 600 - 6.346 = 39 - 6.346 = 32.654$ cm
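The arithmetic of Illustration 01 can be sketched in a few lines of Python. This is only an illustrative check, assuming the summary values given in the illustration; the function name `regression_lines` and its return layout are hypothetical choices made for this sketch.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((byx, a_yx), (bxy, a_xy)) so that
    Y = byx * X + a_yx  and  X = bxy * Y + a_xy."""
    byx = r * sd_y / sd_x          # regression coefficient of Y on X
    bxy = r * sd_x / sd_y          # regression coefficient of X on Y
    a_yx = mean_y - byx * mean_x   # intercept of the Y on X line
    a_xy = mean_x - bxy * mean_y   # intercept of the X on Y line
    return (byx, a_yx), (bxy, a_xy)


# Illustration 01: X = rainfall (cm), Y = yield (kg)
(byx, a_yx), (bxy, a_xy) = regression_lines(26.7, 508.4, 4.6, 36.8, 0.52)
yield_at_29 = byx * 29 + a_yx     # estimated yield when rainfall is 29 cm
rain_at_600 = bxy * 600 + a_xy    # estimated rainfall when yield is 600 kg
print(round(yield_at_29, 3), round(rain_at_600, 3))
```

Up to rounding, the two printed estimates should agree with the 517.968 kg and 32.654 cm obtained by hand.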
ILLUSTRATION 02
Find out the regression equation showing the regression of capacity utilization on production from the following data.

                                     Average   Standard deviation
Production (in lakh units)           35.6      10.5
Capacity utilization (in percent)    84.8      8.5

Coefficient of correlation = 0.62
Estimate the production when the capacity utilization is 70%.

SOLUTION
Let production and capacity utilization be denoted by X and Y respectively. Then we are given:
$\bar{X} = 35.6$, $\bar{Y} = 84.8$, $\sigma_x = 10.5$, $\sigma_y = 8.5$, $r = 0.62$
To estimate production, we use the regression equation of X on Y.
$X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$
$X - 35.6 = 0.62 \times \frac{10.5}{8.5} (Y - 84.8)$
$X - 35.6 = 0.7658 (Y - 84.8)$
$X - 35.6 = 0.7658Y - 64.94$
$X = 0.7658Y - 29.34$ (regression line)
When Y = 70%:
$X = 0.7658 \times 70 - 29.34 = 53.606 - 29.34 = 24.266$ lakh units

ILLUSTRATION 03
Karl Pearson's coefficient of correlation between the ages of brothers and sisters in a community was found to be 0.8. The average age of the brothers was 25 years and that of the sisters was 22 years. Their standard deviations were 4 and 5 respectively. Find:
a. The expected age of a brother when the sister's age is 12 years.
b. The expected age of a sister when the brother's age is 33 years.

SOLUTION
                               Brother (X)   Sister (Y)
Mean age                       25 years      22 years
Standard deviation             4             5
Coefficient of correlation     0.8

To estimate the brother's age, we use the regression equation of X on Y. X = ? when Y = 12.
$X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$
$X - 25 = 0.8 \times \frac{4}{5} (Y - 22)$
$X - 25 = 0.64 (Y - 22)$
$X - 25 = 0.64Y - 14.08$
$X = 0.64Y + 10.92$ (regression line)
When Y = 12:
$X = 0.64 \times 12 + 10.92 = 18.6$ years, the brother's age

To estimate the sister's age, we use the regression equation of Y on X. Y = ? when X = 33 years.
$Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$
$Y - 22 = 0.8 \times \frac{5}{4} (X - 25)$
$Y - 22 = 1.0 (X - 25)$
$Y = X - 25 + 22$
$Y = X - 3$ (regression line)
When X = 33:
$Y = 33 - 3 = 30$ years, the sister's age

ILLUSTRATION 04
Given the following data, estimate:
1. The value of Y when X = 70
2. The value of X when Y = 90

                               X-series   Y-series
Mean                           18         100
Standard deviation             14         20
Coefficient of correlation     0.8

SOLUTION
I. Y = ? when X = 70. Use the regression equation of Y on X.
$Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$
$Y - 100 = 0.8 \times \frac{20}{14} (X - 18)$
$Y - 100 = 1.143 (X - 18)$
$Y - 100 = 1.143X - 20.574$
$Y = 1.143X + 79.426$ (regression line)
When X = 70:
$Y = 1.143 \times 70 + 79.426 = 80.01 + 79.426 = 159.436$

II. X = ? when Y = 90. Use the regression equation of X on Y.
$X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$
$X - 18 = 0.8 \times \frac{14}{20} (Y - 100)$
$X - 18 = 0.56 (Y - 100)$
$X - 18 = 0.56Y - 56$
$X = 0.56Y - 38$ (regression line)
When Y = 90:
$X = 0.56 \times 90 - 38 = 50.4 - 38 = 12.4$
ILLUSTRATION 05
To study the relationship between expenditure on accommodation (X) and expenditure on food (Y), an enquiry into 50 families gave the following results:
$\sum X = 8500$, $\sum Y = 9600$, $\sigma_x = 60$, $\sigma_y = 20$, $r = 0.60$
Estimate the expenditure on food when expenditure on accommodation is Rs. 200.

SOLUTION
$\bar{X} = \frac{\sum X}{n} = \frac{8500}{50} = 170$, $\bar{Y} = \frac{\sum Y}{n} = \frac{9600}{50} = 192$
To estimate expenditure on food, we use the regression equation of Y on X.
$Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$
$Y - 192 = 0.6 \times \frac{20}{60} (X - 170)$
$Y - 192 = 0.2 (X - 170)$
$Y - 192 = 0.2X - 34$
$Y = 0.2X + 158$ (regression line)
When X = 200:
$Y = 0.2 \times 200 + 158 = 40 + 158 = 198$
Rs. 198 is required to be spent on food.

ILLUSTRATION 06
Obtain the two regression equations from the following:

                               X-series   Y-series
Mean                           20         25
Variance                       4          9
Coefficient of correlation     0.75

SOLUTION
X on Y regression equation
$\sigma_x = \sqrt{\text{variance}} = \sqrt{4} = 2$, $\sigma_y = \sqrt{9} = 3$
$b_{xy}$ = regression coefficient of X on Y $= r \frac{\sigma_x}{\sigma_y}$
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 20 = 0.75 \times \frac{2}{3} (Y - 25)$
$X - 20 = 0.5 (Y - 25)$
$X - 20 = 0.5Y - 12.5$
$X = 0.5Y + 7.5$ (regression line)

Y on X regression equation
$b_{yx}$ = regression coefficient of Y on X $= r \frac{\sigma_y}{\sigma_x}$
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 25 = 0.75 \times \frac{3}{2} (X - 20)$
$Y - 25 = 1.125 (X - 20)$
$Y - 25 = 1.125X - 22.5$
$Y = 1.125X + 2.5$ (regression line)

ILLUSTRATION 07
You are given the following data.

                               X-series   Y-series
Mean                           47         96
Variance                       64         81
Coefficient of correlation     0.36

Calculate Y when X is 50, and X when Y is 88.

SOLUTION
X on Y regression equation
$\sigma_x = \sqrt{64} = 8$, $\sigma_y = \sqrt{81} = 9$
$X - \bar{X} = b_{xy} (Y - \bar{Y})$, where $b_{xy} = r \frac{\sigma_x}{\sigma_y}$
$X - 47 = 0.36 \times \frac{8}{9} (Y - 96)$
$X - 47 = 0.32 (Y - 96)$
$X - 47 = 0.32Y - 30.72$
$X = 0.32Y + 16.28$ (regression line)
When Y = 88:
$X = 0.32 \times 88 + 16.28 = 28.16 + 16.28 = 44.44$

Y on X regression equation
$Y - \bar{Y} = b_{yx} (X - \bar{X})$, where $b_{yx} = r \frac{\sigma_y}{\sigma_x}$
$Y - 96 = 0.36 \times \frac{9}{8} (X - 47)$
$Y - 96 = 0.405 (X - 47)$
$Y - 96 = 0.405X - 19.035$
$Y = 0.405X + 76.965$ (regression line)
When X = 50:
$Y = 0.405 \times 50 + 76.965 = 20.25 + 76.965 = 97.215$

ILLUSTRATION 08
The following results for the heights and weights of 100 men were calculated.

           Mean       Standard deviation
Weights    150 lbs    20 lbs
Heights    68"        2.5"

Coefficient of correlation = 0.60
Find an estimate of:
1. The weight of a man whose height is 5' (5' = 60")
2. The height of a man whose weight is 200 lbs

SOLUTION
Let X = weight and Y = height.
X on Y regression equation
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 150 = 0.6 \times \frac{20}{2.5} (Y - 68)$
$X - 150 = 4.8 (Y - 68)$
$X - 150 = 4.8Y - 326.4$
$X = 4.8Y - 176.4$ (regression line)
When Y = 60":
$X = 4.8 \times 60 - 176.4 = 288 - 176.4 = 111.6$ lbs

Y on X regression equation
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 68 = 0.6 \times \frac{2.5}{20} (X - 150)$
$Y - 68 = 0.075 (X - 150)$
$Y - 68 = 0.075X - 11.25$
$Y = 0.075X + 56.75$ (regression line)
When X = 200 lbs:
$Y = 0.075 \times 200 + 56.75 = 15 + 56.75 = 71.75$ inches

C. REGRESSION COEFFICIENTS
The regression coefficient is denoted by 'b'. There are two regression equations and therefore two regression coefficients. A regression coefficient measures the change in one series corresponding to a unit change in the other series.

The regression coefficient of X on Y,
$b_{xy} = r \frac{\sigma_x}{\sigma_y}$
gives the amount by which the X-variable changes for a unit change in the Y-variable. When deviations dx and dy are taken from assumed means, it is calculated as
$b_{xy} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dy^2 - (\sum dy)^2}$

Similarly, the regression coefficient of Y on X,
$b_{yx} = r \frac{\sigma_y}{\sigma_x}$
gives the amount by which the Y-variable changes for a unit change in the X-variable:
$b_{yx} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dx^2 - (\sum dx)^2}$

These two coefficients measure the change in the dependent variable corresponding to a unit change in the independent variable. They also help in the direct calculation of the coefficient of correlation. The square root of the product of the two regression coefficients gives the coefficient of correlation:
$r = \sqrt{b_{xy} \times b_{yx}}$
because
$b_{xy} \times b_{yx} = r \frac{\sigma_x}{\sigma_y} \times r \frac{\sigma_y}{\sigma_x} = r^2$
The sign of r is the same as the common sign of the two regression coefficients.
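The relation between r and the two regression coefficients can be sketched in Python as follows. This is a minimal sketch; the function name `correlation_from_coefficients` is hypothetical, and the rule that r takes the common sign of the two coefficients is applied explicitly because the square root alone would always be non-negative.

```python
import math


def correlation_from_coefficients(bxy, byx):
    """Return r = sqrt(bxy * byx), carrying the common sign of the coefficients."""
    if bxy * byx < 0:
        # The two coefficients share the sign of r, so opposite signs are invalid input
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(bxy * byx)
    return -r if bxy < 0 else r


# Coefficients from Illustration 06 (bxy = 0.5, byx = 1.125), where r was given as 0.75
r_positive = correlation_from_coefficients(0.5, 1.125)
# Two negative coefficients give a negative r
r_negative = correlation_from_coefficients(-0.2, -1.5)
print(r_positive, round(r_negative, 4))
```

The first call recovers the correlation of 0.75 that Illustration 06 started from, which is a quick consistency check on hand-computed coefficients.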
CALCULATION OF REGRESSION COEFFICIENTS AND ESTIMATION OF UNKNOWN VALUES
INDIVIDUAL SERIES

ILLUSTRATION 09
From the data given below find out:
a. The regression coefficients
b. The regression equations
c. An estimate of the age when B.P. is 130
d. An estimate of the B.P. when age is 50 years
e. The coefficient of correlation through the regression coefficients

Age    56   42   72   36   63   47   55   49   38   42   68   60
B.P.   147  125  160  118  149  128  150  145  115  140  152  155

SOLUTION
When actual data are given and deviations are taken from assumed means (47 for age, 128 for B.P.):

Age X   dx = X-47   dx^2   B.P. Y   dy = Y-128   dy^2   dxdy
56      9           81     147      19           361    171
42      -5          25     125      -3           9      15
72      25          625    160      32           1024   800
36      -11         121    118      -10          100    110
63      16          256    149      21           441    336
47      0           0      128      0            0      0
55      8           64     150      22           484    176
49      2           4      145      17           289    34
38      -9          81     115      -13          169    117
42      -5          25     140      12           144    -60
68      21          441    152      24           576    504
60      13          169    155      27           729    351
N=12    64          1892            148          4326   2554

$\bar{X} = A + \frac{\sum dx}{N} \times C = 47 + \frac{64}{12} \times 1 = 52.33$
$\bar{Y} = A + \frac{\sum dy}{N} \times C = 128 + \frac{148}{12} \times 1 = 128 + 12.33 = 140.33$

Regression coefficient of X on Y
$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dy^2 - (\sum dy)^2}$
$= \frac{2554 \times 12 - 64 \times 148}{4326 \times 12 - (148)^2} = \frac{30648 - 9472}{51912 - 21904} = \frac{21176}{30008} = 0.7057$
X on Y regression equation:
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 52.33 = 0.7057 (Y - 140.33)$

Regression coefficient of Y on X
$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dx^2 - (\sum dx)^2}$
$= \frac{2554 \times 12 - 64 \times 148}{1892 \times 12 - (64)^2} = \frac{21176}{22704 - 4096} = \frac{21176}{18608} = 1.138$
Y on X regression equation:
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 140.33 = 1.138 (X - 52.33)$

Coefficient of correlation $= \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.7057 \times 1.138} = 0.896$
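The assumed-mean calculation of Illustration 09 can be sketched in Python as below. This is an illustrative check rather than a prescribed method; the function name `regression_coefficients` and its parameter names are hypothetical, while the data and assumed means (47 and 128) are those of the illustration.

```python
def regression_coefficients(xs, ys, assumed_x, assumed_y):
    """Return (bxy, byx) using deviations taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    # Common numerator of both coefficients
    numerator = n * sum_dxdy - sum_dx * sum_dy
    bxy = numerator / (n * sum_dy2 - sum_dy ** 2)  # X on Y
    byx = numerator / (n * sum_dx2 - sum_dx ** 2)  # Y on X
    return bxy, byx


age = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
bp = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]
bxy, byx = regression_coefficients(age, bp, assumed_x=47, assumed_y=128)
print(round(bxy, 4), round(byx, 3))
```

Because the formula corrects for the choice of origin, calling the function with any other assumed means (for example 0 and 0) should return the same two coefficients.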
ILLUSTRATION 09 (continued): regression equations and estimates
X on Y:
$X - 52.33 = 0.7057Y - 99.031$
$X = 0.7057Y - 46.701$
Estimation of age (X) when B.P. (Y) is 130:
$X = 0.7057 \times 130 - 46.701 = 91.741 - 46.701 = 45.04$ years
Y on X:
$Y - 140.33 = 1.138X - 59.55$
$Y = 1.138X + 80.78$
Estimation of B.P. (Y) when age (X) is 50 years:
$Y = 1.138 \times 50 + 80.78 = 56.9 + 80.78 = 137.68$

ILLUSTRATION 10
From the following data, obtain the two regression equations. Also calculate the coefficient of correlation based on the regression coefficients.

Sales X       91   97   108   121   67   124   51   73   111   57
Purchases Y   71   75   69    97    70   91    39   61   80    47

SOLUTION
X      dx = X-67   dx^2    Y     dy = Y-70   dy^2   dxdy
91     24          576     71    1           1      24
97     30          900     75    5           25     150
108    41          1681    69    -1          1      -41
121    54          2916    97    27          729    1458
67     0           0       70    0           0      0
124    57          3249    91    21          441    1197
51     -16         256     39    -31         961    496
73     6           36      61    -9          81     -54
111    44          1936    80    10          100    440
57     -10         100     47    -23         529    230
N=10   230         11650         0           2868   3900

$\bar{X} = A + \frac{\sum dx}{N} \times C = 67 + \frac{230}{10} \times 1 = 90$
$\bar{Y} = A + \frac{\sum dy}{N} \times C = 70 + \frac{0}{10} \times 1 = 70$

Regression coefficient of X on Y
$b_{xy} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dy^2 - (\sum dy)^2} = \frac{3900 \times 10 - 230 \times 0}{2868 \times 10 - (0)^2} = \frac{39000}{28680} = 1.3598$
Regression coefficient of Y on X
$b_{yx} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dx^2 - (\sum dx)^2} = \frac{3900 \times 10 - 230 \times 0}{11650 \times 10 - (230)^2} = \frac{39000}{116500 - 52900} = \frac{39000}{63600} = 0.6132$

Regression equation of X on Y
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 90 = 1.3598 (Y - 70)$
$X - 90 = 1.3598Y - 95.186$
$X = 1.3598Y - 5.186$ (regression line)
Regression equation of Y on X
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 70 = 0.6132 (X - 90)$
$Y - 70 = 0.6132X - 55.188$
$Y = 0.6132X + 14.812$ (regression line)

Coefficient of correlation $= \sqrt{b_{xy} \times b_{yx}} = \sqrt{1.3598 \times 0.6132} = 0.9131$

ILLUSTRATION 11
The following data relate to the ages of husbands and wives. Obtain the two regression equations and estimate the most likely age of the husband when the age of the wife is 25 years.

Ages of husbands   25   28   30   32   35   36   38   39   42   55
Ages of wives      20   26   29   30   25   18   26   35   35   46

SOLUTION
X      dx = X-36   dx^2   Y     dy = Y-29   dy^2   dxdy
25     -11         121    20    -9          81     99
28     -8          64     26    -3          9      24
30     -6          36     29    0           0      0
32     -4          16     30    1           1      -4
35     -1          1      25    -4          16     4
36     0           0      18    -11         121    0
38     2           4      26    -3          9      -6
39     3           9      35    6           36     18
42     6           36     35    6           36     36
55     19          361    46    17          289    323
N=10   0           648          0           598    494

$\bar{X} = A + \frac{\sum dx}{N} \times C = 36 + \frac{0}{10} \times 1 = 36$
$\bar{Y} = A + \frac{\sum dy}{N} \times C = 29 + \frac{0}{10} \times 1 = 29$

Regression coefficient of X on Y
$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dy^2 - (\sum dy)^2} = \frac{494 \times 10 - 0 \times 0}{598 \times 10 - (0)^2} = \frac{4940}{5980} = 0.8261$
Regression coefficient of Y on X
$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dx^2 - (\sum dx)^2} = \frac{494 \times 10 - 0 \times 0}{648 \times 10 - (0)^2} = \frac{4940}{6480} = 0.7623$

Regression equation of X on Y
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 36 = 0.8261 (Y - 29)$
$X - 36 = 0.8261Y - 23.9569$
$X = 0.8261Y + 12.0431$ (regression line)
If the wife's age (Y) is 25:
$X = 0.8261 \times 25 + 12.0431 = 20.6525 + 12.0431 = 32.6956$
The husband's age is 32.6956 years.

Regression equation of Y on X
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 29 = 0.7623 (X - 36)$
$Y - 29 = 0.7623X - 27.4428$
$Y = 0.7623X + 1.5572$ (regression line)

Coefficient of correlation
$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.8261 \times 0.7623} = 0.7936$

ILLUSTRATION 12
A panel of two judges, P and Q, graded dramatic performances by independently awarding marks as follows.

Performance    1    2    3    4    5    6    7
Marks by P     46   42   44   40   43   41   45
Marks by Q     40   38   36   35   39   37   41

The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If judge Q had also been present, how many marks could he be expected to have awarded to the eighth performance?

SOLUTION
Let the marks awarded by judge P be represented by X and those awarded by judge Q by Y. We have to find the value of Y when X = 37. This can be done by finding the regression equation of Y on X.

Computation of the regression equation of Y on X
X      dx = X-43   dx^2   Y     dy = Y-38   dy^2   dxdy
46     3           9      40    2           4      6
42     -1          1      38    0           0      0
44     1           1      36    -2          4      -2
40     -3          9      35    -3          9      9
43     0           0      39    1           1      0
41     -2          4      37    -1          1      2
45     2           4      41    3           9      6
N=7    0           28           0           28     21

$\bar{X} = A + \frac{\sum dx}{N} \times C = 43 + \frac{0}{7} \times 1 = 43$
$\bar{Y} = A + \frac{\sum dy}{N} \times C = 38 + \frac{0}{7} \times 1 = 38$

Regression equation of Y on X
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$b_{yx} = \frac{n \sum dx\,dy - \sum dx \sum dy}{n \sum dx^2 - (\sum dx)^2} = \frac{21 \times 7 - 0}{28 \times 7 - 0} = \frac{147}{196} = 0.75$
$Y - 38 = 0.75 (X - 43)$
$Y - 38 = 0.75X - 32.25$
$Y = 0.75X + 5.75$ (regression line)
When X = 37:
$Y = 0.75 \times 37 + 5.75 = 33.5$
If judge Q had been present, he would have awarded 33.5 marks.
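The estimate in Illustration 12 (marks by judge Q for a performance that judge P marked at 37) can be sketched in Python. This sketch takes deviations from the actual means rather than assumed means, which is an equivalent route chosen here for brevity; the function name `predict_y` is hypothetical.

```python
def predict_y(xs, ys, x_new):
    """Estimate Y for x_new from the least-squares regression line of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # byx = sum of products of deviations / sum of squared X deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    byx = sxy / sxx
    return mean_y + byx * (x_new - mean_x)


marks_p = [46, 42, 44, 40, 43, 41, 45]  # judge P (X)
marks_q = [40, 38, 36, 35, 39, 37, 41]  # judge Q (Y)
expected_q = predict_y(marks_p, marks_q, 37)
print(expected_q)
```

Only the regression of Y on X is needed here because the question asks for Y given X; estimating P's marks from Q's would call for the other regression line.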
REGRESSION EQUATION IN A BIVARIATE GROUPED FREQUENCY DISTRIBUTION
The procedure is the same as the one followed for individual series. The modified formulas, with step deviations dx and dy and class intervals $C_x$ and $C_y$, are as under.
Regression coefficient of X on Y
$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{N \sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N \sum f\,dy^2 - (\sum f\,dy)^2} \times \frac{C_x}{C_y}$
Regression coefficient of Y on X
$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{N \sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N \sum f\,dx^2 - (\sum f\,dx)^2} \times \frac{C_y}{C_x}$
Coefficient of correlation $= \sqrt{b_{xy} \times b_{yx}}$

ILLUSTRATION 13
The following table gives the ages of husbands and wives for 50 newly married couples. Find the two regression lines. Also estimate (A) the age of the husband when the wife is 20 and (B) the age of the wife when the husband is 30.

Age of wives    Age of husbands
                20-25   25-30   30-35   Total
16-20           9       14      -       23
20-24           6       11      3       20
24-28           -       -       7       7
Total           15      25      10      50

SOLUTION
Class interval for age of husband (X) = 5; class interval for age of wife (Y) = 4.
For X: A = 27.5, C = 5, so $dx = \frac{X - 27.5}{5}$. For Y: A = 22, C = 4, so $dy = \frac{Y - 22}{4}$.
In the table below, the figure in brackets in each cell is f dx dy for that cell.

X:                20-25    25-30    30-35
Mid-value:        22.5     27.5     32.5
Y       MV   dy   dx=-1    dx=0     dx=1     f      fdy    fdy^2   fdxdy
16-20   18   -1   9 (9)    14 (0)   -        23     -23    23      9
20-24   22   0    6 (0)    11 (0)   3 (0)    20     0      0       0
24-28   26   1    -        -        7 (7)    7      7      7       7
Total f           15       25       10       N=50   -16    30      16
fdx               -15      0        10       sum = -5
fdx^2             15       0        10       sum = 25
fdxdy             9        0        7        sum = 16

$\bar{X} = A + \frac{\sum f\,dx}{N} \times C = 27.5 + \frac{-5}{50} \times 5 = 27$
$\bar{Y} = A + \frac{\sum f\,dy}{N} \times C = 22 + \frac{-16}{50} \times 4 = 22 - 1.28 = 20.72$

Regression coefficient of X on Y
$b_{xy} = \frac{16 \times 50 - (-5 \times -16)}{30 \times 50 - (-16)^2} \times \frac{5}{4} = \frac{800 - 80}{1500 - 256} \times \frac{5}{4} = \frac{720}{1244} \times \frac{5}{4} = \frac{3600}{4976} = 0.723$
X on Y regression equation
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 27 = 0.723 (Y - 20.72)$
$X - 27 = 0.723Y - 14.98$
$X = 0.723Y + 12.02$ (regression line)
Estimate of the husband's age when Y = 20:
$X = 0.723 \times 20 + 12.02 = 26.48$ years

Regression coefficient of Y on X
$b_{yx} = \frac{16 \times 50 - (-5 \times -16)}{25 \times 50 - (-5)^2} \times \frac{4}{5} = \frac{800 - 80}{1250 - 25} \times \frac{4}{5} = \frac{720}{1225} \times \frac{4}{5} = \frac{2880}{6125} = 0.47$
Y on X regression equation
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 20.72 = 0.47 (X - 27)$
$Y - 20.72 = 0.47X - 12.69$
$Y = 0.47X + 8.03$ (regression line)
Estimate of the wife's age when X = 30:
$Y = 0.47 \times 30 + 8.03 = 14.10 + 8.03 = 22.13$ years

$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.723 \times 0.47} = 0.5829$

ILLUSTRATION 14
The following are the marks obtained by 132 students in Test X and Test Y. Calculate:
a) The regression coefficients
b) The two regression equations
c) The coefficient of correlation

Y \ X    30-40   40-50   50-60   60-70   70-80   Total
20-30    2       5       3       -       -       10
30-40    1       8       12      6       -       27
40-50    -       5       22      14      1       42
50-60    -       2       16      9       2       29
60-70    -       1       8       6       1       16
70-80    -       -       2       4       2       8
Total    3       21      63      39      6       132

SOLUTION
For X: A = 55, C = 10. For Y: A = 45, C = 10. The figure in brackets in each cell is f dx dy for that cell.

X:                30-40    40-50    50-60    60-70    70-80
Mid-value:        35       45       55       65       75
Y       MV   dy   dx=-2    dx=-1    dx=0     dx=1     dx=2     f       fdy   fdy^2   fdxdy
20-30   25   -2   2 (8)    5 (10)   3 (0)    -        -        10      -20   40      18
30-40   35   -1   1 (2)    8 (8)    12 (0)   6 (-6)   -        27      -27   27      4
40-50   45   0    -        5 (0)    22 (0)   14 (0)   1 (0)    42      0     0       0
50-60   55   1    -        2 (-2)   16 (0)   9 (9)    2 (4)    29      29    29      11
60-70   65   2    -        1 (-2)   8 (0)    6 (12)   1 (4)    16      32    64      14
70-80   75   3    -        -        2 (0)    4 (12)   2 (12)   8       24    72      24
Total f           3        21       63       39       6        N=132   38    232     71
fdx               -6       -21      0        39       12       sum = 24
fdx^2             12       21       0        39       24       sum = 96
fdxdy             10       14       0        27       20       sum = 71

$\bar{X} = A + \frac{\sum f\,dx}{N} \times C = 55 + \frac{24}{132} \times 10 = 55 + 1.82 = 56.82$
$\bar{Y} = A + \frac{\sum f\,dy}{N} \times C = 45 + \frac{38}{132} \times 10 = 45 + 2.88 = 47.88$

Regression coefficient of X on Y
$b_{xy} = \frac{71 \times 132 - (24 \times 38)}{232 \times 132 - (38)^2} \times \frac{10}{10} = \frac{9372 - 912}{30624 - 1444} = \frac{8460}{29180} = 0.29$
Regression equation of X on Y
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 56.82 = 0.29 (Y - 47.88)$
$X - 56.82 = 0.29Y - 13.8852$
$X = 0.29Y + 42.93$ (regression line)

Regression coefficient of Y on X
$b_{yx} = \frac{71 \times 132 - (24 \times 38)}{96 \times 132 - (24)^2} \times \frac{10}{10} = \frac{8460}{12672 - 576} = \frac{8460}{12096} = 0.70$
Regression equation of Y on X
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 47.88 = 0.70 (X - 56.82)$
$Y - 47.88 = 0.70X - 39.774$
$Y = 0.70X + 8.11$ (regression line)

Coefficient of correlation $= \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.29 \times 0.70} = 0.45$
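The grouped-data formulas can be sketched in Python using the husbands and wives table of 50 couples from this section. This is a minimal sketch under stated assumptions: the function name `grouped_coefficients` and the layout of `freq` (rows follow the Y classes, columns follow the X classes, empty cells entered as 0) are choices made for illustration only.

```python
def grouped_coefficients(freq, dx, dy, width_x, width_y):
    """Return (bxy, byx) for a bivariate frequency table.

    freq[i][j] is the frequency of the cell whose row has step deviation dy[i]
    and whose column has step deviation dx[j].
    """
    cells = [(f, dx[j], dy[i])
             for i, row in enumerate(freq)
             for j, f in enumerate(row)]
    n = sum(f for f, _, _ in cells)
    s_fdx = sum(f * a for f, a, _ in cells)
    s_fdy = sum(f * b for f, _, b in cells)
    s_fdx2 = sum(f * a * a for f, a, _ in cells)
    s_fdy2 = sum(f * b * b for f, _, b in cells)
    s_fdxdy = sum(f * a * b for f, a, b in cells)
    numerator = n * s_fdxdy - s_fdx * s_fdy
    # Multiply back by the ratio of class intervals to undo the step deviations
    bxy = numerator / (n * s_fdy2 - s_fdy ** 2) * width_x / width_y
    byx = numerator / (n * s_fdx2 - s_fdx ** 2) * width_y / width_x
    return bxy, byx


# Rows: wives' ages 16-20, 20-24, 24-28; columns: husbands' ages 20-25, 25-30, 30-35
freq = [[9, 14, 0],
        [6, 11, 3],
        [0, 0, 7]]
bxy, byx = grouped_coefficients(freq, dx=[-1, 0, 1], dy=[-1, 0, 1],
                                width_x=5, width_y=4)
print(round(bxy, 3), round(byx, 2))
```

The printed values should match the hand-computed coefficients for that table up to rounding. When the two class intervals are equal, as in the 132-student marks table, the width ratio is 1 and drops out.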
ILLUSTRATION 15
The following is the distribution of students according to their height and weight.

Height in inches (X)   Weight in lbs (Y)
                       90-100   100-110   110-120   120-130   TOTAL
50-55                  4        7         5         2         18
55-60                  6        10        7         4         27
60-65                  6        12        10        7         35
65-70                  3        8         6         3         20
TOTAL                  19       37        28        16        100

From the above:
a) Estimate the weight when height is 63 inches
b) Estimate the height when weight is 115 lbs
c) Calculate the coefficient of correlation

SOLUTION
Let X be height in inches and Y be weight in lbs. For X: A = 62.5, C = 5. For Y: A = 115, C = 10. The figure in brackets in each cell is f dx dy for that cell.

Y:                  90-100   100-110   110-120   120-130
Mid-value:          95       105       115       125
X       MV     dx   dy=-2    dy=-1     dy=0      dy=1      f       fdx   fdx^2   fdxdy
50-55   52.5   -2   4 (16)   7 (14)    5 (0)     2 (-4)    18      -36   72      26
55-60   57.5   -1   6 (12)   10 (10)   7 (0)     4 (-4)    27      -27   27      18
60-65   62.5   0    6 (0)    12 (0)    10 (0)    7 (0)     35      0     0       0
65-70   67.5   1    3 (-6)   8 (-8)    6 (0)     3 (3)     20      20    20      -11
Total f             19       37        28        16        N=100   -43   119     33
fdy                 -38      -37       0         16        sum = -59
fdy^2               76       37        0         16        sum = 129
fdxdy               22       16        0         -5        sum = 33

$\bar{X} = A + \frac{\sum f\,dx}{N} \times C = 62.5 + \frac{-43}{100} \times 5 = 62.5 - \frac{215}{100} = 60.35$
$\bar{Y} = A + \frac{\sum f\,dy}{N} \times C = 115 + \frac{-59}{100} \times 10 = 115 - \frac{590}{100} = 109.1$

X on Y regression equation
$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{N \sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N \sum f\,dy^2 - (\sum f\,dy)^2} \times \frac{C_x}{C_y}$
$= \frac{33 \times 100 - (-43 \times -59)}{129 \times 100 - (-59)^2} \times \frac{5}{10} = \frac{3300 - 2537}{12900 - 3481} \times 0.5 = \frac{763}{9419} \times 0.5 = 0.0405$
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 60.35 = 0.0405 (Y - 109.1)$
$X - 60.35 = 0.0405Y - 4.41855$
$X = 0.0405Y + 55.93145$ (regression line)
Estimation of height (X) when weight (Y) is 115 lbs:
$X = 0.0405 \times 115 + 55.93145 = 4.6575 + 55.93145 = 60.6$ inches

Y on X regression equation
$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{N \sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{N \sum f\,dx^2 - (\sum f\,dx)^2} \times \frac{C_y}{C_x}$
$= \frac{33 \times 100 - (-43 \times -59)}{119 \times 100 - (-43)^2} \times \frac{10}{5} = \frac{3300 - 2537}{11900 - 1849} \times 2 = \frac{763}{10051} \times 2 = 0.1518$
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 109.1 = 0.1518 (X - 60.35)$
$Y - 109.1 = 0.1518X - 9.16113$
$Y = 0.1518X + 99.93887$ (regression line)
Estimation of weight (Y) when height (X) is 63 inches:
$Y = 0.1518 \times 63 + 99.93887 = 9.5634 + 99.93887 = 109.5$ lbs

$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.0405 \times 0.1518} = 0.0784$

ILLUSTRATION 16
From the following data find:
a) The most probable value of Y when X is 60
b) The most probable value of X when Y is 40
c) The coefficient of correlation
$\bar{X} = 53.2$, $\bar{Y} = 27.9$, $b_{yx} = -1.5$ and $b_{xy} = -0.2$

SOLUTION
X on Y regression equation
$X - \bar{X} = b_{xy} (Y - \bar{Y})$
$X - 53.2 = -0.2 (Y - 27.9)$
$X - 53.2 = -0.2Y + 5.58$
$X = -0.2Y + 58.78$ (regression line)
If Y is 40:
$X = -0.2 \times 40 + 58.78 = 50.78$

Y on X regression equation
$Y - \bar{Y} = b_{yx} (X - \bar{X})$
$Y - 27.9 = -1.5 (X - 53.2)$
$Y - 27.9 = -1.5X + 79.8$
$Y = -1.5X + 107.7$ (regression line)
If X is 60:
$Y = -1.5 \times 60 + 107.7 = -90 + 107.7 = 17.7$

Coefficient of correlation: both regression coefficients are negative, so r is negative.
$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{(-1.5) \times (-0.2)} = -0.5477$

THEORETICAL QUESTIONS (5, 10 & 15 Marks)
1. What is meant by regression? How is this concept useful in business forecasting?
2. Distinguish clearly between correlation and regression analysis.
3. What is regression analysis? State its uses.
4. Define regression and explain its importance.
5. Briefly explain:
a. Regression line
b. Regression equation
c. Regression coefficient

PRACTICAL PROBLEMS
PROBLEM 06
Given the following data, calculate:
a. The expected value of Y when X = 60
b. The expected value of X when Y = 120

        X     Y
Mean    65    120
SD      5     10

Coefficient of correlation = 0.6
[Answers: X = 65, Y = 114]

PROBLEM 07
Given the following data, estimate the marks in Mathematics for a student who has secured 60 marks in English.
Arithmetic average of marks in Mathematics = 80
Arithmetic average of marks in English = 50
SD of marks in Mathematics = 15
SD of marks in English = 10
Coefficient of correlation = 0.4
[Answer: 86]

PROBLEM 08
Find the most likely price in Bangalore corresponding to the price of Rs. 70 at Mysore from the following data.
Average price at Mysore = Rs. 65
Average price at Bangalore = Rs. 67
SD of price at Mysore = Rs. 2.5
SD of price at Bangalore = Rs. 3.5
The coefficient of correlation between the prices of the commodity in the two cities is 0.8. Also estimate the price at Mysore corresponding to the price of Rs. 50 at Bangalore.
[Answer: 72.6 and 55.3]

PROBLEM 09
You are given the following data.

         X     Y
Mean     36    85
S.D.     11    8

Coefficient of correlation = 0.66
1. Find the two regression equations
2. Estimate the value of X when Y = 75
[Answer: X = 26.92 when Y = 75]

PROBLEM 10
The following are the marks in Statistics (X) and Mathematics (Y) of ten students.

X   56   55   58   58   57   56   60   64   69   57
Y   68   67   67   70   65   68   70   66   68   66

Calculate the coefficient of correlation based on bxy and byx. Also estimate the marks in Mathematics of a student who scores 62 marks in Statistics.
[Answer: r = 0.78, bxy = 0.0294, Y = 67.59]

PROBLEM 11
From the following data, obtain both the regression equations and estimate the demand (Y) if the price (X) is 75.

Price (X)    60   63   66   69   72   78   81   90   96   99
Demand (Y)   85   87   84   80   82   79   78   73   70   72

PROBLEM 12
From the data given below, find:
a. The two regression equations
b. The coefficient of correlation between the marks in Economics and Statistics
c. The most likely marks in Statistics when marks in Economics are 30

Marks in Economics X    25   28   35   32   31   36   39   38   34   32
Marks in Statistics Y   43   46   49   41   36   32   31   30   33   39

[Ans: X = 40.892 - 1.234Y, Y = 59.248 - 0.664X, r = 0.394, Y = 39]

PROBLEM 13
The following data relate to the price and demand of a commodity.
a) Estimate demand when price is Rs. 30
b) Estimate price when demand is 65 units
c) Calculate the coefficient of correlation

Demand in units   20   22   25   23   18   16   14   17   21   19
Price in Rs       50   45   38   42   55   58   59   54   49   57

[Answer: a) 29.6 b) 13.21 c) r = -0.94]

PROBLEM 14
The following table shows the frequency distribution of couples classified according to their ages.
a) Obtain the two regression coefficients.
b) Estimate the age of the husband when the wife's age is 28 years.
c) Calculate the coefficient of correlation.

Wife's age in years (Y)   Husband's age in years (X)
                          20-25   25-30   30-35   35-40   TOTAL
15-20                     20      10      3       2       35
20-25                     4       18      6       4       32
25-30                     -       5       11      -       16
30-35                     -       -       2       -       2
35-40                     -       -       -       5       5
TOTAL                     24      33      22      11      90

[Answers: r = 0.612, X = 22.5, Y = 28.6, b = 31.7, byx = 0.558]

PROBLEM 15
From the following data:
a) Estimate X when Y = 30
b) Estimate Y when X = 20

X / Y    5-15   15-25   25-35   35-45   TOTAL
0-10     1      1       -       -       2
10-20    3      6       5       1       15
20-30    1      8       9       2       20
30-40    -      3       9       3       15
40-50    -      -       4       4       8
TOTAL    5      18      27      10      60

[Answer: a) 28.7 b) 22.31]

PROBLEM 16
From the following data, calculate:
a) The regression coefficients
b) The coefficient of correlation based on bxy and byx

Y / X    30-35   35-40   40-45   45-50   TOTAL
25-30    20      10      3       2       35
30-35    4       28      6       4       42
35-40    -       5       11      -       16
40-45    -       -       2       -       2
45-50    -       -       -       5       5
TOTAL    24      43      22      11      100

[Answer: X = 32.5, Y = 38.5, bxy = 0.6744, byx = 0.5576, r = 0.6132]

PROBLEM 17
Calculate the two regression coefficients. Estimate the value of X when Y = 49. Also calculate the coefficient of correlation based on bxy and byx.

X   43   44   46   40   44   42   45   42   38   40   42   57
Y   29   31   19   18   19   27   27   29   41   30   26   10

[Answer: X = 64.8, Y = ?, bxy = -0.44, byx = -1.2198, r = -0.732]

PROBLEM 18
From the following bivariate table calculate:
a) The two regression coefficients
b) The coefficient of correlation based on bxy and byx

X / Y    59.5   79.5   99.5   119.5   139.5   159.5   179.5   TOTAL
2.25     3      4      3      6       2       1       1       20
7.25     2      3      5      10      3       1       1       25
12.25    5      4      6      11      5       3       3       37
17.25    10     11     12     15      12      15      10      85
22.25    4      2      3      10      7       5       6       37
27.25    1      1      2      8       8       5       4       29
32.25    1      1      1      10      5       4       5       27
TOTAL    26     26     32     70      42      34      30      260

[Answer: X = 17.80, Y = 122.42, bxy = 0.05, byx = 1.06, r = 0.230]