correlation and regression 300... · 2021. 6. 14. · types of correlation 7 positive linear...
TRANSCRIPT
Chapter – II
Correlation and Regression Analysis - I
CORRELATION
• Explanatory variable (also called the influencing variable or predictor variable)
• Response variable
• Independent variable (X): the explanatory variable
• Dependent variable (Y): the response variable
X Y
2 55
3 58
5 49
7 48
9 35
15 33
Types of correlation
[Figure: four panels titled Positive Linear Relationship, Negative Linear Relationship, Non-Linear Relationship, and No Relationship]
Pearson’s Correlation Coefficient
$$ r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} $$
where
$$ S_{XY} = \sum (X - \bar{X})(Y - \bar{Y}), \qquad S_{XX} = \sum (X - \bar{X})^2, \qquad S_{YY} = \sum (Y - \bar{Y})^2 $$
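Equivalently, in terms of raw sums:
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$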
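Dividing the numerator and denominator by n:
$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right]\left[\sum y^2 - \dfrac{\left(\sum y\right)^2}{n}\right]}} $$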
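As a rough check that the deviation form and the raw-sums form give the same value, both can be coded in a few lines of Python. This is only a sketch: the function names `pearson_r_deviation` and `pearson_r_sums` are illustrative and do not come from the slides, and the data are the X and Y values tabulated earlier.

```python
import math

def pearson_r_deviation(xs, ys):
    """r = S_XY / sqrt(S_XX * S_YY), using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def pearson_r_sums(xs, ys):
    """Same coefficient computed from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# X and Y values from the table shown earlier in the slides
xs = [2, 3, 5, 7, 9, 15]
ys = [55, 58, 49, 48, 35, 33]

r_dev = pearson_r_deviation(xs, ys)
r_sum = pearson_r_sums(xs, ys)
print(round(r_dev, 3), round(r_sum, 3))  # -0.921 -0.921
```

The raw-sums form is usually the more convenient one for hand calculation, since it only needs column totals.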
Example
• Find Pearson’s correlation coefficient for the given variables.
Price Profit
2 6
3 7
5 10
7 11
10 13.5
        y     x      xy    x²       y²
        2     6      12    36       4
        3     7      21    49       9
        5     10     50    100      25
        7     11     77    121      49
        10    13.5   135   182.25   100
Total   27    47.5   295   488.25   187
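Substituting the column totals above (with n = 5) into the raw-sums formula gives the following worked calculation. The rounding to three decimal places is our choice, not the slides'.

```latex
r = \frac{5(295) - (47.5)(27)}
         {\sqrt{\left[5(488.25) - 47.5^2\right]\left[5(187) - 27^2\right]}}
  = \frac{192.5}{\sqrt{185 \times 206}}
  \approx 0.986
```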
Example
Nine students held their breath, once after breathing normally and relaxing for one minute, and once after hyperventilating for one minute.
The table indicates how long (in sec) they were able to hold their breath.
Is there an association between the two variables?
Subject Normal Hypervent
A 56 87
B 56 91
C 65 85
D 65 91
E 50 75
F 25 28
G 87 122
H 44 66
I 35 58
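One way to answer the question is to compute Pearson's r for the two columns. The sketch below uses the raw-sums formula; the helper name `pearson_r` and the variable names are our own.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Breath-holding times in seconds for subjects A to I, from the table above
normal = [56, 56, 65, 65, 50, 25, 87, 44, 35]
hypervent = [87, 91, 85, 91, 75, 28, 122, 66, 58]

r = pearson_r(normal, hypervent)
print(round(r, 3))  # 0.966
```

A value this close to 1 suggests a very strong positive linear association: subjects who hold their breath longer after normal breathing also tend to do so after hyperventilating.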
Example
• Identify whether there is an association between the given variables.
Price Demand
2 55
5 49
7 48
3 58
15 33
9 35
[Figure: scale of r from -1.00 to 1.00 in steps of 0.20. Values below 0.00 indicate a negative linear relationship and values above 0.00 a positive linear relationship, with 0.00 meaning no linear relationship and -1.00 or 1.00 a perfect one. The strength bands on each side run from very weak through weak, moderate, and strong to very strong.]
r                  Interpretation
1                  perfect positive linear relationship
0.80 to 0.99       very strong positive linear relationship
0.60 to 0.79       strong positive linear relationship
0.40 to 0.59       moderate positive linear relationship
0.20 to 0.39       weak positive linear relationship
0.01 to 0.19       very weak positive linear relationship
0                  no linear relationship
-0.19 to -0.01     very weak negative linear relationship
-0.39 to -0.20     weak negative linear relationship
-0.59 to -0.40     moderate negative linear relationship
-0.79 to -0.60     strong negative linear relationship
-0.99 to -0.80     very strong negative linear relationship
-1                 perfect negative linear relationship
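The scale above can be turned into a small lookup. This is a sketch under one assumption of ours: the slides list bands to two decimal places, so values that fall between bands (for example 0.795) are assigned here by treating each band as starting at its lower bound. The function name `describe_r` is illustrative.

```python
def describe_r(r):
    """Map a correlation coefficient to the verbal description used in the scale."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.80:
        strength = "very strong"
    elif size >= 0.60:
        strength = "strong"
    elif size >= 0.40:
        strength = "moderate"
    elif size >= 0.20:
        strength = "weak"
    else:
        strength = "very weak"
    return f"{strength} {direction} linear relationship"

print(describe_r(0.986))  # very strong positive linear relationship
print(describe_r(-0.45))  # moderate negative linear relationship
```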
100 r² is known as the coefficient of determination. It gives the percentage of the variation in the response variable that is explained by the explanatory variable.
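As a worked illustration, take the price and profit data from the earlier example, where the raw-sums formula gives r = 192.5 / √(185 × 206). Squaring and multiplying by 100 (the rounding is ours):

```latex
100\,r^2 = 100 \times \frac{192.5^2}{185 \times 206}
         = 100 \times \frac{37056.25}{38110}
         \approx 97.2\%
```

So roughly 97% of the variation in one variable is accounted for by its linear relationship with the other.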
Spearman’s Correlation coefficient
• Sometimes called rank correlation
• Denoted by “R”
Spearman’s Rank Correlation Coefficient
$$ R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
Example
• An expert was asked to rank, according to taste, seven wines costing below £4. Her rankings (with 1 being the worst taste and 7 the best) and the prices per bottle were as follows.
Sample Rank of taste Price (£)
A 1 2.49
B 2 2.99
C 3 3.49
D 4 2.99
E 5 3.59
F 6 3.99
G 7 3.99
Sample   Rank of taste   Rank of price   d   d²
A        1
B        2
C        3
D        4
E        5
F        6
G        7
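The blank columns can be filled in with a short script. Two assumptions here are ours, not the slides': prices are ranked in ascending order (1 = cheapest), and tied prices share the average of the ranks they would otherwise occupy. The helper name `average_ranks` is illustrative. Note that with tied ranks the 6Σd² formula is only an approximation.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Samples A to G from the table above
taste_rank = [1, 2, 3, 4, 5, 6, 7]
price = [2.49, 2.99, 3.49, 2.99, 3.59, 3.99, 3.99]

price_rank = average_ranks(price)
d = [t - p for t, p in zip(taste_rank, price_rank)]
sum_d2 = sum(diff ** 2 for diff in d)
n = len(taste_rank)
R = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(price_rank)   # [1.0, 2.5, 4.0, 2.5, 5.0, 6.5, 6.5]
print(sum_d2)       # 4.0
print(round(R, 3))  # 0.929
```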
Which correlation to use
• If you intend to use regression, then the choice should be Pearson’s correlation coefficient.
REGRESSION Analysis
• Here we try to find a linear relation between the explanatory variable (x) and the response variable (y) in the form
$$ y = \beta_0 + \beta_1 x + \varepsilon $$
where $\beta_0$ and $\beta_1$ are constants to be determined and $\varepsilon$ is the error term.
Regression
• Graphical method
• Method of least squares
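Formulae for $\beta_0$ and $\beta_1$
$$ \beta_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$
$$ \beta_0 = \frac{\sum y}{n} - \beta_1 \frac{\sum x}{n} $$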
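The two formulae translate directly into code. The sketch below is a minimal version, with the function name `least_squares_line` chosen by us; it is applied to the X and Y values tabulated near the start of the slides.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# X and Y values from the table shown earlier in the slides
xs = [2, 3, 5, 7, 9, 15]
ys = [55, 58, 49, 48, 35, 33]

b0, b1 = least_squares_line(xs, ys)
print(round(b0, 3), round(b1, 3))  # 59.939 -1.991
```

The slope must be computed first, because the intercept formula depends on it.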
Example
• A company has the following data on last year’s sales in each of its regions and the corresponding number of salespersons employed.
Region Sales Salespersons
A 236 11
B 234 12
C 298 18
D 250 15
E 246 13
F 202 10
        x     y       x²      xy
        11    236     121     2,596
        12    234     144     2,808
        18    298     324     5,364
        15    250     225     3,750
        13    246     169     3,198
        10    202     100     2,020
Total   79    1,466   1,083   19,736
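Substituting the totals above (with n = 6, x the number of salespersons and y the sales) into the least squares formulae gives the following worked calculation. The rounding is ours.

```latex
\beta_1 = \frac{6(19736) - (79)(1466)}{6(1083) - 79^2}
        = \frac{2602}{257} \approx 10.12

\beta_0 = \frac{1466}{6} - \frac{2602}{257} \cdot \frac{79}{6} \approx 111.03
```

The fitted line is therefore approximately y = 111.03 + 10.12x.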
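Interpreting $\beta_0$ and $\beta_1$
• $\beta_0$ (the intercept) is the expected value of y when x = 0.
• $\beta_1$ (the slope) is the expected change in y for a one-unit increase in x.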
Forecasting
• Once the equation of the regression line has been computed, it is a straightforward process to obtain forecasts.
Example
• Considering the previous example, forecast the number of sales that would be expected next year in regions that employed
• 14 salespersons
• 25 salespersons
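A forecast is obtained by substituting the number of salespersons into the fitted line. The sketch below refits the line from the regional sales data and evaluates it at 14 and 25; the function name `fit_line` and the extrapolation flag are our additions.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Regions A to F from the sales example
salespersons = [11, 12, 18, 15, 13, 10]
sales = [236, 234, 298, 250, 246, 202]

b0, b1 = fit_line(salespersons, sales)

forecasts = {}
for staff in (14, 25):
    forecasts[staff] = b0 + b1 * staff
    # Flag forecasts that lie outside the range of the observed data
    inside = min(salespersons) <= staff <= max(salespersons)
    note = "" if inside else " (extrapolation beyond the observed range)"
    print(f"{staff} salespersons: {forecasts[staff]:.1f}{note}")
```

The forecast for 14 salespersons is about 252.8 and for 25 about 364.1. The second lies well outside the observed range of 10 to 18 salespersons, so it should be treated with more caution.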
Judging the Validity of Forecasts
• Validity of forecasts is given by the coefficient of determination, 100 r².
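For the regional sales example, 100 r² can be computed from the same raw sums used for the regression line. This is a sketch; the function name `percent_determination` is ours.

```python
def percent_determination(xs, ys):
    """Return 100 * r^2, the coefficient of determination as a percentage."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = (n * sum_xy - sum_x * sum_y) ** 2
    denominator = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    return 100 * numerator / denominator

# Regions A to F from the sales example
salespersons = [11, 12, 18, 15, 13, 10]
sales = [236, 234, 298, 250, 246, 202]

validity = percent_determination(salespersons, sales)
print(round(validity, 1))  # 89.9
```

On this reading, about 90% of the variation in sales is explained by the number of salespersons, which supports forecasts made within the observed range.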
A company has the following data on its profits and advertising expenditure over the last six years:
Profits (£m)   Advertising expenditure (£m)
11.3           0.52
12.1           0.61
14.1           0.63
14.6           0.70
15.1           0.70
15.2           0.75
i. Identify the predictor and the response variable.
ii. Find the correlation coefficient and interpret it.
iii. Find the best fit line.
iv. Interpret your best fit line.
v. Forecast the profits for the next year if an advertising budget of £800,000 (£0.8m) is allocated.
vi. What is the validity of your forecast?
Consider the following data on the number of hours which 10 persons studied for a math test and their test scores:
Hours Studied   Test Score
4               31
9               58
10              65
14              73
4               37
7               44
12              60
22              91
1               21
17              84
i. Identify the predictor and the response variable.
ii. Find the correlation coefficient and interpret it.
iii. Find the least squares regression line.
iv. Interpret your regression line.
v. If a student studies for 20 hours, what would be his expected test score (rounded to the nearest integer)?
vi. What is the validity of your forecast?