Regression
8/9/2019 Regressi On
Chapter: Regression Approaches
- Initial investigations.
- Simple linear regression model. Parameter estimation. Forecasting.
- Multivariate linear regression model. Parameter estimation. Forecasting.
- Model building and residual analysis.
Initial Investigation
It is good practice to investigate the data before performing advanced analyses (e.g. modelling).
Some reasons for performing initial analyses:
- to identify patterns in the data,
- to identify potential outliers or non-normal behaviour in some observations, and
- to understand the data better.
Some possible methods:
- Plots (e.g. scatter plot, histogram, distribution plot).
- Simple summary statistics (e.g. mean, variance, skewness).
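The summary statistics mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `summarize` and the five data values are made up for the example (they are not from the slides), and the skewness used is the simple moment-based version, which is one of several common definitions.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    # Moment-based skewness g1 = m3 / m2**1.5, where m_k is the k-th central moment.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skewness = m3 / m2 ** 1.5
    return {"n": n, "mean": mean, "variance": variance, "skewness": skewness}

# Made-up illustrative numbers, not data from the slides.
y = [2, 4, 5, 4, 5]
summary = summarize(y)
print(summary)
```

A negative skewness, as in this toy sample, suggests a longer tail towards small values; a histogram would normally be inspected alongside these numbers.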
Simple Linear Regression Model
Objective: to model the relationship between two variables.
This model assumes that the relationship between the dependent variable, $y$, and the independent variable, $x$, can be described by a straight line:
$$y = \beta_0 + \beta_1 x + \varepsilon \tag{1}$$
where
- $\beta_0$: intercept of $y$ when $x = 0$;
- $\beta_1$: slope; the change in the mean value of $y$ associated with a unit increase in $x$;
- $\varepsilon$: error term.
The unknown parameters can be estimated using the least squares method, so that the estimated model is $\hat{y} = b_0 + b_1 x$, where $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$ respectively.
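Least Squares Method
Objective: this method seeks the estimators ($b_0$ and $b_1$) that minimize the total squared error.
The total squared error is computed by:
$$\sum_t e_t^2 = \sum_t (y_t - \hat{y}_t)^2 \tag{2}$$
Minimizing (2) with respect to $b_0$ and $b_1$ gives the estimators
$$b_0 = \bar{y} - b_1 \bar{x}, \qquad b_1 = \frac{SS_{xy}}{SS_{xx}}$$
where
$$SS_{xy} = \sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y}) = \sum_t x_t y_t - \frac{\left(\sum_t x_t\right)\left(\sum_t y_t\right)}{n}$$
$$SS_{xx} = \sum_{t=1}^{n} (x_t - \bar{x})^2 = \sum_t x_t^2 - \frac{\left(\sum_t x_t\right)^2}{n}$$
$$\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t \quad \text{and} \quad \bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t.$$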
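The estimators above translate almost directly into code. The following is a minimal sketch in plain Python; the function name `fit_simple_linear` and the toy data are assumptions made for illustration and do not come from the slides. It also checks that the shortcut form of $SS_{xy}$ agrees with the definition.

```python
def fit_simple_linear(x, y):
    """Least squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative numbers, not data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear(x, y)

# Shortcut form: SSxy = sum(x*y) - sum(x) * sum(y) / n
n = len(x)
ss_xy_shortcut = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSxy = {ss_xy_shortcut:.3f}")
```

For this toy sample the fitted line is approximately $\hat{y} = 2.2 + 0.6x$.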
Model Fit
(i) Determination of the relationship between x and y.
The degree of relationship between $x$ and $y$ represents how much of the variability in $y$ can be explained by $x$.
In regression analysis, total variation consists of explained variation and unexplained variation:
- Total variation: the sum of squared errors obtained when we do not consider the explanatory variable $x$, $\sum_t (y_t - \bar{y})^2$.
- Unexplained variation: the amount of variation in the values of $y$ that is NOT explained by $x$. Also called SSE, $\sum_t (y_t - \hat{y}_t)^2$.
- Explained variation: the amount of variation in the values of $y$ that is explained by $x$, $\sum_t (\hat{y}_t - \bar{y})^2$.
Model Fit
So, the degree of relationship between x and y can be measured using a simple coefficient called $R^2$:
$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$
where $0 \le R^2 \le 1$.
This coefficient gives the proportion of the total variation in $y$ that is explained by the simple linear regression model based on the sample of size $n$. The closer $R^2$ is to 1, the more of the variation in $y$ the model explains.
$R = \pm\sqrt{R^2}$ ($-1 \le R \le 1$), taking the sign of $b_1$, gives the direction of the relationship: $R > 0$ shows a positive relationship and $R < 0$ a negative relationship.
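As a rough illustration of the variation decomposition and of $R^2$, the sketch below fits the line and computes the three sums of squares. The function name `variation_decomposition` and the toy data are hypothetical, chosen only for this example.

```python
import math

def variation_decomposition(x, y):
    """Return (total, unexplained, explained, r_squared, r) for a simple linear fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    total = sum((yi - y_bar) ** 2 for yi in y)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # SSE
    explained = sum((fi - y_bar) ** 2 for fi in y_hat)
    r_squared = explained / total
    r = math.copysign(math.sqrt(r_squared), b1)  # R carries the sign of the slope
    return total, unexplained, explained, r_squared, r

# Made-up illustrative numbers, not data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
total, unexplained, explained, r_squared, r = variation_decomposition(x, y)
print(total, unexplained, explained, r_squared, r)
```

In this toy sample the explained and unexplained parts add up to the total variation, and about 60% of the variation in $y$ is explained by $x$.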
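Model Fit
Hypothesis test for a significant relationship between x and y:
$H_0$: There is no relationship between $x$ and $y$, $\rho = 0$.
$H_1$: There is a relationship between $x$ and $y$, $\rho \neq 0$.
The test statistic is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
where $r$ is the sample correlation coefficient. Under $H_0$, $t$ has a t-distribution with $n-2$ degrees of freedom.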
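A small sketch of this test statistic is given below. The function name `correlation_t_statistic` and the data are made up for illustration, and the critical value `T_CRIT` is an assumption read from a standard t table (two-sided 5% level, 3 degrees of freedom) rather than computed.

```python
import math

def correlation_t_statistic(x, y):
    """Return the sample correlation r and t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t

# Made-up illustrative numbers, not data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, t = correlation_t_statistic(x, y)

T_CRIT = 3.182  # assumed table value: two-sided 5% level, n - 2 = 3 degrees of freedom
reject_h0 = abs(t) > T_CRIT
print(f"r = {r:.4f}, t = {t:.4f}, reject H0: {reject_h0}")
```

With only five observations the statistic stays below the critical value, so this toy sample would not reject $H_0$ at the 5% level.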
Model Fit
(ii) An F-test for testing the model.
This statistic tests the significance of the constructed model.
Hypothesis testing:
$H_0$: $\beta_1 = 0$ ($x$ does not help to explain $y$).
$H_1$: $\beta_1 \neq 0$ ($x$ is important in the model).
The relevant test statistic is
$$F_M = \frac{\text{Explained variation}}{\text{Unexplained variation}/(n-2)}$$
If the regression assumptions hold, then under $H_0$ the statistic $F_M$ has an F-distribution with 1 and $n-2$ degrees of freedom.
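The F statistic can be sketched as follows, assuming the denominator uses $n-2$ degrees of freedom to match the stated F-distribution. The function name `f_statistic` and the data are hypothetical, and `F_CRIT` is an assumed value taken from a standard F table (5% level, 1 and 3 degrees of freedom).

```python
def f_statistic(x, y):
    """Return F = explained / (unexplained / (n - 2)) for a simple linear fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    explained = sum((fi - y_bar) ** 2 for fi in y_hat)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return explained / (unexplained / (n - 2))

# Made-up illustrative numbers, not data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
f_m = f_statistic(x, y)

F_CRIT = 10.13  # assumed table value: 5% level, 1 and 3 degrees of freedom
print(f"F_M = {f_m:.3f}, reject H0: {f_m > F_CRIT}")
```

In simple linear regression this F statistic equals the square of the t statistic for the slope, so the two tests lead to the same conclusion.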
Model Fit
(iii) Testing the significance of $b_1$.
Objective: to check whether the relationship between x and y is significant.
Hypotheses (for example): $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$.
If the regression assumptions hold, then
$$b_1 \sim N\!\left(\beta_1,\ \sigma_{b_1} = \frac{\sigma}{\sqrt{SS_{xx}}}\right)$$
where the estimator of $\sigma_{b_1}$ is $s_{b_1} = s/\sqrt{SS_{xx}}$.
Then
$$\frac{b_1 - \beta_1}{s_{b_1}}$$
has a t-distribution with $n-2$ degrees of freedom.
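A minimal sketch of the slope test, assuming $s = \sqrt{SSE/(n-2)}$ as the estimate of $\sigma$. The function name `slope_t_test` and its `beta1_null` parameter are hypothetical, and the data are made up for illustration.

```python
import math

def slope_t_test(x, y, beta1_null=0.0):
    """Return (b1, s_b1, t) where t = (b1 - beta1_null) / s_b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # estimate of sigma
    s_b1 = s / math.sqrt(ss_xx)    # estimated standard error of b1
    t = (b1 - beta1_null) / s_b1
    return b1, s_b1, t

# Made-up illustrative numbers, not data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, s_b1, t_b1 = slope_t_test(x, y)
print(f"b1 = {b1:.3f}, s_b1 = {s_b1:.4f}, t = {t_b1:.4f}")
```

The resulting `t_b1` would then be compared with a critical value of the t-distribution with $n-2$ degrees of freedom.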
Model Fit
(iv) Testing the significance of $b_0$.
Objective: to check the significance of the intercept on the y-axis.
Hypotheses (for example): $H_0: \beta_0 = 0$ vs $H_1: \beta_0 \neq 0$.
If the regression assumptions hold, then $b_0 \sim N(\beta_0, \sigma_{b_0})$, where the estimator of $\sigma_{b_0}$ is
$$s_{b_0} = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}}}$$
Then
$$\frac{b_0 - \beta_0}{s_{b_0}}$$
has a t-distribution with $n-2$ degrees of freedom.
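The intercept test follows the same pattern as the slope test. Below is a short sketch under the same assumptions; `intercept_t_test` and `beta0_null` are hypothetical names and the data are made up for illustration.

```python
import math

def intercept_t_test(x, y, beta0_null=0.0):
    """Return (b0, s_b0, t) where t = (b0 - beta0_null) / s_b0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                         # estimate of sigma
    s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_xx)     # estimated standard error of b0
    t = (b0 - beta0_null) / s_b0
    return b0, s_b0, t

# Made-up illustrative numbers, not data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, s_b0, t_b0 = intercept_t_test(x, y)
print(f"b0 = {b0:.3f}, s_b0 = {s_b0:.4f}, t = {t_b0:.4f}")
```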
Model Adequacy Check
Statistical models depend on some assumptions. These must be checked so that the obtained results can be accepted.
In the least squares linear model, the following assumptions must be fulfilled:
- A linear relationship between $x$ and $y$.
- The error term, $\varepsilon$, must be normally distributed with mean 0 and a constant variance $\sigma^2$.
- The error values are statistically independent of each other.
The mean square error, $\sigma^2$, is estimated by
$$s^2 = \frac{\sum_t y_t^2 - b_0 \sum_t y_t - b_1 \sum_t x_t y_t}{n-2} = \frac{SSE}{n-2}$$
The standard error, $\sigma$, is estimated by
$$s = \sqrt{\frac{SSE}{n-2}}$$
All of these assumptions can be checked through plots.
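Before plotting, the residuals and the standard error can be computed numerically. The sketch below is only illustrative: the function name `residual_summary` and the data are made up, and it simply checks that the shortcut form of SSE matches the direct sum of squared residuals.

```python
import math

def residual_summary(x, y):
    """Return residuals, SSE (direct and shortcut form) and the standard error s."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse_direct = sum(e ** 2 for e in residuals)
    # Shortcut: SSE = sum(y^2) - b0 * sum(y) - b1 * sum(x * y)
    sse_shortcut = (sum(yi ** 2 for yi in y)
                    - b0 * sum(y)
                    - b1 * sum(xi * yi for xi, yi in zip(x, y)))
    s = math.sqrt(sse_direct / (n - 2))
    return residuals, sse_direct, sse_shortcut, s

# Made-up illustrative numbers, not data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
residuals, sse_direct, sse_shortcut, s = residual_summary(x, y)
print([round(e, 2) for e in residuals], round(s, 4))
```

The residuals returned here are the values one would plot against $x$, against the fitted values, or in a normal probability plot to check the assumptions listed above.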
Some Informative Plots
(plots not reproduced in this transcript)
Forecasting Using the Simple Linear Regression Model
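Once the constructed model has been checked and we are satisfied with it, forecasts can be made.
Forecasts can be made through:
- a point estimate, e.g. suppose the constructed model is $\hat{y} = 0.5 + 2x$. Substituting a value of $x$ gives the predicted value of $y$; for instance, $x = 3$ gives $\hat{y} = 6.5$.
- an interval estimate, e.g. for a given value of $x$, we want to know the range of plausible values of $y$. This can be computed from
$$\hat{y} \pm t^{(n-2)}_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_{xx}}}$$
As written, this is the interval for the mean value of $y$ at the given $x$; an interval for an individual value of $y$ adds 1 under the square root.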
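The point and interval estimates can be sketched as below. The function name `mean_interval`, the data, and the chosen value `x0 = 4` are made up for illustration; the critical value `t_crit` is passed in by the caller and here is an assumed table value (two-sided 5% level, 3 degrees of freedom).

```python
import math

def mean_interval(x, y, x0, t_crit):
    """Return (y_hat, lower, upper) using y_hat +/- t_crit * s * sqrt(1/n + (x0 - x_bar)^2 / SSxx)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x0  # point estimate
    half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat, y_hat - half_width, y_hat + half_width

# Made-up illustrative numbers, not data from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # assumed table value: two-sided 5% level, n - 2 = 3 degrees of freedom
y_hat, lower, upper = mean_interval(x, y, x0=4, t_crit=T_CRIT)
print(f"y_hat = {y_hat:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

The interval is narrowest at `x0` equal to the sample mean of `x` and widens as `x0` moves away from it, which is why forecasts far outside the observed range of `x` should be treated with caution.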
Example
Quality Home Improvement Center (QHIC)
QHIC operates five stores in a large metropolitan area. The marketing
department at QHIC wishes to study the relationship between home
value (in thousands of dollars), x, and yearly expenditure on home upkeep
($), y. A random sample of 40 homeowners is taken, and they are asked
to estimate their expenditures during the previous year on the types of
home upkeep products and services offered by QHIC.
How to Investigate
Checklist:
1. What is the relationship between x and y? Is it linear?
2. What are the estimated values of the parameters $\beta_0$ and $\beta_1$?
3. Is the constructed model good enough?
4. Does the constructed model fulfill all the assumptions? (Check the error plots.)
5. Can prediction or forecasting be made?
6. What can we conclude about the constructed model?