-
1
Regression Analysis
BY
DR. ISMAIL B
PROFESSOR
DEPARTMENT OF STATISTICS MANGALORE UNIVERSITY
MANGALAGANGOTHRI
e-mail: [email protected]
-
2
-
3
-
4
-
5
Descriptive Statistics
-
6
Using the p-value to make
the decision
The p-value is a probability, computed assuming the null
hypothesis is true, that the test criterion would take a value as
extreme as or more extreme than that actually observed.
Since it is a probability, it is a number between 0 and 1; the
closer it is to 0, the more unlikely the observed result is under the null hypothesis.
So if the p-value is small, we can reject the null hypothesis.
-
7
Using the p-value to make the
decision
How small is small? Smaller than the level of significance,
e.g. .05 or .01. So, using the p-value to make the decision:
reject the null hypothesis if the p-value is less than the chosen level (.05 or .01).
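The decision rule above can be sketched in Python. All numbers here are made up for illustration, and the two-sided p-value is approximated with the standard normal distribution (a large-sample approximation):

```python
import math

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for test statistic z under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when the p-value is smaller than the significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p = two_sided_p_value(2.5)   # z = 2.5 is an invented test statistic
print(round(p, 4), decide(p, alpha=0.05))
```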
-
8
-
9
Answer: What is the relationship between the variables? An equation is used, with:
1 numerical dependent (response) variable: Y, the quantity to be predicted
1 or more numerical or categorical independent (explanatory) variables: X
Different techniques are used for different levels of measurement.
-
10
Types of Regression Models
Regression Models
1 Explanatory Variable → Simple: Linear or Non-Linear
2+ Explanatory Variables → Multiple: Linear or Non-Linear
-
11
Types of Regression Models
Regression Models
1 Explanatory Variable → Simple: Linear or Non-Linear
2+ Explanatory Variables → Multiple: Linear or Non-Linear
Log-linear: the dependence is linear on a logarithmic scale of the dependent variable.
-
12
Linear Equations
[Figure: a straight line Y = bX + a, where a is the Y-intercept and the slope b is the change in Y over the change in X.]
The Simple Linear Regression model is given by
Y = a + bX + e
-
13
Simple Linear Regression Model
The relationship between the variables is a linear function, the straight line that best fits the data:
Y_i = β0 + β1 X_i + ε_i
where Y_i is the dependent (response) variable, X_i is the independent (explanatory) variable, β0 is the Y-intercept (constant term), β1 is the slope, and ε_i is the random error.
-
14
Linear Regression Model
[Figure: scatter of observed values around the fitted line. An observed value satisfies Y_i = β0 + β1 X_i + ε_i, where ε_i is the random error; the population regression line is E(Y|X) = β0 + β1 X, and the fitted line is Ŷ = β̂0 + β̂1 X.]
-
15
Assumptions
1. E(ε_i) = 0, i.e., the disturbances have zero mean.
2. V(ε_i) = σ², i = 1, 2, ..., n, i.e., the disturbances have constant variance.
3. E(ε_i ε_j) = 0 for i ≠ j, i.e., the disturbances are uncorrelated.
4. The explanatory variable X is non-stochastic, i.e., fixed in repeated samples, and hence not correlated with the disturbances.
5. Σ_{t=1}^{n} x_t²/n is nonzero and has a finite limit as n → ∞. This assumption states that we have at least two distinct values for X.
-
16
The Sum of Squares
[Figure: at a given X_i, the deviation of Y_i from Ȳ splits into a part explained by the fitted line and a residual part.]
SST = Σ (Y_i − Ȳ)²
SSE = Σ (Y_i − Ŷ_i)²
SSR = Σ (Ŷ_i − Ȳ)²
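As a quick check on this decomposition, a short Python sketch with made-up data fits the least squares line and verifies that SST = SSR + SSE:

```python
# Decomposing total variation for a toy data set (invented numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# OLS slope and intercept.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)             # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # residual variation

print(round(sst, 4), round(ssr + sse, 4))  # the two numbers agree
```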
-
17
We can write the SLR model for all the observations as
Y = Xβ + ε, where X is the n × 2 matrix [1 X_1; 1 X_2; ... ; 1 X_n]
The least squares estimator and its variance are
β̂ = (X′X)⁻¹X′Y,  V(β̂) = σ²(X′X)⁻¹
In scalar form, with x_i = X_i − X̄ and y_i = Y_i − Ȳ:
β̂1 = Σ x_i y_i / Σ x_i²
β̂0 = Ȳ − β̂1 X̄
BLUE: the least squares estimator of β is the Best Linear Unbiased Estimator.
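The matrix formula can be sketched in pure Python for the simple-regression case, where X′X is just a 2 × 2 matrix that can be inverted by hand (toy data, invented for illustration):

```python
# beta_hat = (X'X)^{-1} X'Y for simple regression, where X has a column
# of ones and a column of x values (all numbers invented).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
n = len(x)

# X'X is [[n, Sx], [Sx, Sxx]]; X'Y is [Sy, Sxy].
Sx, Sxx = sum(x), sum(xi * xi for xi in x)
Sy, Sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))

det = n * Sxx - Sx * Sx           # determinant of X'X
b0 = (Sxx * Sy - Sx * Sxy) / det  # intercept estimate
b1 = (n * Sxy - Sx * Sy) / det    # slope estimate

print(round(b0, 4), round(b1, 4))
```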
-
18
Estimation of σ²:
Residual: e_i = Y_i − β̂0 − β̂1 X_i
s² = Σ e_i² / (n − 2)
Testing H0: β1 = 0:
t_obs = β̂1 / S.E.(β̂1), where S.E.(β̂1) = s / √(Σ x_i²)  (since V(β̂1) = σ² / Σ x_i²)
If |t_obs| ≥ t_{α/2, n−2}, reject H0: β1 = 0 at the α% significance level.
A (1 − α)100% confidence interval for β1 is (β̂1 − t_{α/2, n−2} S.E.(β̂1), β̂1 + t_{α/2, n−2} S.E.(β̂1)).
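A minimal sketch of this test with invented data; the critical value 2.776 is t_{0.025, 4} for n − 2 = 4 degrees of freedom, taken from a t table:

```python
import math

# t test for H0: beta1 = 0 on a toy data set (all numbers made up).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.1, 2.8, 4.2, 4.9, 6.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(e * e for e in residuals) / (n - 2)  # estimate of sigma^2
se_b1 = math.sqrt(s2 / sxx)                   # standard error of the slope
t_obs = b1 / se_b1

# Compare |t_obs| with the critical value t_{0.025, n-2} = 2.776 for 4 d.f.
print(round(t_obs, 2), abs(t_obs) > 2.776)
```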
-
19
A measure of fit:
As long as there is a constant in the regression, Σ e_i = 0 and the mean of the fitted values Ŷ_i equals Ȳ.
Writing y_i = Y_i − Ȳ and ŷ_i = Ŷ_i − Ȳ, the variation in Y decomposes as
Σ y_i² = Σ ŷ_i² + Σ e_i²
The centered R² is
R² = 1 − Σ e_i² / Σ y_i²,  0 ≤ R² ≤ 1
(i) R² is the squared correlation between Y and Ŷ.
(ii) R² is the simple squared correlation between X and Y.
The uncentered R² is
R² = 1 − Σ e_i² / Σ Y_i²
If the intercept is not present, the uncentered R² is used as the measure of fit.
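A short sketch with made-up data, checking that 1 − SSE/SST matches the squared correlation between Y and Ŷ when an intercept is included:

```python
# R^2 computed two ways on toy data: as 1 - SSE/SST, and as the squared
# correlation between the observed and fitted values (invented numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
sst = sum((yi - ybar) ** 2 for yi in y)
r2 = 1.0 - sse / sst

# Squared correlation between y and yhat (with an intercept, the fitted
# values have the same mean ybar as y).
cov = sum((yi - ybar) * (yh - ybar) for yi, yh in zip(y, yhat))
syy_hat = sum((yh - ybar) ** 2 for yh in yhat)
r2_corr = cov ** 2 / (sst * syy_hat)

print(round(r2, 6), round(r2_corr, 6))
```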
-
20
Prediction:
Ŷ0 = β̂0 + β̂1 X0
Using the Gauss-Markov result, Ŷ0 is the BLUP of E(Y0) = β0 + β1 X0, with
V(Ŷ0) = σ² (1/n + (X0 − X̄)² / Σ x_i²)
One can construct 95% confidence intervals for these predictions, for every value of X0, given by
Ŷ0 ± t_{0.025, n−2} s √(1 + 1/n + (X0 − X̄)² / Σ x_i²)
where t_{0.025, n−2} represents the 2.5% critical value obtained from the t-distribution with n − 2 d.f.
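A sketch of the interval for the mean response E(Y|X = x0), with invented data; the critical value t_{0.025, 8} = 2.306 for n − 2 = 8 d.f. is taken from a t table:

```python
import math

# 95% confidence interval for E(Y | X = x0) on toy data (numbers made up).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.5, 2.2, 3.1, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2, 9.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x0 = 5.5
y0_hat = b0 + b1 * x0
se_mean = math.sqrt(s2 * (1.0 / n + (x0 - xbar) ** 2 / sxx))
t_crit = 2.306  # t_{0.025, 8}, from a t table
lo, hi = y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean
print(round(lo, 3), round(hi, 3))
```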
-
21
Example:
Annual consumption of 10 households, each selected randomly from a group of households with a fixed personal disposable income. Both income and expenditure are measured in Rs. 1000.
Solution:
Ŷ_i = β̂0 + β̂1 X_i
β̂1 = 0.8095: the estimated marginal propensity to consume. This is the extra consumption brought about by an extra Rs. of disposable income.
β̂0 = Ȳ − β̂1 X̄ = 6.5 − (0.8095)(7.5) = 0.4286. This is the estimated consumption at zero personal disposable income.
The fitted values from this regression, the true values, and the residuals are shown in the figure.
-
22
The estimated variances of β̂1 and β̂0 are
V̂(β̂1) = s² / Σ x_i² = 0.005941, with s = 0.311905, so S.E.(β̂1) = 0.077078
V̂(β̂0) = s² (1/n + X̄² / Σ x_i²) = 0.365374, so S.E.(β̂0) = 0.60446
The test statistic to test H0: β1 = 0 is
t = β̂1 / S.E.(β̂1) = 0.8095 / 0.077078 = 10.50
p-value = P(t_8 > 10.50) < 0.0001. Reject H0; hence X is highly significant.
For H0: β0 = 0, t = 0.709, which is not significant since the p-value is 0.498.
Therefore we do not reject H0.
-
23
R² = 1 − Σ e_i² / Σ y_i² = 0.9324.
This means that personal disposable income explains 93.24% of the variation in consumption.
Equivalently, r² = (Σ x_i y_i)² / (Σ x_i² Σ y_i²) = 0.9324.
-
24
The Sum of Squares
SST = Total Sum of Squares: measures the variation of the Y_i values around their mean Ȳ.
SSR = Regression Sum of Squares: the explained variation, attributable to the relationship between X and Y.
SSE = Error Sum of Squares: the variation attributable to factors other than the relationship between X and Y.
-
25
The Coefficient of Determination
r² = SSR / SST = regression sum of squares / total sum of squares
Measures the proportion of variation in the dependent variable explained by the regression line.
-
26
Simple Linear Regression
-
27
Simple Linear Regression
-
28
-
29
Y = a_0 + a_1 X_1 + a_2 X_2 + ... + a_n X_n + e,  e ~ N(0, σ²)
Y: response variable
X_1, ..., X_n: explanatory variables
e: error
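The model above can be fitted by solving the normal equations (X′X)a = X′Y. A self-contained pure-Python sketch with invented data, constructed so the true coefficients are a0 = 1, a1 = 2, a2 = 1.5:

```python
# Fitting Y = a0 + a1*X1 + a2*X2 by solving the normal equations
# (X'X) a = X'Y with Gaussian elimination (toy data, invented).
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Each row of X is [1, X1, X2]; Y was generated as 1 + 2*X1 + 1.5*X2.
X = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 4.0],
     [1.0, 4.0, 3.0], [1.0, 5.0, 6.0]]
Y = [6.0, 6.5, 13.0, 13.5, 20.0]

XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
XtY = [sum(r[i] * yi for r, yi in zip(X, Y)) for i in range(3)]
a = solve(XtX, XtY)
print([round(v, 4) for v in a])
```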
-
30
Errors are independent (no auto correlation)
Errors are normally distributed
Errors have zero mean and constant variance
No multicollinearity
Regressors are not random variables (fixed for repeated measurements)
-
31
Multiple Regression
-
32
Regression diagnostics ask 3 questions:
Are the assumptions of multiple regression complied with?
Is the model adequate?
Is there anything unusual about any data points?
-
33
Plot the ACF of residuals.
[Figure: residuals versus the fitted values (response is Crimrate).]
Remedy?
Durbin-Watson statistic (takes values between 0 and 4; values near 2 indicate no autocorrelation).
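The Durbin-Watson computation can be sketched on made-up residual series; DW near 0 signals positive autocorrelation, near 4 negative autocorrelation, and near 2 none:

```python
# Durbin-Watson statistic for a residual series (invented residuals).
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), computed over t = 2..n.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

roughly_uncorrelated = [0.5, -0.3, 0.2, 0.4, -0.6, 0.1, -0.2, 0.3]
positively_corr = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # slowly drifting
print(round(durbin_watson(roughly_uncorrelated), 2),
      round(durbin_watson(positively_corr), 2))
```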
-
34
Plot residuals versus fitted values
Remedy?
-
35
Auto correlated Regression
-
36
Residual plot showing
Autocorrelation
-
37
Check by means of the correlation matrix.
Variance inflation: large changes in regression coefficients when variables are added or deleted.
A variance inflation factor (VIF) > 4 indicates multicollinearity, where VIF = 1/(1 − R²) and R² comes from regressing that predictor on the other predictors.
The Durbin-Watson statistic (range 0 to 4) is a related diagnostic, but it checks for autocorrelated errors rather than collinearity.
Remedy?
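For two predictors, R_j² reduces to their squared correlation, so the VIF can be sketched directly (data invented for illustration):

```python
# VIF_j = 1 / (1 - R_j^2). With only two predictors, R_j^2 is simply the
# squared correlation between them (all numbers made up).
def vif_two_predictors(x1, x2):
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    r2 = cov * cov / (v1 * v2)
    return 1.0 / (1.0 - r2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 2.3, 2.9, 4.2, 4.8]   # nearly collinear with x1
print(round(vif_two_predictors(x1, x2), 2))  # well above the VIF > 4 flag
```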
-
38
Logistic Regression
Logistic regression is a form of regression used when the dependent variable is dichotomous (binary) and the independent variables are of any type.
Continuous variables are not used as the dependent variable. Logistic regression does not assume linearity of the relationship between the dependent and
independent variables.
It does not assume normality or homoscedasticity. It assumes that observations are independent and that the independent variables are linearly
related to the logit of the dependent variable.
The scatter plot of the outcome variable (Y) vs. the independent variables shows all points falling on one of two parallel lines, representing Y = 0 and Y = 1.
This scatter plot does not provide a clear picture of a linear relationship. In linear regression the quantity E(Y|X) can take any value in the range (−∞, ∞),
whereas in logistic regression E(Y|X) lies between 0 and 1.
-
39
Let π(x) = E(Y|X = x). The specific form of π(x) we use in the logistic regression model is
π(x) = exp(β0 + β1 x) / (1 + exp(β0 + β1 x))
The logit transformation of π(x) is given by
g(x) = ln( π(x) / (1 − π(x)) ) = β0 + β1 x
The logit g(x) is linear in the parameters, continuous, and
may range over (−∞, ∞) depending on the range of x. We may
express the value of the outcome variable given x as
y = π(x) + ε
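The logistic form and its logit can be sketched in Python; the coefficients β0 = −3 and β1 = 0.5 are invented for illustration, and the logit of π(x) recovers the linear predictor exactly:

```python
import math

# The logistic response pi(x) and its logit inverse (made-up coefficients).
b0, b1 = -3.0, 0.5

def pi(x):
    """pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)), always in (0, 1)."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    """g = ln(p / (1 - p)); applied to pi(x) it recovers b0 + b1*x."""
    return math.log(p / (1.0 - p))

x = 4.0
p = pi(x)
print(round(p, 4), round(logit(p), 4))  # logit(pi(x)) equals b0 + b1*x
```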
-
40
Binary Logistic Regression
-
41
Binary Logistic Regression
-
42
Binary Logistic Regression
-
43
Thanks !!!