-
1
Regression Analysis
BY
DR. ISMAIL B
PROFESSOR
DEPARTMENT OF STATISTICS MANGALORE UNIVERSITY
MANGALAGANGOTHRI
e-mail: [email protected]
-
2
-
3
-
4
-
5
Descriptive Statistics
-
6
Using the p-value to make
the decision
The p-value is a probability, computed assuming the null
hypothesis is true, that the test criterion would take a value as
extreme as or more extreme than that actually observed.
Since it is a probability, it is a number between 0 and 1; the
closer it is to 0, the more unlikely the observed result is under the null hypothesis.
So if the p-value is small, we can reject the null hypothesis.
-
7
Using the p-value to make the
decision
How small is small? Smaller than the level of significance,
e.g. .05 or .01. So, using the p-value to make the decision:
reject the null hypothesis if the p-value is less than the chosen level (.05 or .01).
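The decision rule above can be sketched in Python. All numbers here are made up for illustration, and the two-sided p-value is approximated with the standard normal distribution (a large-sample approximation):

```python
import math

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for test statistic z under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when the p-value is smaller than the significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p = two_sided_p_value(2.5)   # z = 2.5 is an invented test statistic
print(round(p, 4), decide(p, alpha=0.05))
```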
-
8
-
9
Answer: What is the relationship between the variables? An equation is used, with:
1 numerical dependent (response) variable: Y, the quantity to be predicted
1 or more numerical or categorical independent (explanatory) variables: X
Different techniques are used for different levels of measurement.
-
10
Types of Regression Models
Regression Models
1 Explanatory Variable → Simple: Linear or Non-Linear
2+ Explanatory Variables → Multiple: Linear or Non-Linear
-
11
Types of Regression Models
Regression Models
1 Explanatory Variable → Simple: Linear or Non-Linear
2+ Explanatory Variables → Multiple: Linear or Non-Linear
Log-linear: the dependence is linear on a logarithmic scale of the dependent variable.
-
12
Linear Equations
[Figure: a straight line Y = bX + a, where a is the Y-intercept and the slope b is the change in Y over the change in X.]
The Simple Linear Regression model is given by
Y = a + bX + e
-
13
Simple Linear Regression Model
The relationship between the variables is a linear function, the straight line that best fits the data:
Y_i = β0 + β1 X_i + ε_i
where Y_i is the dependent (response) variable, X_i is the independent (explanatory) variable, β0 is the Y-intercept (constant term), β1 is the slope, and ε_i is the random error.
-
14
Linear Regression Model
[Figure: scatter of observed values around the fitted line. An observed value satisfies Y_i = β0 + β1 X_i + ε_i, where ε_i is the random error; the population regression line is E(Y|X) = β0 + β1 X, and the fitted line is Ŷ = β̂0 + β̂1 X.]
-
15
Assumptions
1. E(ε_i) = 0, i.e., the disturbances have zero mean.
2. V(ε_i) = σ², i = 1, 2, ..., n, i.e., the disturbances have constant variance.
3. E(ε_i ε_j) = 0 for i ≠ j, i.e., the disturbances are uncorrelated.
4. The explanatory variable X is non-stochastic, i.e., fixed in repeated samples, and hence not correlated with the disturbances.
5. Σ_{t=1}^{n} x_t²/n is nonzero and has a finite limit as n → ∞. This assumption states that we have at least two distinct values for X.
-
16
The Sum of Squares
[Figure: at a given X_i, the deviation of Y_i from Ȳ splits into a part explained by the fitted line and a residual part.]
SST = Σ (Y_i − Ȳ)²
SSE = Σ (Y_i − Ŷ_i)²
SSR = Σ (Ŷ_i − Ȳ)²
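As a quick check on this decomposition, a short Python sketch with made-up data fits the least squares line and verifies that SST = SSR + SSE:

```python
# Decomposing total variation for a toy data set (invented numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# OLS slope and intercept.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)             # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))   # residual variation

print(round(sst, 4), round(ssr + sse, 4))  # the two numbers agree
```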
-
17
We can write the SLR model for all the observations as
Y = Xβ + ε, where X is the n × 2 matrix [1 X_1; 1 X_2; ... ; 1 X_n]
The least squares estimator and its variance are
β̂ = (X′X)⁻¹X′Y,  V(β̂) = σ²(X′X)⁻¹
In scalar form, with x_i = X_i − X̄ and y_i = Y_i − Ȳ:
β̂1 = Σ x_i y_i / Σ x_i²
β̂0 = Ȳ − β̂1 X̄
BLUE: the least squares estimator of β is the Best Linear Unbiased Estimator.
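The matrix formula can be sketched in pure Python for the simple-regression case, where X′X is just a 2 × 2 matrix that can be inverted by hand (toy data, invented for illustration):

```python
# beta_hat = (X'X)^{-1} X'Y for simple regression, where X has a column
# of ones and a column of x values (all numbers invented).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
n = len(x)

# X'X is [[n, Sx], [Sx, Sxx]]; X'Y is [Sy, Sxy].
Sx, Sxx = sum(x), sum(xi * xi for xi in x)
Sy, Sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))

det = n * Sxx - Sx * Sx           # determinant of X'X
b0 = (Sxx * Sy - Sx * Sxy) / det  # intercept estimate
b1 = (n * Sxy - Sx * Sy) / det    # slope estimate

print(round(b0, 4), round(b1, 4))
```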
-
18
Estimation of σ²:
Residual: e_i = Y_i − β̂0 − β̂1 X_i
s² = Σ e_i² / (n − 2)
Testing H0: β1 = 0:
t_obs = β̂1 / S.E.(β̂1), where S.E.(β̂1) = s / √(Σ x_i²)  (since V(β̂1) = σ² / Σ x_i²)
If |t_obs| ≥ t_{α/2, n−2}, reject H0: β1 = 0 at the α% significance level.
A (1 − α)100% confidence interval for β1 is (β̂1 − t_{α/2, n−2} S.E.(β̂1), β̂1 + t_{α/2, n−2} S.E.(β̂1)).
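A minimal sketch of this test with invented data; the critical value 2.776 is t_{0.025, 4} for n − 2 = 4 degrees of freedom, taken from a t table:

```python
import math

# t test for H0: beta1 = 0 on a toy data set (all numbers made up).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.1, 2.8, 4.2, 4.9, 6.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(e * e for e in residuals) / (n - 2)  # estimate of sigma^2
se_b1 = math.sqrt(s2 / sxx)                   # standard error of the slope
t_obs = b1 / se_b1

# Compare |t_obs| with the critical value t_{0.025, n-2} = 2.776 for 4 d.f.
print(round(t_obs, 2), abs(t_obs) > 2.776)
```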
-
19
A measure of fit:
As long as there is a constant in the regression, Σ e_i = 0 and the mean of the fitted values Ŷ_i equals Ȳ.
Writing y_i = Y_i − Ȳ and ŷ_i = Ŷ_i − Ȳ, the variation in Y decomposes as
Σ y_i² = Σ ŷ_i² + Σ e_i²
The centered R² is
R² = 1 − Σ e_i² / Σ y_i²,  0 ≤ R² ≤ 1
(i) R² is the squared correlation between Y and Ŷ.
(ii) R² is the simple squared correlation between X and Y.
The uncentered R² is
R² = 1 − Σ e_i² / Σ Y_i²
If the intercept is not present, the uncentered R² is used as the measure of fit.
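A short sketch with made-up data, checking that 1 − SSE/SST matches the squared correlation between Y and Ŷ when an intercept is included:

```python
# R^2 computed two ways on toy data: as 1 - SSE/SST, and as the squared
# correlation between the observed and fitted values (invented numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
sst = sum((yi - ybar) ** 2 for yi in y)
r2 = 1.0 - sse / sst

# Squared correlation between y and yhat (with an intercept, the fitted
# values have the same mean ybar as y).
cov = sum((yi - ybar) * (yh - ybar) for yi, yh in zip(y, yhat))
syy_hat = sum((yh - ybar) ** 2 for yh in yhat)
r2_corr = cov ** 2 / (sst * syy_hat)

print(round(r2, 6), round(r2_corr, 6))
```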
-
20
Prediction:
Ŷ0 = β̂0 + β̂1 X0
Using the Gauss-Markov result, Ŷ0 is the BLUP of E(Y0) = β0 + β1 X0, with
V(Ŷ0) = σ² (1/n + (X0 − X̄)² / Σ x_i²)
One can construct 95% confidence intervals for these predictions, for every value of X0, given by
Ŷ0 ± t_{0.025, n−2} s √(1 + 1/n + (X0 − X̄)² / Σ x_i²)
where t_{0.025, n−2} represents the 2.5% critical value obtained from the t-distribution with n − 2 d.f.
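A sketch of the interval for the mean response E(Y|X = x0), with invented data; the critical value t_{0.025, 8} = 2.306 for n − 2 = 8 d.f. is taken from a t table:

```python
import math

# 95% confidence interval for E(Y | X = x0) on toy data (numbers made up).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.5, 2.2, 3.1, 3.8, 5.2, 5.9, 7.1, 7.8, 9.2, 9.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x0 = 5.5
y0_hat = b0 + b1 * x0
se_mean = math.sqrt(s2 * (1.0 / n + (x0 - xbar) ** 2 / sxx))
t_crit = 2.306  # t_{0.025, 8}, from a t table
lo, hi = y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean
print(round(lo, 3), round(hi, 3))
```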
-
21
Example:
Annual consumption of 10 households, each selected randomly from a group of households with a fixed personal disposable income. Both income and expenditure are measured in Rs. 1000.
Solution:
Ŷ_i = β̂0 + β̂1 X_i
β̂1 = 0.8095: the estimated marginal propensity to consume. This is the extra consumption brought about by an extra Rs. of disposable income.
β̂0 = Ȳ − β̂1 X̄ = 6.5 − (0.8095)(7.5) = 0.4286. This is the estimated consumption at zero personal disposable income.
The fitted values from this regression, the true values, and the residuals are shown in the figure.
-
22
The estimated variances of β̂1 and β̂0 are
V̂(β̂1) = s² / Σ x_i² = 0.005941, with s = 0.311905, so S.E.(β̂1) = 0.077078
V̂(β̂0) = s² (1/n + X̄² / Σ x_i²) = 0.365374, so S.E.(β̂0) = 0.60446
The test statistic to test H0: β1 = 0 is
t = β̂1 / S.E.(β̂1) = 0.8095 / 0.077078 = 10.50
p-value = P(t_8 > 10.50) < 0.0001. Reject H0; hence X is highly significant.
For H0: β0 = 0, t = 0.709, which is not significant since the p-value is 0.498.
Therefore we do not reject H0.
-
23
R² = 1 − Σ e_i² / Σ y_i² = 0.9324.
This means that personal disposable income explains 93.24% of the variation in consumption.
Equivalently, r² = (Σ x_i y_i)² / (Σ x_i² Σ y_i²) = 0.9324.
-
24
The Sum of Squares
SST = Total Sum of Squares: measures the variation of the Y_i values around their mean Ȳ.
SSR = Regression Sum of Squares: the explained variation, attributable to the relationship between X and Y.
SSE = Error Sum of Squares: the variation attributable to factors other than the relationship between X and Y.
-
25
The Coefficient of Determination
r² = SSR / SST = regression sum of squares / total sum of squares
Measures the proportion of variation in the dependent variable explained by the regression line.
-
26
Simple Linear Regression
-
27
Simple Linear Regression
-
28
-
29
Y = a_0 + a_1 X_1 + a_2 X_2 + ... + a_n X_n + e,  e ~ N(0, σ²)
Y: response variable
X_1, ..., X_n: explanatory variables
e: error
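The model above can be fitted by solving the normal equations (X′X)a = X′Y. A self-contained pure-Python sketch with invented data, constructed so the true coefficients are a0 = 1, a1 = 2, a2 = 1.5:

```python
# Fitting Y = a0 + a1*X1 + a2*X2 by solving the normal equations
# (X'X) a = X'Y with Gaussian elimination (toy data, invented).
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Each row of X is [1, X1, X2]; Y was generated as 1 + 2*X1 + 1.5*X2.
X = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 4.0],
     [1.0, 4.0, 3.0], [1.0, 5.0, 6.0]]
Y = [6.0, 6.5, 13.0, 13.5, 20.0]

XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
XtY = [sum(r[i] * yi for r, yi in zip(X, Y)) for i in range(3)]
a = solve(XtX, XtY)
print([round(v, 4) for v in a])
```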
-
30
Errors are independent (no auto correlation)
Errors are normally distributed
Errors have zero mean and constant variance
No multicollinearity
Regressors are not random variables (fixed for repeated measurements)
-
31
Multiple Regression
-
32
Regression diagnostics ask 3 questions:
Are the assumptions of multiple regression complied with?
Is the model adequate?
Is there anything unusual about any data points?
-
33
Plot the ACF of residuals.
[Figure: residuals versus the fitted values (response is Crimrate).]
Remedy?
Durbin-Watson statistic (takes values between 0 and 4; values near 2 indicate no autocorrelation).
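The Durbin-Watson computation can be sketched on made-up residual series; DW near 0 signals positive autocorrelation, near 4 negative autocorrelation, and near 2 none:

```python
# Durbin-Watson statistic for a residual series (invented residuals).
# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), computed over t = 2..n.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

roughly_uncorrelated = [0.5, -0.3, 0.2, 0.4, -0.6, 0.1, -0.2, 0.3]
positively_corr = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # slowly drifting
print(round(durbin_watson(roughly_uncorrelated), 2),
      round(durbin_watson(positively_corr), 2))
```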
-
34
Plot residuals versus fitted values
Remedy?
-
35
Auto correlated Regression
-
36
Residual plot showing
Autocorrelation
-
37
Check by means of the correlation matrix.
Variance inflation: large changes in regression coefficients when variables are added or deleted.
A variance inflation factor (VIF) > 4 indicates multicollinearity, where VIF = 1/(1 − R²) and R² comes from regressing that predictor on the other predictors.
The Durbin-Watson statistic (range 0 to 4) is a related diagnostic, but it checks for autocorrelated errors rather than collinearity.
Remedy?
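For two predictors, R_j² reduces to their squared correlation, so the VIF can be sketched directly (data invented for illustration):

```python
# VIF_j = 1 / (1 - R_j^2). With only two predictors, R_j^2 is simply the
# squared correlation between them (all numbers made up).
def vif_two_predictors(x1, x2):
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    r2 = cov * cov / (v1 * v2)
    return 1.0 / (1.0 - r2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 2.3, 2.9, 4.2, 4.8]   # nearly collinear with x1
print(round(vif_two_predictors(x1, x2), 2))  # well above the VIF > 4 flag
```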
-
38
Logistic Regression
Logistic regression is a form of regression used when the dependent variable is dichotomous (binary) and the independent variables are of any type.
Continuous variables are not used as the dependent variable. Logistic regression does not assume linearity of the relationship between the dependent and
independent variables.
It does not assume normality or homoscedasticity. It assumes that observations are independent and that the independent variables are linearly
related to the logit of the dependent variable.
The scatter plot of the outcome variable (Y) vs. the independent variables shows all points falling on one of two parallel lines, representing Y = 0 and Y = 1.
This scatter plot does not provide a clear picture of a linear relationship. In linear regression the quantity E(Y|X) can take any value in the range (−∞, ∞),
whereas in logistic regression E(Y|X) lies between 0 and 1.
-
39
Let π(x) = E(Y|X = x). The specific form of π(x) we use in the logistic regression model is
π(x) = exp(β0 + β1 x) / (1 + exp(β0 + β1 x))
The logit transformation of π(x) is given by
g(x) = ln( π(x) / (1 − π(x)) ) = β0 + β1 x
The logit g(x) is linear in the parameters, continuous, and
may range over (−∞, ∞) depending on the range of x. We may
express the value of the outcome variable given x as
y = π(x) + ε
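The logistic form and its logit can be sketched in Python; the coefficients β0 = −3 and β1 = 0.5 are invented for illustration, and the logit of π(x) recovers the linear predictor exactly:

```python
import math

# The logistic response pi(x) and its logit inverse (made-up coefficients).
b0, b1 = -3.0, 0.5

def pi(x):
    """pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)), always in (0, 1)."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    """g = ln(p / (1 - p)); applied to pi(x) it recovers b0 + b1*x."""
    return math.log(p / (1.0 - p))

x = 4.0
p = pi(x)
print(round(p, 4), round(logit(p), 4))  # logit(pi(x)) equals b0 + b1*x
```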
-
40
Binary Logistic Regression
-
41
Binary Logistic Regression
-
42
Binary Logistic Regression
-
43
Thanks !!!