Regression Analysis, by Dr. Ismail B, Professor, Department of Statistics, Mangalore University, Mangalagangothri. e-mail: [email protected]

Upload: amer-rahmah

Post on 08-Nov-2015


TRANSCRIPT

  • 1

    Regression Analysis

    BY

    DR. ISMAIL B

    PROFESSOR

    DEPARTMENT OF STATISTICS MANGALORE UNIVERSITY

    MANGALAGANGOTHRI

    e-mail: [email protected]

  • 2

  • 3

  • 4

  • 5

    Descriptive Statistics

  • 6

    Using the p-value to make the decision

    The p-value is a probability, computed assuming the null hypothesis is true, that the test criterion would take a value as extreme as or more extreme than that actually observed.

    Since it is a probability, it is a number between 0 and 1. The closer the number is to 0, the more unlikely the event.

    So if the p-value is "small," we can reject the null hypothesis.

  • 7

    Using the p-value to make the decision

    How small? Smaller than the level of significance, α = .05 or .01. So, using the p-value to make the decision:

    If .01
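The decision rule on these two slides can be sketched in a few lines of Python. This is an illustrative sketch, not part of the slides; the `decide` helper and its p-values are made up for the example.

```python
# Illustrative sketch: turning a p-value into a reject / do-not-reject decision.
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Reject H0 when the p-value is smaller than the significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value is a probability, so it must lie in [0, 1]")
    return "reject H0" if p_value < alpha else "do not reject H0"

print(decide(0.003))        # small p-value: reject at alpha = .05
print(decide(0.27))         # large p-value: do not reject
print(decide(0.03, 0.01))   # significant at .05 but not at the stricter .01
```

Note that the same p-value can lead to different decisions at α = .05 and α = .01, which is exactly the point the slide is making.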

  • 8

  • 9

    Answer: What is the relationship between the variables?

    Equation used:

    1 numerical dependent (response) variable, i.e., what is to be predicted: Y

    1 or more numerical or categorical independent (explanatory) variables: X

    Different techniques are used for different levels of measurement.

  • 10

    Types of Regression Models

    Regression Models:

    Simple (1 explanatory variable): Linear or Non-Linear

    Multiple (2+ explanatory variables): Linear or Non-Linear

  • 11

    Types of Regression Models

    Regression Models:

    Simple (1 explanatory variable): Linear or Non-Linear

    Multiple (2+ explanatory variables): Linear or Non-Linear

    Log-linear (log of the dependent variable, linear in the parameters)

  • 12

    Linear Equations

    Y = bX + a

    a = Y-intercept

    b = slope = change in Y / change in X

    The Simple Linear Regression model is given by

    Y = a + bX + e

  • 13

    Simple Linear Regression Model

    Y_i = β0 + β1 X_i + ε_i

    β0: Y intercept (constant term)
    β1: slope
    ε_i: random error
    Y: dependent (response) variable
    X: independent (explanatory) variable

    The relationship between the variables is a linear function: the straight line that best fits the data.

  • 14

    Linear Regression Model

    [Figure: observed values scattered about the regression line]

    True regression line: E(Y/X) = β0 + β1 X

    Fitted line: Ŷ = b0 + b1 X  (b0, b1 are the estimates of β0, β1)

    Observed value: Y_i = β0 + β1 X_i + ε_i,  ε_i = random error

  • 15

    Assumptions

    1. E(ε_i) = 0, i.e., the disturbances have zero mean.

    2. V(ε_i) = σ², i = 1, 2, ..., n, i.e., the disturbances have constant variance.

    3. E(ε_i ε_j) = 0 for i ≠ j, i.e., the disturbances are uncorrelated.

    4. The explanatory variable X is non-stochastic, i.e., fixed in repeated samples and hence not correlated with the disturbances.

    5. Σ_{t=1}^{n} x_t² / n has a finite limit as n → ∞. This assumption states that we have at least two distinct values for X.

  • 16

    The Sum of Squares

    [Figure: data point (X_i, Y_i), fitted line Ŷ, and mean Ȳ]

    SST = Σ (Y_i − Ȳ)²

    SSR = Σ (Ŷ_i − Ȳ)²

    SSE = Σ (Y_i − Ŷ_i)²

  • 17

    BLUE: The least-squares estimator of β1 is

    b1 = Σ x_i y_i / Σ x_i²

    where x_i = X_i − X̄ and y_i = Y_i − Ȳ; then b0 = Ȳ − b1 X̄.

    We can write the SLR model for all the observations as Y = Xβ + ε, where

    Y = (Y_1, ..., Y_n)',  X = [[1, X_1], [1, X_2], ..., [1, X_n]],  β = (β0, β1)'

    so that, writing b = (b0, b1)' for the least-squares estimator,

    b = (X'X)⁻¹ X'Y,  V(b) = σ² (X'X)⁻¹
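The deviation-form least-squares formulas can be sketched in Python. This is a minimal illustration with made-up data, not the lecturer's code; `ols_fit` is a hypothetical helper name.

```python
# Least-squares estimates for Y = b0 + b1*X + e using the deviation-form
# formulas: b1 = sum(x_i * y_i) / sum(x_i^2), b0 = Ybar - b1 * Xbar,
# where x_i = X_i - Xbar and y_i = Y_i - Ybar.
def ols_fit(X, Y):
    n = len(X)
    xbar = sum(X) / n
    ybar = sum(Y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(X, Y))
    sxx = sum((xi - xbar) ** 2 for xi in X)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Points lying exactly on Y = 1 + 2X recover the coefficients exactly.
b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```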

  • 18

    Estimation of σ²:

    Residual: e_i = Y_i − b0 − b1 X_i,  s² = Σ e_i² / (n − 2)

    V(b1) = σ² / Σ x_i²  (x_i = X_i − X̄), estimated by s² / Σ x_i², so S.E.(b1) = s / √(Σ x_i²)

    Testing H0: β1 = 0:

    t_obs = b1 / S.E.(b1)

    If |t_obs| > t_(α/2, n−2), reject H0: β1 = 0 at the α% significance level.

    A (1 − α)% (95%) confidence interval for β1 is

    (b1 − t_(0.025, n−2) S.E.(b1),  b1 + t_(0.025, n−2) S.E.(b1))
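The estimation and testing steps can be chained together in one sketch: fit the line, estimate s², and form the t statistic for H0: β1 = 0. A minimal illustration with made-up data; `slope_test` is a hypothetical helper, and the critical value comparison is left to the reader (it depends on n).

```python
import math

# s^2 = sum(e_i^2)/(n-2), S.E.(b1) = s/sqrt(sum x_i^2) with x_i = X_i - Xbar,
# and t_obs = b1 / S.E.(b1) for testing H0: beta1 = 0.
def slope_test(X, Y):
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxx = sum((xi - xbar) ** 2 for xi in X)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(X, Y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(X, Y)) / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    return b1 / se_b1, se_b1

t_obs, se = slope_test([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
# Compare |t_obs| with the t critical value at n - 2 = 3 degrees of freedom.
```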

  • 19

    A measure of fit:

    Σ e_i = 0 and the mean of the fitted values Ŷ_i equals Ȳ, as long as there is a constant in the regression.

    The uncentered R² is

    R² = 1 − Σ e_i² / Σ Y_i²

    The centered R² is

    R² = 1 − Σ e_i² / Σ y_i²,  where y_i = Y_i − Ȳ,  and 0 ≤ R² ≤ 1.

    (i) R² = squared correlation between Y and Ŷ.
    (ii) R² = simple squared correlation between Y and X.

    If the intercept is not present, the uncentered R² is used as the measure of fit.

  • 20

    Prediction:

    Ŷ_0 = b0 + b1 X_0

    Using the Gauss-Markov result, Ŷ_0 is the BLUP of E(Y_0) = β0 + β1 X_0.

    V(Ŷ_0) = σ² (1/n + (X_0 − X̄)² / Σ x_i²)

    One can construct 95% confidence intervals for these predictions, for every value X_0 of X, given by

    Ŷ_0 ± t_(0.025, n−2) s √(1/n + (X_0 − X̄)² / Σ x_i²)

    where t_(0.025, n−2) represents the 2.5% critical value obtained from the t-distribution with n − 2 d.f.
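The prediction-interval formula can be sketched numerically. This is an illustration only: it assumes n = 10 (so the tabulated critical value t_(0.025, 8) = 2.306) and an implied Σ x_i² ≈ 16.375 back-computed from the standard errors in the worked example below; `mean_ci` is a hypothetical helper name.

```python
import math

# 95% confidence interval for E(Y0) at X0, following the slide's formula:
# Yhat0 +/- t_crit * s * sqrt(1/n + (X0 - Xbar)^2 / sum(x_i^2)).
def mean_ci(x0, b0, b1, s, n, xbar, sxx, t_crit):
    y0 = b0 + b1 * x0
    half = t_crit * s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    return y0 - half, y0 + half

# Illustrative numbers (assumed, not all given on the slides).
lo, hi = mean_ci(x0=8.0, b0=0.4286, b1=0.8095, s=0.311905,
                 n=10, xbar=7.5, sxx=16.375, t_crit=2.306)
# The interval is centered on the point prediction b0 + b1 * x0.
```

Note how the interval widens as X_0 moves away from X̄: the (X_0 − X̄)² term inflates the standard error at the edges of the data.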

  • 21

    Example:

    Annual consumption of each of 10 households selected randomly from a group of households with a fixed personal disposable income. Both income and expenditure are measured in Rs. 1000.

    Solution:

    Ŷ_i = 0.4286 + 0.8095 X_i

    0.8095 is the estimated marginal propensity to consume. This is the extra consumption brought about by an extra Rs. of disposable income.

    b0 = Ȳ − b1 X̄ = 6.5 − (0.8095)(7.5) = 0.4286. This is the estimated consumption at zero personal disposable income.

    The fitted values from this regression, the true values, and the residuals are shown in the figure.

  • 22

    V(b1) = σ²/Σ x_i², and its estimate is s²/Σ x_i² = 0.005941 (s = 0.311905), so S.E.(b1) = 0.077078.

    The estimated variance of b0 is s²(1/n + X̄²/Σ x_i²) = 0.365374, so S.E.(b0) = 0.60446.

    The test statistic to test H0: β1 = 0 is

    t_obs = b1 / S.E.(b1) = 0.8095 / 0.077078 = 10.50

    p-value = P(t_8 > 10.50) < 0.0001. Reject H0. Hence X is highly significant.

    For H0: β0 = 0: t = 0.709, which is not significant since the p-value = 0.498. Therefore we do not reject H0.

  • 23

    r² = (Σ x_i y_i)² / (Σ x_i² Σ y_i²) = 0.9324, or equivalently

    R² = 1 − Σ e_i² / Σ y_i² = 0.9324.

    This means that personal disposable income explains 93.24% of the variation in consumption.
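The squared-correlation form of r² is easy to compute directly. A minimal sketch with made-up data, not the lecture's worked example; `r_squared` is a hypothetical helper name.

```python
# r^2 = (sum x_i y_i)^2 / (sum x_i^2 * sum y_i^2) in deviation form,
# which for simple linear regression equals 1 - SSE/SST.
def r_squared(X, Y):
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    x = [xi - xbar for xi in X]
    y = [yi - ybar for yi in Y]
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy ** 2 / (sum(a * a for a in x) * sum(b * b for b in y))

print(r_squared([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0: exact linear relationship
```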

  • 24

    The Sum of Squares

    SST = Total Sum of Squares: measures the variation of the Y_i values around their mean Ȳ.

    SSR = Regression Sum of Squares: the explained variation, attributable to the relationship between X and Y.

    SSE = Error Sum of Squares: variation attributable to factors other than the relationship between X and Y.

  • 25

    The Coefficient of Determination

    r² = SSR / SST = regression sum of squares / total sum of squares

    Measures the proportion of variation in

    the dependent variable explained by the

    regression line

  • 26

    Simple Linear Regression

  • 27

    Simple Linear Regression

  • 28

  • 29

    Y = a0 + a1 X1 + a2 X2 + ... + an Xn + e

    e ~ N(0, σ²)

    Y: response variable

    X1, ..., Xn: explanatory variables

    e: error

  • 30

    Errors are independent (no autocorrelation)

    Errors are normally distributed

    Errors have zero mean and constant variance

    No multicollinearity

    Regressors are not random variables (fixed for repeated measurements)
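The multiple-regression model above can be fitted by solving the normal equations (X'X)a = X'Y. A stdlib-only sketch under the assumption of exact, non-collinear toy data; `fit_multiple` and the Gaussian-elimination solver are illustrative, not the lecture's method of computation.

```python
# Fit Y = a0 + a1*X1 + a2*X2 + ... + e by solving the normal equations
# (X'X) a = X'Y with Gaussian elimination (no external libraries).
def fit_multiple(rows, y):
    # rows: list of regressor tuples (x1, x2, ...); an intercept column is added.
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Forward elimination with partial pivoting on the k x k system.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(xtx[r][c]))
        xtx[c], xtx[p] = xtx[p], xtx[c]
        xty[c], xty[p] = xty[p], xty[c]
        for r in range(c + 1, k):
            f = xtx[r][c] / xtx[c][c]
            for j in range(c, k):
                xtx[r][j] -= f * xtx[c][j]
            xty[r] -= f * xty[c]
    # Back substitution.
    coeffs = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(xtx[r][j] * coeffs[j] for j in range(r + 1, k))
        coeffs[r] = (xty[r] - s) / xtx[r][r]
    return coeffs  # [a0, a1, a2, ...]

# Exact data generated from Y = 1 + 2*X1 + 3*X2 recovers the coefficients.
data = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in data]
a0, a1, a2 = fit_multiple(data, y)
```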

  • 31

    Multiple Regression

  • 32

    Regression diagnostics ask three questions:

    Are the assumptions of multiple regression complied with?

    Is the model adequate?

    Is there anything unusual about any data points?

  • 33

    Plot the ACF of residuals

    [Figure: residuals versus the fitted values (response is Crimrate)]

    Remedy?

    Durbin-Watson statistic (normal range 0-4).
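The Durbin-Watson statistic mentioned above can be computed directly from a residual series: DW = Σ(e_t − e_{t−1})² / Σ e_t². A minimal sketch with made-up residuals; `durbin_watson` is an illustrative helper name.

```python
# Durbin-Watson statistic: values near 2 suggest no first-order
# autocorrelation; near 0, positive autocorrelation; near 4, negative.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x * x for x in e)

print(durbin_watson([1, -1, 1, -1, 1, -1]))   # alternating signs: near 4
print(durbin_watson([1, 1, 1, -1, -1, -1]))   # long runs of one sign: near 0
```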

  • 34

    Plot residual versus fitted

    Remedy?

  • 35

    Autocorrelated Regression

  • 36

    Residual plot showing

    Autocorrelation

  • 37

    Check by means of the correlation matrix.

    Variance inflation: large changes in regression coefficients when variables are added or deleted.

    A variance inflation factor (VIF) > 4 indicates multicollinearity, where VIF = 1/(1 − R²) and R² is obtained by regressing one explanatory variable on the others.

    The Durbin-Watson statistic (normal range 0-4) is a further residual check, for serial correlation.

    Remedy?
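The VIF formula is a one-liner; a quick sketch of the slide's threshold in action (the R² inputs are made up for illustration):

```python
# VIF = 1/(1 - R^2), where R^2 comes from regressing one explanatory
# variable on all the others; the slide flags VIF > 4 as multicollinearity.
def vif(r_squared):
    return 1.0 / (1.0 - r_squared)

print(vif(0.5))    # 2.0  -> acceptable
print(vif(0.75))   # 4.0  -> borderline by the slide's rule of thumb
```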

  • 38

    Logistic Regression

    Logistic regression is a form of regression used when the dependent variable is dichotomous (binary) and the independent variables are of any type.

    Continuous variables are not used as the dependent variable. Logistic regression does not assume linearity of the relationship between the dependent and independent variables.

    It does not assume normality or homoscedasticity. It assumes that observations are independent and that the independent variables are linearly related to the logit of the dependent variable.

    The scatter plot of the outcome variable (Y) vs. the independent variables shows all points falling on one of two parallel lines, representing Y = 0 and Y = 1, so this scatter plot does not provide a clear picture of a linear relationship. In linear regression the quantity E(Y/X) can take any value in the range (−∞, ∞), whereas in logistic regression E(Y/X) lies between (0, 1).

  • 39

    Let π(x) = E(Y/X). The specific form of π(x) we use in the logistic regression model is

    π(x) = exp(β0 + β1 x) / (1 + exp(β0 + β1 x))

    The logit transformation of π(x) is given by

    g(x) = ln( π(x) / (1 − π(x)) ) = β0 + β1 x

    The logit g(x) is linear in the parameters, continuous, and may range over (−∞, ∞) depending on the range of x. We may express the value of the outcome variable given x as

    y = π(x) + ε
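The model and its logit transform above can be checked numerically: applying the logit to π(x) should recover the linear predictor β0 + β1 x exactly. A minimal sketch with made-up coefficients; the function names are illustrative.

```python
import math

# The slide's logistic curve pi(x) and its logit transform g; the logit
# inverts the curve back to the linear predictor b0 + b1*x.
def pi(x, b0, b1):
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

def logit(p):
    return math.log(p / (1 - p))

p = pi(2.0, b0=-1.0, b1=0.5)   # a probability strictly inside (0, 1)
print(round(logit(p), 6))       # recovers b0 + b1*x = 0.0
```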

  • 40

    Binary Logistic Regression

  • 41

    Binary Logistic Regression

  • 42

    Binary Logistic Regression

  • 43

    Thanks !!!