curves_coefficients_cutoffs@measurments_sensitivity_specificity

Upload: venkata-nelluri-pmp

Post on 26-Feb-2018


TRANSCRIPT

  • 7/25/2019 curves_coefficients_cutoffs@measurments_sensitivity_specificity

    1/66

    All Rights Reserved, Indian


    Session Outline

    Introduction to classification problems and discrete choice models.

    Introduction to Logistic Regression.

    Logistic function and Logit function.

    Maximum Likelihood Estimator (MLE) for estimation of LR parameters.

    Examples: Challenger Shuttle, German Bank Credit Rating.


    Classification Problems

    Classification is an important category of problems in which the decision maker would

    like to classify the customers into two or more groups.

    Examples of Classification Problems:

    Customer Churn.

    Credit Rating (low, high and medium risk)

    Employee attrition.

    Fraud (classification of a transaction to fraud/non-fraud)

    Outcome of any binomial and multinomial experiment.


    Logistic Regression attempts to classify customers into different categories.

    [Illustration: two customer groups, "Always Cheerful" and "Always Sad"]


    Discrete Choice Models

    Problems involving discrete choices available to the decision maker.

    Discrete choice models (in business) examine which alternative is chosen by a customer and why.

    Most discrete choice models estimate the probability that a customer chooses a particular alternative from several possible alternatives.


    Logistic Regression

    Classification Problem

    Discrete Choice Model

    Probability of an event


    Logistic Regression - Introduction

    The name logistic regression comes from the logistic function.

    Mathematically, logistic regression attempts to estimate the conditional probability of an event.

    $$P(Y = 1) = \frac{e^{Z}}{1 + e^{Z}}$$


    Logistic Regression

    Logistic regression models estimate how the probability of an event may be affected by one or more explanatory variables.


    Logistic Function (Sigmoidal Function)

    $$\pi(z) = \frac{e^{z}}{1 + e^{z}}, \qquad z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
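The logistic function above can be sketched in a few lines of Python (a minimal illustration; the function name is ours):

```python
import math

def logistic(z):
    """Logistic (sigmoidal) function: maps any real z to a probability in (0, 1)."""
    return math.exp(z) / (1.0 + math.exp(z))

# z plays the role of the linear predictor beta0 + beta1*x1 + ... + betan*xn
print(logistic(0.0))   # 0.5: z = 0 gives probability one half
print(logistic(4.0))   # close to 1
print(logistic(-4.0))  # close to 0
```

Note the symmetry: logistic(z) + logistic(-z) = 1, which is why the curve is S-shaped around 0.5.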


    Logistic Regression with one Explanatory Variable

    $$P(Y = 1 \mid X = x) = \pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$$

    β1 = 0 implies that P(Y | x) is the same for each value of x.

    β1 > 0 implies that P(Y | x) increases as the value of x increases.

    β1 < 0 implies that P(Y | x) decreases as the value of x increases.


    Logit Function

    The Logit function is the logarithmic transformation of the logistic function. It is defined as the natural logarithm of odds.

    The logit of π(x) is given by:

    $$\text{Logit}(\pi(x)) = \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x$$

    $$\text{odds} = \frac{\pi}{1 - \pi}$$
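The odds and logit definitions above translate directly into code (a small sketch; the function names are ours):

```python
import math

def odds(p):
    """Odds: ratio of the probability of the event to that of the non-event."""
    return p / (1.0 - p)

def logit(p):
    """Logit: natural logarithm of the odds."""
    return math.log(odds(p))

print(odds(0.8))    # ≈ 4: the event is about 4 times as likely as the non-event
print(logit(0.5))   # 0.0: even odds
```

The logit is positive for probabilities above 0.5 and negative below it, which is what makes it a convenient linear scale for the regression.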


    Logistic regression

    More robust: error terms need not be normally distributed.

    No requirement of equal variance for the error terms (homoscedasticity).

    No requirement of a linear relationship between the dependent and independent variables.


    Estimation of Parameters in Logistic Regression

    Estimation of parameters in logistic regression is carried out using the Maximum Likelihood Estimation (MLE) technique.

    No closed form solution exists for estimating the regression parameters in logistic regression.


    Maximum Likelihood Estimator (MLE)

    MLE is a statistical technique for estimating the parameters of a model.

    For a given data set, the MLE chooses the values of the model parameters that make the observed data more likely than any other parameter values would.


    Likelihood Function

    The likelihood function L(θ) represents the joint probability of observing the data that have been collected.

    MLE chooses the estimator of the set of unknown parameters that maximizes the likelihood function L(θ).


    Maximum Likelihood Estimator

    Assume that x1, x2, …, xn are sample observations of a random variable with density function f(x, θ), where θ is an unknown parameter.

    The likelihood function is L(θ) = f(x1, x2, …, xn; θ), the joint probability density function of the sample.

    The value of θ, θ*, which maximizes L(θ) is called the maximum likelihood estimator of θ.


    Example: Exponential Distribution

    Let x1, x2, …, xn be sample observations that follow an exponential distribution with parameter λ. That is:

    $$f(x, \lambda) = \lambda e^{-\lambda x}$$

    The likelihood function is given by (assuming independence):

    $$L(\lambda) = f(x_1, \lambda)\, f(x_2, \lambda) \cdots f(x_n, \lambda) = \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2} \cdots \lambda e^{-\lambda x_n} = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i}$$


    Log-likelihood function

    The log-likelihood function is given by:

    $$\ln(L(\lambda)) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$$

    The optimal λ, λ*, is given by:

    $$\frac{d}{d\lambda} \ln(L(\lambda)) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; \lambda^{*} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}$$
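The closed-form result λ* = 1/x̄ can be checked numerically. A minimal sketch, using made-up sample values (the data below are illustrative, not from the slides):

```python
import math

# Illustrative sample assumed to come from an exponential distribution
x = [0.3, 1.2, 0.7, 2.5, 0.9, 1.4]
n = len(x)
xbar = sum(x) / n

def log_likelihood(lam):
    """ln L(lambda) = n*ln(lambda) - lambda * sum(x_i)"""
    return n * math.log(lam) - lam * sum(x)

# MLE from the derivation: lambda* = n / sum(x_i) = 1 / xbar
lam_star = 1.0 / xbar

# A coarse grid search over lambda should peak at (or next to) the analytic optimum
grid = [0.01 * k for k in range(1, 500)]
best = max(grid, key=log_likelihood)
print(lam_star, best)
```

The grid maximizer lands on the grid point nearest 1/x̄, confirming the calculus.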


    Likelihood Function for Binary Logistic Regression

    The probability density function for binary logistic regression is given by:

    $$f(y_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$

    $$L(\beta) = f(y_1, y_2, \dots, y_n) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}$$

    $$\ln(L(\beta)) = \sum_{i=1}^{n} y_i \ln(\pi(x_i)) + \sum_{i=1}^{n} (1 - y_i) \ln(1 - \pi(x_i))$$


    Likelihood Function for Binary Logistic Regression

    $$\ln(L(\beta_0, \beta_1)) = \sum_{i=1}^{n} y_i (\beta_0 + \beta_1 x_i) - \sum_{i=1}^{n} \ln\left(1 + \exp(\beta_0 + \beta_1 x_i)\right)$$


    Estimation of LR parameters

    $$\frac{\partial \ln(L(\beta_0, \beta_1))}{\partial \beta_0} = \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} = 0$$

    $$\frac{\partial \ln(L(\beta_0, \beta_1))}{\partial \beta_1} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} \frac{x_i \exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} = 0$$

    The above system of equations is solved iteratively to estimate β0 and β1.
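One common iterative scheme for these score equations is Newton-Raphson. A toy sketch for one explanatory variable, on made-up data (this is an illustration of the idea, not the algorithm any particular package uses):

```python
import math

def fit_logistic(x, y, iters=25):
    """Newton-Raphson for the two score equations above (one covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # pi_i = exp(b0 + b1*x_i) / (1 + exp(b0 + b1*x_i))
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient: the two score equations
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Hessian of the log-likelihood (negative definite)
        w = [pi * (1.0 - pi) for pi in p]
        h00 = -sum(w)
        h01 = -sum(wi * xi for wi, xi in zip(w, x))
        h11 = -sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta - H^{-1} g  (2x2 inverse written out)
        b0 -= (h11 * g0 - h01 * g1) / det
        b1 -= (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data: the event becomes more likely as x grows
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1)  # b1 comes out positive for this data
```

At convergence both score equations are (numerically) zero, which is exactly the condition stated above.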


    Limitations of MLE

    The maximum likelihood estimator may not be unique, or may not exist.

    A closed form solution may not exist in many cases; one may need an iterative procedure to estimate the parameter values.


    Flt Temp Damage

    STS-1 66 No

    STS-2 70 Yes

    STS-3 69 No

    STS-4 80 No

    STS-5 68 No

    STS-6 67 No

    STS-7 72 No

    STS-8 73 No

    STS-9 70 No

    STS-41B 57 Yes

    STS-41C 63 Yes

    STS-41D 70 Yes

    Flt Temp Damage

    STS-41G 78 No

    STS-51-A 67 No

    STS-51-C 53 Yes

    STS-51-D 67 No

    STS-51-B 75 No

    STS-51-G 70 No

    STS-51-F 81 No

    STS-51-I 76 No

    STS-51-J 79 No

    STS-61-A 75 Yes

    STS-61-B 76 No

    STS-61-C 58 Yes

    Challenger Data


    Challenger launch temperature vs damage data


    Logistic Regression of challenger data

    Let:

    Yi = 0 denote no damage

    Yi = 1 denote damage to the O-ring

    P(Yi = 1) = πi and P(Yi = 0) = 1 - πi.

    We predict P(Yi = 1 | xi), where xi = launch temperature.


    Logistic Regression using SPSS

    Dependent variable:

    In binary logistic regression, the dependent variable can take only two values.

    In multinomial logistic regression, the dependent variable can take three or more values (but not continuous).

    Covariate:

    All independent (predictor) variables are entered as covariates.


    $$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = 15.297 - 0.236 X_i$$

    Variables in the Equation (Step 1):

                          B        S.E.    Wald    df   Sig.
    LaunchTemperature   -0.236    0.107   4.832    1   0.028
    Constant            15.297    7.329   4.357    1   0.037

    a. Variable(s) entered on step 1: LaunchTemperature.

    $$P(Y_i = 1) = \frac{e^{15.297 - 0.236 X_i}}{1 + e^{15.297 - 0.236 X_i}}$$
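The fitted equation can be evaluated at any temperature. A quick check in Python, plugging in the SPSS estimates from the table above:

```python
import math

def p_damage(temp):
    """Estimated P(O-ring damage) from ln(odds) = 15.297 - 0.236 * temperature."""
    z = 15.297 - 0.236 * temp
    return math.exp(z) / (1.0 + math.exp(z))

print(round(p_damage(53), 3))  # coldest launch in the data: high estimated risk
print(round(p_damage(81), 3))  # warmest launch in the data: low estimated risk
```

The estimated probability of damage falls steeply as temperature rises, which matches the pattern in the data table.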


    Challenger: Probability of failure estimate

    $$\pi_i = \frac{e^{15.297 - 0.236 X_i}}{1 + e^{15.297 - 0.236 X_i}}$$

    [Plot: estimated probability of failure (0 to 1) against launch temperature (0 to 120 °F)]


    Classification table from SPSS


    Accuracy Paradox

    Assume an example of insurance fraud. Past data has revealed that out of 1000 claims in the past, 950 are true claims and 50 are fraudulent claims.

    The classification table using a logistic regression model is given below:

    Observed   Predicted 0   Predicted 1   % accuracy
    0          900           50            94.73%
    1          5             45            90.00%

    The overall accuracy is 94.5%. Not predicting fraud at all (classifying every claim as true) will give 95% accuracy!
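The paradox is easy to reproduce from the counts in the table above:

```python
# Confusion counts from the classification table above
tn, fp = 900, 50   # observed 0 (true claims)
fn, tp = 5, 45     # observed 1 (fraudulent claims)
total = tn + fp + fn + tp

model_accuracy = (tn + tp) / total
# A "model" that never predicts fraud gets every true claim right: 950 of 1000
never_fraud_accuracy = (tn + fp) / total

print(model_accuracy)        # 0.945
print(never_fraud_accuracy)  # 0.95
```

The useless always-predict-majority rule scores higher on accuracy, which is why sensitivity and specificity (introduced later in the deck) matter for imbalanced classes.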


    ODDS and ODDS RATIO

    ODDS: Ratio of two probability values.

    ODDS Ratio: Ratio of two odds.

    $$\text{odds} = \frac{\pi}{1 - \pi}$$


    ODDS RATIO

    Assume that X is an independent variable (covariate). The odds ratio, OR, is defined as the ratio of the odds for X = 1 to the odds for X = 0. The odds ratio is given by:

    $$OR = \frac{\pi(1)/(1 - \pi(1))}{\pi(0)/(1 - \pi(0))}$$


    Interpretation of Beta Coefficient in LR

    $$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 x \quad (1)$$

    For x = 0: $$\ln\left(\frac{\pi(0)}{1 - \pi(0)}\right) = \beta_0 \quad (2)$$

    For x = 1: $$\ln\left(\frac{\pi(1)}{1 - \pi(1)}\right) = \beta_0 + \beta_1 \quad (3)$$

    Subtracting (2) from (3):

    $$\beta_1 = \ln\left(\frac{\pi(1)/(1 - \pi(1))}{\pi(0)/(1 - \pi(0))}\right)$$


    Interpretation of LR coefficients

    Change in ln odds ratio:

    $$\ln\left(\frac{\pi(x+1)}{1 - \pi(x+1)}\right) - \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_1$$

    Change in odds ratio:

    $$\frac{\pi(x+1)/(1 - \pi(x+1))}{\pi(x)/(1 - \pi(x))} = e^{\beta_1}$$


    Odds Ratio for Binary Logistic Regression

    $$OR = \frac{\pi(1)/(1 - \pi(1))}{\pi(0)/(1 - \pi(0))} = e^{\beta_1}$$

    If OR = 2, then the event is twice as likely to occur when X = 1 compared to X = 0.

    Odds ratio approximates the relative risk.
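For the Challenger model, β1 = -0.236 (the launch-temperature coefficient from the SPSS output earlier), so the odds ratio per degree of temperature is exp(-0.236). A quick check:

```python
import math

beta1 = -0.236  # coefficient of launch temperature from the fitted model
odds_ratio = math.exp(beta1)
# Each extra degree multiplies the odds of O-ring damage by roughly 0.79,
# i.e. the odds fall by about 21% per degree of warming
print(round(odds_ratio, 3))
```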


    Interpretation of LR coefficients

    β1 is the change in the log-odds ratio for a unit change in the explanatory variable.

    A unit change in the variable changes the odds ratio by a factor of exp(β1).


    Session Outline

    Measuring the fitness of Logistic Regression Model.

    Testing individual regression parameters (Wald's test).

    Omnibus test for overall model fitness

    Hosmer-Lemeshow Goodness of fit test.

    R2 in Logistic Regression.

    Confidence Intervals for parameters and probabilities.


    Wald Test

    Wald test is used to check the significance of individual

    variables (similar to t-statistic in linear regression).

    Wald test statistic is given by:

    $$W = \left(\frac{\beta_i}{SE(\beta_i)}\right)^2$$

    W is a chi-square statistic.
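Using the rounded SPSS estimates for launch temperature (B = -0.236, S.E. = 0.107), the Wald statistic can be checked by hand:

```python
b, se = -0.236, 0.107  # rounded estimates from the SPSS output
W = (b / se) ** 2
# About 4.86 from the rounded values; SPSS reports 4.832 from unrounded estimates.
# Either way W exceeds 3.84, the 5% critical value of chi-square with 1 df.
print(round(W, 2))
```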


    Wald test hypothesis

    Null Hypothesis H0: βi = 0

    Alternative Hypothesis H1: βi ≠ 0



    Wald Test Challenger Data

    The explanatory variable is significant, since the p-value is less than 0.05.


    Wald Test Challenger Data

    For significant variables, the confidence interval for Exp(β) will not contain 1.


    Hosmer-Lemeshow Goodness of Fit

    Test for overall fitness of the model for a binary logistic regression (similar to the chi-square goodness of fit test).

    The observations are grouped into 10 groups based on the predicted probabilities.


    Hosmer-Lemeshow Test Statistic

    Hosmer-Lemeshow Test Statistic is given by:

    $$\hat{C} = \sum_{k=1}^{g} \frac{(O_k - n_k \bar{\pi}_k)^2}{n_k \bar{\pi}_k (1 - \bar{\pi}_k)}$$

    g = Number of groups

    n_k = Number of observations in each group

    O_k = Sum of the y values for the k-th group

    π̄_k = The average predicted probability in the k-th group
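The statistic above is a short sum over groups. A minimal sketch on made-up grouped data (the group sizes, counts, and probabilities below are illustrative, not the Challenger values):

```python
# Illustrative groups: (n_k, O_k, average predicted probability pi_k)
groups = [
    (10, 1, 0.08),
    (10, 2, 0.21),
    (10, 4, 0.39),
    (10, 6, 0.62),
    (10, 9, 0.88),
]

def hosmer_lemeshow(groups):
    """C = sum_k (O_k - n_k*pi_k)^2 / (n_k * pi_k * (1 - pi_k))"""
    return sum((o - n * p) ** 2 / (n * p * (1 - p)) for n, o, p in groups)

print(round(hosmer_lemeshow(groups), 3))
```

When observed counts track the predicted probabilities closely, as here, the statistic is small; large discrepancies in any group inflate it.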


    H-L test for Challenger Data

    Hosmer and Lemeshow Test (Step 1): Chi-square = 9.396, df = 8, Sig. = 0.310

    Since the p-value (0.310) is more than 0.05, we accept the null hypothesis that there is no difference between the predicted and observed frequencies (accept the model).



    Classification Table

    Prediction (Classification) vs Observed:

                              Observed 1 (Positive): 7     Observed 0 (Negative): 17
    Predicted 1 (Positive)    4  [True Positive, TP]       0  [False Positive, FP]
    Predicted 0 (Negative)    3  [False Negative, FN]      17 [True Negative, TN]

    $$\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{4}{7} = 0.571$$

    $$\text{Specificity} = \frac{TN}{TN + FP} = \frac{17}{17} = 1.0$$
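The two rates follow directly from the counts in the table above:

```python
tp, fp = 4, 0   # predicted positive
fn, tn = 3, 17  # predicted negative

sensitivity = tp / (tp + fn)  # fraction of observed positives caught: 4/7
specificity = tn / (tn + fp)  # fraction of observed negatives caught: 17/17

print(round(sensitivity, 3))  # 0.571
print(round(specificity, 3))  # 1.0
```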


    Sensitivity Specificity

    $$\text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false negatives}}$$

    $$\text{Specificity} = \frac{\text{Number of true negatives}}{\text{Number of true negatives} + \text{Number of false positives}}$$

    Sensitivity is the probability that the predicted value of y = 1

    given that the observed value is 1.

    Specificity is the probability that the predicted value of y = 0

    given that the observed value is 0.


    Receiver Operating Characteristic (ROC) Curve

    The ROC curve plots the true positive ratio (correct positive classification rate, i.e. sensitivity) against the false positive ratio (1 - specificity) and compares the model with random classification.

    The higher the area under the ROC curve, the better the predictive ability of the model.

    Challenger Example: Sensitivity vs 1-Specificity


    (True positive Vs False positive)

    Cut off Value Sensitivity Specificity 1-specificity

    0.05 1 0.235 0.765

    0.1 0.857 0.412 0.588

    0.2 0.857 0.529 0.471

    0.3 0.571 0.706 0.294

    0.4 0.571 0.941 0.059

    0.5 0.571 1 0

    0.6 0.571 1 0

    0.7 0.429 1 0

    0.8 0.429 1 0

    0.9 0.143 1 0

    0.95 0 1 0


    Area Under the ROC Curve


    The area under the ROC curve is interpreted as the probability that the model will rank a randomly chosen positive higher than a randomly chosen negative.

    If n1 is the number of positives (1s) and n2 is the number of negatives (0s), then the area under the ROC curve is the proportion of all possible (n1, n2) pairs in which the positive is assigned a higher probability than the negative.
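This pairwise interpretation can be computed directly. A sketch with illustrative predicted probabilities (not the Challenger output):

```python
# Illustrative predicted probabilities for positives (1s) and negatives (0s)
pos = [0.9, 0.6, 0.4]       # n1 = 3
neg = [0.5, 0.3, 0.2, 0.1]  # n2 = 4

# AUC = proportion of (positive, negative) pairs in which the positive
# receives the higher predicted probability (ties count as half)
pairs = [(p, n) for p in pos for n in neg]
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
auc = wins / len(pairs)
print(round(auc, 3))  # 11 of the 12 pairs are ranked correctly
```

The single mis-ranked pair (0.4 vs 0.5) is what keeps the area below 1.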

    ROC Curve


    General rule for acceptance of the model:

    If the area under ROC is:

    ROC area = 0.5: No discrimination

    0.7 ≤ ROC area < 0.8: Acceptable discrimination

    0.8 ≤ ROC area < 0.9: Excellent discrimination

    ROC area ≥ 0.9: Outstanding discrimination


    Optimal Cut-off probabilities

    Using classification plots.

    Youden's Index.

    Cost-based optimization.

    Classification Plots


    Step number: 1

    Observed Groups and Predicted Probabilities

    [SPSS classification plot: frequency of observed groups (0 = no damage, 1 = damage) plotted against predicted probability from 0 to 1]

    Youden's Index


    Youden's index is a measure of diagnostic accuracy.

    Youden's index is calculated by subtracting 1 from the sum of sensitivity and specificity.

    $$\text{Youden's Index } J(p) = \text{Sensitivity}(p) + \text{Specificity}(p) - 1$$
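Applying this to the Challenger cutoff table from the ROC slides (values copied from that table):

```python
# (cutoff, sensitivity, specificity) from the Challenger ROC table
table = [
    (0.05, 1.000, 0.235), (0.1, 0.857, 0.412), (0.2, 0.857, 0.529),
    (0.3, 0.571, 0.706), (0.4, 0.571, 0.941), (0.5, 0.571, 1.0),
    (0.6, 0.571, 1.0), (0.7, 0.429, 1.0), (0.8, 0.429, 1.0),
    (0.9, 0.143, 1.0), (0.95, 0.0, 1.0),
]

# Youden's index J(p) = sensitivity(p) + specificity(p) - 1 at each cutoff;
# max() returns the first cutoff attaining the maximum J
best_cutoff, best_j = max(
    ((c, sens + spec - 1.0) for c, sens, spec in table), key=lambda t: t[1]
)
print(best_cutoff, round(best_j, 3))
```

For this table J is maximized (J ≈ 0.571) first at cutoff 0.5, which would be the Youden-optimal classification threshold here.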
