curves_coefficients_cutoffs@measurments_sensitivity_specificity
TRANSCRIPT
7/25/2019
All Rights Reserved, Indian
Session Outline
Introduction to classification problems and discrete choice models.
Introduction to Logistic Regression.
Logistic function and Logit function.
Maximum Likelihood Estimator (MLE) for estimation of LR parameters.
Examples: Challenger Shuttle, German Bank Credit Rating.
Classification Problems
Classification is an important category of problems in which the decision maker would like to classify the customers into two or more groups.
Examples of Classification Problems:
Customer churn.
Credit rating (low, medium and high risk).
Employee attrition.
Fraud (classification of a transaction as fraud/non-fraud).
Outcome of any binomial and multinomial experiment.
Logistic Regression attempts to classify customers into different categories
[Figure: illustrative images labelled "Always Cheerful" and "Always Sad"]
Discrete Choice Models
Problems involving discrete choices available to the decision maker.
Discrete choice models (in business) examine which alternative is chosen by a customer, and why.
Most discrete choice models estimate the probability that a customer chooses a particular alternative from several possible alternatives.
Logistic Regression
Classification Problem
Discrete Choice Model
Probability of an event
Logistic Regression - Introduction
The name logistic regression emerges from the logistic function.
Mathematically, logistic regression attempts to estimate the conditional probability of an event:

P(Y = 1) = e^Z / (1 + e^Z)
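The formula above can be checked in a few lines of Python (an illustrative sketch, not part of the slides):

```python
import math

def logistic(z):
    """P(Y = 1) = e^z / (1 + e^z): maps any real z into (0, 1)."""
    return math.exp(z) / (1 + math.exp(z))

print(logistic(0))   # 0.5
print(logistic(4))   # about 0.982
print(logistic(-4))  # about 0.018
```

Whatever the value of Z, the output always lies strictly between 0 and 1, which is what makes the function suitable for modelling a probability.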
Logistic Regression
Logistic regression models estimate how the probability of an event may be affected by one or more explanatory variables.
Logistic Function (Sigmoidal function)

z = β0 + β1 x1 + β2 x2 + ... + βn xn

π(z) = e^z / (1 + e^z)
Logistic Regression with one Explanatory Variable

P(Y = 1 | X = x) = π(x) = exp(β0 + β1 x) / (1 + exp(β0 + β1 x))

β1 = 0 implies that P(Y | x) is the same for each value of x
β1 > 0 implies that P(Y | x) increases as the value of x increases
β1 < 0 implies that P(Y | x) decreases as the value of x increases
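The three cases for the slope coefficient can be verified directly (illustrative Python; the values of β0 and β1 are made up):

```python
import math

def p_event(x, b0, b1):
    """P(Y = 1 | x) for a single explanatory variable."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

# b1 = 0: P(Y | x) is the same for every x
# b1 > 0: P(Y | x) increases with x
# b1 < 0: P(Y | x) decreases with x
print(p_event(1, 0.5, 0.0) == p_event(9, 0.5, 0.0))   # True
print(p_event(9, 0.5, 0.8) > p_event(1, 0.5, 0.8))    # True
print(p_event(9, 0.5, -0.8) < p_event(1, 0.5, -0.8))  # True
```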
Logit Function
The Logit function is the logarithmic transformation of the logistic function. It is defined as the natural logarithm of odds.
The Logit of π is given by:

Logit(π) = ln( π / (1 - π) ) = β0 + β1 x

odds = π / (1 - π)
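A quick numerical check (illustrative, not from the slides) that the logit undoes the logistic function:

```python
import math

def logit(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1 - p))

# logit is the inverse of the logistic transformation
z = 1.3
p = math.exp(z) / (1 + math.exp(z))
print(round(logit(p), 6))  # 1.3
```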
Logistic regression
More robust: the error terms need not be normally distributed.
No requirement of equal variance for the error terms (homoscedasticity).
No requirement of a linear relationship between the dependent and independent variables.
Estimation of parameters in Logistic Regression
Estimation of parameters in logistic regression is carried out using the Maximum Likelihood Estimation (MLE) technique.
No closed form solution exists for estimation of the regression parameters of logistic regression.
Maximum Likelihood Estimator (MLE)
MLE is a statistical technique for estimating the parameters of a model.
For a given data set, the MLE chooses the values of the model parameters that make the observed data more likely than any other parameter values would.
Likelihood Function
The likelihood function L(θ) represents the joint probability of observing the data that have been collected.
MLE chooses the estimator of the set of unknown parameters θ that maximizes the likelihood function L(θ).
Maximum Likelihood Estimator
Assume that x1, x2, ..., xn are sample observations of a random variable with density function f(x, θ), where θ is an unknown parameter.
The likelihood function is L(θ) = f(x1, x2, ..., xn; θ), the joint probability density function of the sample.
The value θ* which maximizes L(θ) is called the maximum likelihood estimator of θ.
Example: Exponential Distribution
Let x1, x2, ..., xn be sample observations that follow an exponential distribution with parameter λ. That is:

f(x, λ) = λ e^(-λx)

The likelihood function is given by (assuming independence):

L(λ) = f(x1, λ) f(x2, λ) ... f(xn, λ) = λ e^(-λ x1) λ e^(-λ x2) ... λ e^(-λ xn) = λ^n e^(-λ Σ xi)
Log-likelihood function
The log-likelihood function is given by:

ln(L(x, λ)) = n ln λ - λ Σ xi

The optimal λ* is obtained by setting the derivative to zero:

d/dλ [ ln(L(x, λ)) ] = n/λ - Σ xi = 0, which gives λ* = n / Σ xi = 1 / x̄
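The closed-form result λ* = 1 / x̄ can be confirmed numerically; the sketch below uses made-up sample values and a coarse grid search over λ (illustrative only, not from the slides):

```python
import math

sample = [0.8, 1.1, 2.3, 0.5, 1.9, 0.4]   # made-up observations
lam_mle = len(sample) / sum(sample)       # closed form: n / sum(xi) = 1 / x-bar

def log_likelihood(lam):
    # ln L = n ln(lambda) - lambda * sum(xi)
    return len(sample) * math.log(lam) - lam * sum(sample)

# A coarse grid search over lambda finds the same maximiser
grid = [i / 1000 for i in range(1, 5000)]
lam_grid = max(grid, key=log_likelihood)
print(round(lam_mle, 3), round(lam_grid, 3))  # 0.857 0.857
```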
Likelihood Function for Binary Logistic Regression
The probability density function for binary logistic regression is given by:

f(yi) = πi^yi (1 - πi)^(1 - yi)

L(β) = f(y1, y2, ..., yn) = Π(i=1..n) πi^yi (1 - πi)^(1 - yi)

ln(L(β)) = Σ(i=1..n) yi ln(π(xi)) + Σ(i=1..n) (1 - yi) ln(1 - π(xi))
Likelihood Function for Binary Logistic Regression

ln(L(β)) = Σ(i=1..n) yi (β0 + β1 xi) - Σ(i=1..n) ln(1 + exp(β0 + β1 xi))
Estimation of LR parameters
∂ln(L(β0, β1))/∂β0 = Σ(i=1..n) [ yi - exp(β0 + β1 xi) / (1 + exp(β0 + β1 xi)) ] = 0

∂ln(L(β0, β1))/∂β1 = Σ(i=1..n) [ xi yi - xi exp(β0 + β1 xi) / (1 + exp(β0 + β1 xi)) ] = 0

The above system of equations is solved iteratively to estimate β0 and β1.
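One common iterative scheme for these two score equations is Newton-Raphson. The sketch below, on made-up data (the data, function name and iteration count are illustrative, not from the slides), drives both equations to zero:

```python
import math

# Made-up data: outcome y against a single covariate x (illustrative only)
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson on the two score equations for (beta0, beta1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        ps = [1 / (1 + math.exp(-(b0 + b1 * x))) for x in xs]
        g0 = sum(y - p for y, p in zip(ys, ps))               # d lnL / d b0
        g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))  # d lnL / d b1
        w = [p * (1 - p) for p in ps]                         # IRLS weights
        h00 = sum(w)                                          # negative Hessian
        h01 = sum(x * wi for x, wi in zip(xs, w))
        h11 = sum(x * x * wi for x, wi in zip(xs, w))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det                     # Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(xs, ys)
ps = [1 / (1 + math.exp(-(b0 + b1 * x))) for x in xs]
# Both score equations are (numerically) zero at the optimum
print(abs(sum(y - p for y, p in zip(ys, ps))) < 1e-6)  # True
```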
Limitations of MLE
The maximum likelihood estimator may not be unique, or may not exist.
A closed form solution may not exist in many cases; one may have to use an iterative procedure to estimate the parameter values.
Challenger Data

Flt       Temp  Damage
STS-1      66   No
STS-2      70   Yes
STS-3      69   No
STS-4      80   No
STS-5      68   No
STS-6      67   No
STS-7      72   No
STS-8      73   No
STS-9      70   No
STS-41B    57   Yes
STS-41C    63   Yes
STS-41D    70   Yes
STS-41G    78   No
STS-51-A   67   No
STS-51-C   53   Yes
STS-51-D   67   No
STS-51-B   75   No
STS-51-G   70   No
STS-51-F   81   No
STS-51-I   76   No
STS-51-J   79   No
STS-61-A   75   Yes
STS-61-B   76   No
STS-61-C   58   Yes
Challenger launch temperature vs damage data
Logistic Regression of challenger data
Let:
Yi = 0 denote no damage
Yi = 1 denote damage to the O-ring
P(Yi = 1) = πi and P(Yi = 0) = 1 - πi
We predict P(Yi = 1 | xi), where xi = launch temperature.
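Using the Challenger data from the earlier table, the model can be fitted with plain gradient ascent on the log-likelihood. This is an illustrative stdlib-only sketch (a statistics package would normally be used; the learning rate, iteration count and standardisation step are my choices, not from the slides):

```python
import math

# Challenger data from the slides: launch temperature and damage (1 = Yes)
temps = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 70,
         78, 67, 53, 67, 75, 70, 81, 76, 79, 75, 76, 58]
damage = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
          0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]

def sigmoid(z):
    # Numerically stable logistic function
    return 1 / (1 + math.exp(-z)) if z >= 0 else math.exp(z) / (1 + math.exp(z))

# Standardise temperature, then maximise the log-likelihood by gradient ascent
m = sum(temps) / len(temps)
s = (sum((t - m) ** 2 for t in temps) / len(temps)) ** 0.5
zs = [(t - m) / s for t in temps]

b0 = b1 = 0.0
for _ in range(50000):
    ps = [sigmoid(b0 + b1 * z) for z in zs]
    b0 += 0.05 * sum(y - p for y, p in zip(damage, ps))
    b1 += 0.05 * sum(z * (y - p) for z, y, p in zip(zs, damage, ps))

slope = b1 / s                 # back on the original temperature scale
intercept = b0 - b1 * m / s
# The slides' SPSS run reports intercept 15.297 and slope -0.236
print(round(intercept, 2), round(slope, 3))
```

The negative slope means that colder launches have a higher estimated probability of O-ring damage, which is the conclusion the slides draw.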
Logistic Regression using SPSS
Dependent variable:
In binary logistic regression, the dependent variable can take only two values.
In multinomial logistic regression, the dependent variable can take three or more values (but not continuous values).
Covariate:
All independent (predictor) variables are entered as covariates.
ln( πi / (1 - πi) ) = 15.297 - 0.236 Xi

Variables in the Equation (Step 1; variable entered on step 1: LaunchTemperature):

                    B        S.E.    Wald    df   Sig.
LaunchTemperature   -.236    .107    4.832   1    .028
Constant            15.297   7.329   4.357   1    .037

P(Yi = 1) = exp(15.297 - 0.236 Xi) / (1 + exp(15.297 - 0.236 Xi))
Challenger: Probability of failure estimate

πi = exp(15.297 - 0.236 Xi) / (1 + exp(15.297 - 0.236 Xi))

[Figure: estimated probability of O-ring damage plotted against launch temperature (0 to 120), probability on the vertical axis from 0 to 1.2]
Classification table from SPSS
Accuracy Paradox
Assume an example of insurance fraud. Past data has revealed that out of 1000 claims in the past, 950 are true claims and 50 are fraudulent claims.
The classification table using a logistic regression model is given below:

Observed   Predicted 0   Predicted 1   % accuracy
0          900           50            94.73%
1          5             45            90.00%

The overall accuracy is 94.5%. Not predicting fraud at all will give 95% accuracy!
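The paradox is plain arithmetic, as this short sketch of the table above shows:

```python
# Confusion matrix from the insurance fraud example (rows = observed)
tn, fp = 900, 50   # observed 0 (true claims): 950 in total
fn, tp = 5, 45     # observed 1 (fraud): 50 in total

overall = (tn + tp) / 1000
naive = 950 / 1000          # classify every claim as "not fraud"
print(overall, naive)       # 0.945 0.95
```

The model catches 45 of the 50 frauds, yet the useless "never fraud" rule scores higher on overall accuracy; this is why accuracy alone is a poor yardstick on imbalanced data.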
ODDS and ODDS RATIO
ODDS: ratio of two probability values:

odds = π / (1 - π)

ODDS Ratio: ratio of two odds.
ODDS RATIO
Assume that X is an independent variable (covariate). The odds ratio, OR, is defined as the ratio of the odds for X = 1 to the odds for X = 0. The odds ratio is given by:

OR = [ π(1) / (1 - π(1)) ] / [ π(0) / (1 - π(0)) ]
Interpretation of Beta Coefficient in LR
ln( π(x) / (1 - π(x)) ) = β0 + β1 x                      (1)

For x = 0:  ln( π(0) / (1 - π(0)) ) = β0                 (2)

For x = 1:  ln( π(1) / (1 - π(1)) ) = β0 + β1            (3)

β1 = ln[ ( π(1) / (1 - π(1)) ) / ( π(0) / (1 - π(0)) ) ]
Interpretation of LR coefficients
ln[ ( π(x+1) / (1 - π(x+1)) ) / ( π(x) / (1 - π(x)) ) ] = β1 = change in ln odds ratio

( π(x+1) / (1 - π(x+1)) ) / ( π(x) / (1 - π(x)) ) = e^β1 = change in odds ratio
Odds Ratio for Binary Logistic Regression

OR = [ π(1) / (1 - π(1)) ] / [ π(0) / (1 - π(0)) ] = e^β1

If OR = 2, then the event is twice as likely to occur when X = 1 compared to X = 0.
The odds ratio approximates the relative risk.
Interpretation of LR coefficients
β1 is the change in the log-odds ratio for a unit change in the explanatory variable.
A unit change in the explanatory variable changes the odds ratio by a factor exp(β1).
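Applying this to the Challenger coefficient reported earlier in the slides (a one-line illustrative check):

```python
import math

b1 = -0.236                  # LaunchTemperature coefficient from the SPSS output
odds_factor = math.exp(b1)   # change in odds per 1 degree F increase
# Each extra degree multiplies the odds of damage by about 0.79,
# i.e. roughly a 21% drop in the odds per degree
print(round(odds_factor, 2))  # 0.79
```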
Session Outline
Measuring the fitness of the Logistic Regression Model.
Testing individual regression parameters (Wald's test).
Omnibus test for overall model fitness.
Hosmer-Lemeshow goodness of fit test.
R2 in Logistic Regression.
Confidence intervals for parameters and probabilities.
Wald Test
The Wald test is used to check the significance of individual variables (similar to the t-statistic in linear regression).
The Wald test statistic is given by:

W = ( βi / SE(βi) )^2

W is a chi-square statistic.
Wald test hypothesis
Null Hypothesis H0: βi = 0
Alternative Hypothesis H1: βi ≠ 0
Wald Test Challenger Data
The explanatory variable is significant, since the p value is less than 0.05.
Wald Test Challenger Data
For significant variables, the confidence interval of Exp(β) will not contain 1.
All Rights Reserved, Indian
Hosmer-Lemeshow Goodness of Fit
A test for the overall fitness of the model for a binary logistic regression (similar to the chi-square goodness of fit test).
The observations are grouped into 10 groups based on their predicted probabilities.
Hosmer-Lemeshow Test Statistic
The Hosmer-Lemeshow test statistic is given by:

C = Σ(k=1..g) (Ok - nk π̄k)^2 / ( nk π̄k (1 - π̄k) )

g  = number of groups
nk = number of observations in each group
Ok = sum of the y values for the kth group
π̄k = the average predicted probability in the kth group
H-L test for Challenger Data
Hosmer and Lemeshow Test (Step 1): Chi-square = 9.396, df = 8, Sig. = .310

Since p (= 0.310) is more than 0.05, we accept the null hypothesis that there is no difference between the predicted and observed frequencies (accept the model).
Classification Table
                          Observed 1 (Positive): 7     Observed 0 (Negative): 17
Predicted 1 (Positive)    4  [True Positive, TP]       0  [False Positive, FP]
Predicted 0 (Negative)    3  [False Negative, FN]      17 [True Negative, TN]

Sensitivity = TP / (TP + FN) = 4 / 7 = 0.571
Specificity = TN / (TN + FP) = 17 / 17 = 1.00
Sensitivity and Specificity

Sensitivity = Number of true positives / (Number of true positives + Number of false negatives)

Specificity = Number of true negatives / (Number of true negatives + Number of false positives)

Sensitivity is the probability that the predicted value of y is 1, given that the observed value is 1.
Specificity is the probability that the predicted value of y is 0, given that the observed value is 0.
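The two measures follow directly from the Challenger classification table above:

```python
# Counts from the Challenger classification table
tp, fn = 4, 3    # 7 observed positives (damage)
tn, fp = 17, 0   # 17 observed negatives (no damage)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(round(sensitivity, 3), round(specificity, 3))  # 0.571 1.0
```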
Receiver Operating Characteristic (ROC) Curve
The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) and compares the classifier with random classification.
The higher the area under the ROC curve, the better the predictive ability of the model.
Challenger Example: Sensitivity vs 1 - Specificity
(True positive vs False positive)
Cut-off Value   Sensitivity   Specificity   1 - Specificity
0.05            1             0.235         0.765
0.1             0.857         0.412         0.588
0.2             0.857         0.529         0.471
0.3             0.571         0.706         0.294
0.4             0.571         0.941         0.059
0.5             0.571         1             0
0.6             0.571         1             0
0.7             0.429         1             0
0.8             0.429         1             0
0.9             0.143         1             0
0.95            0             1             0
Area Under the ROC Curve
The area under the ROC curve is interpreted as the probability that the model will rank a randomly chosen positive higher than a randomly chosen negative.
If n1 is the number of positives (1s) and n2 is the number of negatives (0s), then the area under the ROC curve is the proportion, out of all n1 x n2 possible (positive, negative) pairs, in which the positive receives a higher predicted probability than the negative.
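The pairwise interpretation can be computed directly. The predicted probabilities below are made up for illustration (not the Challenger values):

```python
# Made-up predicted probabilities to illustrate the pairwise view of AUC
pos_scores = [0.9, 0.7, 0.4]          # predicted probabilities of observed 1s
neg_scores = [0.2, 0.5, 0.1, 0.3]     # predicted probabilities of observed 0s

pairs = [(p, n) for p in pos_scores for n in neg_scores]
# Count a pair as 1 if the positive outranks the negative, 0.5 on ties
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)
print(auc)  # 11 of the 12 pairs are ranked correctly: 0.9166...
```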
ROC Curve
General rule for acceptance of the model:
If the area under the ROC curve is:
ROC area = 0.5                No discrimination
0.7 <= ROC area < 0.8         Acceptable discrimination
0.8 <= ROC area < 0.9         Excellent discrimination
ROC area >= 0.9               Outstanding discrimination
Optimal Cut-off probabilities
Using classification plots.
Youden's Index.
Cost based optimization.
Classification Plots
Step number: 1
Observed Groups and Predicted Probabilities
4 + 1
I 1
I 1
F I 1
R 3 + 1 0
E I 1 0
Q I 1 0
U I 1 0
E 2 + 0 0 1 0 0
N I 0 0 1 0 0
C I 0 0 1 0 0
Y I 0 0 1 0 0
1 + 000 0 0 0 0 0 0 0 0 0 1 1 1
I 000 0 0 0 0 0 0 0 0 0 1 1 1
I 000 0 0 0 0 0 0 0 0 0 1 1 1
I 000 0 0 0 0 0 0 0 0 0 1 1 1
Predicted ---------+---------+---------+---------+---------+---------+---------+---------+---------+-
Prob: 0 .1 .2 .3 .4 .5 .6 .7 .8 .9
Group: 0000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111
Youden's Index
Youden's index is a measure of diagnostic accuracy.
Youden's index is calculated by deducting 1 from the sum of sensitivity and specificity:

Youden's Index: J(p) = Sensitivity(p) + Specificity(p) - 1
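Applying Youden's index to the Challenger cut-off table given earlier picks out the cut-off that best balances sensitivity and specificity (an illustrative sketch using the slide values):

```python
# Cut-off table from the Challenger slides: (cutoff, sensitivity, specificity)
table = [(0.05, 1.0, 0.235), (0.1, 0.857, 0.412), (0.2, 0.857, 0.529),
         (0.3, 0.571, 0.706), (0.4, 0.571, 0.941), (0.5, 0.571, 1.0),
         (0.6, 0.571, 1.0), (0.7, 0.429, 1.0), (0.8, 0.429, 1.0),
         (0.9, 0.143, 1.0), (0.95, 0.0, 1.0)]

def youden(sens, spec):
    # J(p) = Sensitivity(p) + Specificity(p) - 1
    return sens + spec - 1

best = max(table, key=lambda row: youden(row[1], row[2]))
print(best[0], round(youden(best[1], best[2]), 3))  # 0.5 0.571
```

The index is maximised (J = 0.571) from cut-off 0.5 onwards, where specificity reaches 1 while sensitivity is still 0.571.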