UVA CS 4501: Machine Learning
Lecture 13: Logistic Regression
Dr. Yanjun Qi
University of Virginia, Department of Computer Science
3/23/18
Where are we? → Five major sections of this course

- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models
Where are we? → Three major types of classification

We can divide the large variety of classification approaches into roughly three major types:

1. Discriminative: directly estimate a decision rule/boundary, e.g., logistic regression, support vector machine, decision tree
2. Generative: build a generative statistical model, e.g., naïve Bayes classifier, Bayesian networks
3. Instance-based classifiers: use observations directly (no models), e.g., k-nearest neighbors
Today

- Bayes Classifier
- Logistic Regression
- Binary to multi-class
- Training LR by MLE
Bayes classifiers

- Treat each feature attribute and the class label as random variables.
- Given a sample x with attributes (x1, x2, …, xp):
  - Goal: predict its class C.
  - Specifically, we want to find the value of Ci that maximizes p(Ci | x1, x2, …, xp).
- Can we estimate p(Ci | x) = p(Ci | x1, x2, …, xp) directly from data?
Bayes Classifiers – MAP Rule

Task: classify a new instance X, described by a tuple of attribute values X = (X1, X2, …, Xp), into one of the classes cj ∈ C.

cMAP = argmax_{cj ∈ C} P(cj | x1, x2, …, xp)

MAP = Maximum A Posteriori probability

(Adapted from Carlos' probability tutorial. Please read the L13 Extra slides for WHY.)
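As a minimal illustration of the MAP rule above: given estimated posteriors P(cj | x) for each class, predict the class with the largest posterior. The class names and probability values below are made up for illustration.

```python
# Minimal sketch of the MAP rule: pick the class c_j that maximizes P(c_j | x).
# The posterior values here are hypothetical, not estimated from any data.

def map_classify(posteriors):
    """Return the class label c_j with the largest posterior P(c_j | x)."""
    return max(posteriors, key=posteriors.get)

posteriors = {"spam": 0.7, "ham": 0.3}  # hypothetical P(c | x) for one sample x
print(map_classify(posteriors))  # -> spam
```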
A Dataset for classification

- Data / points / instances / examples / samples / records: [rows]
- Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
- Target / outcome / response / label / dependent variable: special column to be predicted [last column]

Output as discrete class label C ∈ {C1, C2, …, CL}
Establishing a probabilistic model for classification – discriminative model

A discriminative probabilistic classifier takes the input x = (x1, x2, …, xp) and outputs P(c1|x), P(c2|x), …, P(cL|x) directly; the prediction is

argmax_{c ∈ C} P(c | x),  C = {c1, …, cL}

(Adapted from Prof. Ke Chen's NB slides)
A Dataset for classification (continued)

The same dataset, viewed with the two classifier families:

Generative (later!):
argmax_{c ∈ C} P(c | X) = argmax_{c ∈ C} P(X, c) = argmax_{c ∈ C} P(X | c) P(c)

Discriminative:
argmax_{c ∈ C} P(c | X),  C = {c1, …, cL}
Today

- Bayes Classifier
- Logistic Regression
- Binary to multi-class
- Training LR by MLE
From multivariate linear regression to logistic regression

Multivariate linear regression:
y = β0 + β1x1 + β2x2 + … + βpxp

Logistic regression for binary classification:
ln[ P(y|x) / (1 − P(y|x)) ] = β0 + β1x1 + β2x2 + … + βpxp
Review: Probabilistic Interpretation of Linear Regression

- Let us assume that the target variable and the inputs are related by the equation:

  yi = θᵀxi + εi

  where εi is an error term of unmodeled effects or random noise.

- Now assume that ε follows a Gaussian N(0, σ). Then we have:

  p(yi | xi; θ) = (1 / (√(2π) σ)) exp( −(yi − θᵀxi)² / (2σ²) )

- By the i.i.d. assumption:

  L(θ) = ∏_{i=1}^{n} p(yi | xi; θ) = (1 / (√(2π) σ))ⁿ exp( −Σ_{i=1}^{n} (yi − θᵀxi)² / (2σ²) )
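A quick numeric sketch of the point behind this review: under the Gaussian noise model, maximizing L(θ) is the same as minimizing the sum of squared residuals. For a 1-D, no-intercept model y = θx, the least-squares solution is θ̂ = Σxy / Σx²; the data below are made up for illustration.

```python
# Sketch: the least-squares theta also minimizes the Gaussian negative
# log-likelihood from the slide. Data are illustrative, not real.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Closed-form least-squares fit for y = theta * x (no intercept).
theta_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def neg_log_likelihood(theta, sigma=1.0):
    # -log L(theta), dropping the constant n * log(sqrt(2*pi) * sigma).
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)

# The least-squares estimate beats nearby values of theta in likelihood:
assert neg_log_likelihood(theta_hat) < neg_log_likelihood(theta_hat + 0.1)
assert neg_log_likelihood(theta_hat) < neg_log_likelihood(theta_hat - 0.1)
print(theta_hat)  # -> 1.99
```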
Logistic Regression: p(y|x)

P(y|x) = e^(β0 + β1x1 + β2x2 + … + βpxp) / (1 + e^(β0 + β1x1 + β2x2 + … + βpxp)) = 1 / (1 + e^(−βᵀX))

Equivalently:
ln[ P(y|x) / (1 − P(y|x)) ] = β0 + β1x1 + β2x2 + … + βpxp
Logistic Regression models a linear classification boundary!

y ∈ {0, 1}

ln[ P(y|x) / (1 − P(y|x)) ] = β0 + β1x1 + β2x2 + … + βpxp

The right-hand side is linear in x, so the decision boundary P(y|x) = 0.5 (where the log-odds equal zero) is a hyperplane.
The logistic function: a common "S"-shaped function

P(y|x) = e^(α + βx) / (1 + e^(α + βx))

[Figure: P(Y=1|X) plotted against x, rising from 0.0 to 1.0; e.g., the probability of disease.]
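The "S"-shape above can be sketched in a few lines; the values of α and β here are illustrative, not fitted to any data.

```python
import math

# Sketch of the logistic function from the slide:
# P(y|x) = e^(alpha + beta*x) / (1 + e^(alpha + beta*x)).
# alpha and beta are illustrative defaults, not fitted parameters.

def logistic(x, alpha=0.0, beta=1.0):
    z = alpha + beta * x
    return math.exp(z) / (1.0 + math.exp(z))

print(logistic(0.0))    # 0.5: the log-odds are exactly zero here
print(logistic(10.0))   # close to 1
print(logistic(-10.0))  # close to 0
```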
Logistic Regression – when?

Logistic regression models are appropriate for a target variable coded as 0/1.

We only observe "0" and "1" for the target variable, but we think of the target variable conceptually as the probability that "1" will occur.

This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter p = p(y=1|x) given by the model.

The main interest → predicting the probability that an event occurs, i.e., p(y=1|x).
The logit function view

Logit of P(y|x):
ln[ P(y|x) / (1 − P(y|x)) ] = α + βx   ⇔   P(y|x) = e^(α + βx) / (1 + e^(α + βx))

ln[ P(y=1|x) / P(y=0|x) ] = ln[ P(y=1|x) / (1 − P(y=1|x)) ] = α + β1x1 + β2x2 + … + βpxp

Decision boundary → logit equals zero
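The two equivalent forms above can be checked numerically: the logit ln[p/(1−p)] inverts the logistic function, so applying it to P(y|x) recovers the linear score α + βx. The values of α, β, and x are illustrative.

```python
import math

# Sketch checking the logit view: logit(logistic(z)) == z, so the logit of
# P(y|x) recovers the linear score alpha + beta*x. Values are illustrative.

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

alpha, beta, x = -1.0, 2.0, 0.75
p = logistic(alpha + beta * x)
print(logit(p))  # recovers alpha + beta*x = 0.5
```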
Logistic Regression Assumptions

- Linearity in the logit: the regression equation should have a linear relationship with the logit form of the target variable.
- There is no assumption about the feature variables / predictors being linearly related to each other.
Today

- Bayes Classifier
- Logistic Regression
- Binary to multi-class
- Training LR by MLE
Binary Logistic Regression

In summary, logistic regression tells us two things at once:
- Transformed, the "log odds" (logit) ln[p/(1−p)] are a linear function of x.
- Untransformed, P(Y=1|x) follows the logistic ("S"-shaped) curve in x.

This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter p = p(y=1|x) given by the model. The odds are p/(1−p).
Binary → Multinoulli Logistic Regression Model

Pr(G = k | X = x) = exp(βk0 + βkᵀx) / (1 + Σ_{l=1}^{K−1} exp(βl0 + βlᵀx)),  k = 1, …, K−1

Pr(G = K | X = x) = 1 / (1 + Σ_{l=1}^{K−1} exp(βl0 + βlᵀx))

- Directly models the posterior probabilities as the output of regression.
- Note that the class boundaries are linear.
- x is a p-dimensional input vector; βk is a p-dimensional vector for each class k.
- Total number of parameters is (K−1)(p+1).
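The two formulas above can be sketched directly: given the K−1 linear scores βk0 + βkᵀx, the model turns them into K probabilities that sum to 1, with class K as the reference class. The score values below are illustrative, not fitted.

```python
import math

# Sketch of the multinoulli logistic model with class K as reference:
# Pr(G=k|x) = exp(s_k) / (1 + sum_l exp(s_l)) for k = 1..K-1,
# Pr(G=K|x) = 1 / (1 + sum_l exp(s_l)).
# The scores below are illustrative values of beta_k0 + beta_k^T x.

def multinoulli_probs(scores):
    """scores: list of K-1 linear scores; returns K class probabilities."""
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)  # reference class K
    return probs

probs = multinoulli_probs([0.2, -0.5])  # K = 3 classes
print(probs, sum(probs))  # K probabilities summing to 1
```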
Today

- Bayes Classifier
- Logistic Regression
- Binary to multi-class
- Training LR by MLE
Parameter Estimation for LR → MLE from the data

- Review:
  - Linear regression training → least squares
  - Gaussian explanation of linear regression training → MLE
- Logistic regression training:
  - Maximum likelihood estimation
  - Classic algorithms use iterative Newton steps

Please read the L13 Extra slides for HOW.
Review: Defining Likelihood for a basic Bernoulli

- Likelihood = p(data | parameter)

PMF (a function of xi):
f(xi | p) = p^(xi) (1 − p)^(1−xi)

Likelihood (a function of p):
L(p) = ∏_{i=1}^{n} p^(xi) (1 − p)^(1−xi) = p^x (1 − p)^(n−x)

→ e.g., for n independent tosses of a coin with unknown p, the observed data are x heads from n trials, where x = Σ_{i=1}^{n} xi.
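The coin-toss example can be made concrete: with x heads out of n tosses, L(p) = p^x (1−p)^(n−x) is maximized at p̂ = x/n. The toss sequence below is made up for illustration.

```python
# Sketch of the Bernoulli likelihood from the slide, with an illustrative
# toss sequence (1 = heads). The MLE is p_hat = x / n.

tosses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
n, x = len(tosses), sum(tosses)

def likelihood(p):
    return (p ** x) * ((1 - p) ** (n - x))

p_hat = x / n  # maximum likelihood estimate
# p_hat gives at least as high a likelihood as nearby values of p:
assert likelihood(p_hat) >= likelihood(p_hat - 0.05)
assert likelihood(p_hat) >= likelihood(p_hat + 0.05)
print(p_hat)  # -> 0.7
```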
MLE for Logistic Regression Training (extra)

Let's fit the logistic regression model for K = 2, i.e., the number of classes is 2.

Training set: (xi, yi), i = 1, …, N. We want to maximize the log-likelihood in order to estimate β. Each xi is a (p+1)-dimensional input vector with leading entry 1, and β is a (p+1)-dimensional vector.

For the Bernoulli distribution, Pr(Y = y | X = x) = pβ(y|x)^y (1 − pβ(y|x))^(1−y), so:

l(β) = Σ_{i=1}^{N} log Pr(Y = yi | X = xi)
     = Σ_{i=1}^{N} [ yi log Pr(Y=1|X=xi) + (1 − yi) log Pr(Y=0|X=xi) ]
     = Σ_{i=1}^{N} [ yi log( exp(βᵀxi) / (1 + exp(βᵀxi)) ) + (1 − yi) log( 1 / (1 + exp(βᵀxi)) ) ]
     = Σ_{i=1}^{N} [ yi βᵀxi − log(1 + exp(βᵀxi)) ]
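The final line of the derivation translates directly into code. The tiny dataset and β below are illustrative; each xi carries a leading 1 for the intercept, matching the slide's convention.

```python
import math

# Sketch of the binary logistic log-likelihood from the slide:
# l(beta) = sum_i [ y_i * beta^T x_i - log(1 + exp(beta^T x_i)) ].
# Dataset and beta are illustrative.

data = [([1.0, 0.5], 1), ([1.0, -1.0], 0), ([1.0, 2.0], 1)]  # (x_i, y_i) pairs
beta = [0.1, 1.0]

def log_likelihood(beta, data):
    total = 0.0
    for x, y in data:
        score = sum(b * xj for b, xj in zip(beta, x))  # beta^T x_i
        total += y * score - math.log(1.0 + math.exp(score))
    return total

# A sum of log-probabilities, so it is always <= 0:
print(log_likelihood(beta, data))
```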
How do we maximize

l(β) = Σ_{i=1}^{N} { log Pr(Y = yi | X = xi) } ?
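The lecture's answer (per the summary slide) is an iterative Newton method; as a simpler stand-in, the sketch below maximizes l(β) by plain gradient ascent, using the gradient ∇l(β) = Σi (yi − pi) xi with pi = 1/(1 + exp(−βᵀxi)). The dataset, step size, and iteration count are all illustrative.

```python
import math

# Gradient-ascent sketch for maximizing the logistic log-likelihood.
# (The slides' actual method is iterative Newton; this is a simpler stand-in.)
# Data points carry a leading 1 for the intercept; values are made up.

data = [([1.0, 0.5], 1), ([1.0, -1.0], 0), ([1.0, 2.0], 1), ([1.0, -0.3], 0)]

def predict(beta, x):
    """P(y=1|x) = 1 / (1 + exp(-beta^T x))."""
    return 1.0 / (1.0 + math.exp(-sum(b * xj for b, xj in zip(beta, x))))

beta = [0.0, 0.0]
for _ in range(500):
    grad = [0.0, 0.0]
    for x, y in data:
        err = y - predict(beta, x)       # (y_i - p_i)
        for j in range(len(beta)):
            grad[j] += err * x[j]        # accumulate (y_i - p_i) * x_i
    beta = [b + 0.1 * g for b, g in zip(beta, grad)]  # ascent step

# After training, the fitted model separates the illustrative points:
print([round(predict(beta, x), 2) for x, _ in data])
```

Newton's method replaces the fixed step with a Hessian-weighted one, which converges in far fewer iterations; the gradient itself is the same.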
Logistic Regression (summary)

Task: classification
Representation: log-odds = linear function of X's
Score function: EPE, MLE
Search/optimization: iterative (Newton) method
Models, parameters: logistic weights, P(c=1|x) = e^(α+βx) / (1 + e^(α+βx))
References

- Prof. Tan, Steinbach, Kumar's "Introduction to Data Mining" slides
- Prof. Andrew Moore's slides
- Prof. Eric Xing's slides
- Prof. Ke Chen's NB slides
- Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2, No. 1. New York: Springer, 2009.