UVA CS 6316/4501 – Fall 2016 Machine Learning
Lecture 14: Logistic Regression / Generative vs. Discriminative
Dr. Yanjun Qi
University of Virginia, Department of Computer Science
11/2/16
Dr. Yanjun Qi / UVA CS 6316-450104 / f16



Where are we? → Five major sections of this course

•  Regression (supervised)
•  Classification (supervised)
•  Unsupervised models
•  Learning theory
•  Graphical models


Where are we? → Three major sections for classification

•  We can divide the large variety of classification approaches into roughly three major types:

1. Discriminative
   - directly estimate a decision rule/boundary
   - e.g., logistic regression, support vector machine, decision tree
2. Generative
   - build a generative statistical model
   - e.g., naïve Bayes classifier, Bayesian networks
3. Instance-based classifiers
   - use observations directly (no models)
   - e.g., K nearest neighbors


A Dataset for Classification

•  Data/points/instances/examples/samples/records: [rows]
•  Features/attributes/dimensions/independent variables/covariates/predictors/regressors: [columns, except the last]
•  Target/outcome/response/label/dependent variable: special column to be predicted [last column]

Output as discrete class label: C1, C2, …, CL

•  Generative: $\arg\max_C P(C \mid X) = \arg\max_C P(X, C) = \arg\max_C P(X \mid C)\,P(C)$
•  Discriminative: $\arg\max_C P(C \mid X)$, with $C = c_1, \dots, c_L$
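The generative route above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the single discrete feature, and all probabilities are hypothetical toy numbers, not values from the lecture. It shows that ranking classes by the joint P(X | C) P(C) gives the same winner as ranking by the posterior P(C | X), because the two differ only by the factor P(X), which is the same for every class.

```python
# Hypothetical toy model: two classes, one discrete feature with values "a"/"b".
prior = {"c1": 0.6, "c2": 0.4}                      # P(C)
likelihood = {"c1": {"a": 0.2, "b": 0.8},           # P(X | C)
              "c2": {"a": 0.7, "b": 0.3}}


def joint(x):
    """P(X = x, C = c) = P(x | c) * P(c) for every class c."""
    return {c: likelihood[c][x] * prior[c] for c in prior}


def posterior(x):
    """P(C = c | X = x), obtained by normalizing the joint over classes."""
    j = joint(x)
    evidence = sum(j.values())                      # P(X = x)
    return {c: v / evidence for c, v in j.items()}


def map_class(scores):
    """Class with the largest score (the argmax over C)."""
    return max(scores, key=scores.get)


for x in ("a", "b"):
    # The normalizer P(x) does not depend on the class, so both argmaxes agree.
    assert map_class(joint(x)) == map_class(posterior(x))
```

A discriminative model would skip `prior` and `likelihood` entirely and parameterize `posterior` directly.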


Establishing a probabilistic model for classification (cont.)

–  (1) Generative model

[Figure: L generative probabilistic models, one per class. The model for class c_i takes the inputs x_1, x_2, …, x_p and outputs P(x | c_i), for i = 1, …, L.]

$\mathbf{x} = (x_1, x_2, \dots, x_p)$

$\arg\max_C P(C \mid X) = \arg\max_C P(X, C) = \arg\max_C P(X \mid C)\,P(C)$

Adapted from Prof. Ke Chen's NB slides


Establishing a probabilistic model for classification

–  (2) Discriminative model

$P(C \mid X)$, with $C = c_1, \dots, c_L$ and $X = (X_1, \dots, X_n)$

[Figure: a single discriminative probabilistic classifier takes the inputs x = (x_1, x_2, …, x_n) and outputs P(c_1 | x), P(c_2 | x), …, P(c_L | x).]

Adapted from Prof. Ke Chen's NB slides


Today: Generative vs. Discriminative

✓  Why Bayes Classification – MAP Rule?
   §  Expected Prediction Error (EPE)
   §  0-1 Loss function for Bayes Classifier
✓  Logistic regression
✓  Generative vs. Discriminative


Bayes Classifiers – MAP Rule

Task: Classify a new instance X, described by a tuple of attribute values X = (X1, X2, …, Xp), into one of the classes cj ∈ C.

$c_{MAP} = \arg\max_{c_j \in C} P(c_j \mid x_1, x_2, \dots, x_p)$

MAP = Maximum A Posteriori Probability

WHY?

Adapted from Carols’ prob tutorial


0-1 LOSS for Classification

•  Procedure for a categorical output variable C
•  Frequently, the 0-1 loss function is used: L(k, ℓ) = 0 if k = ℓ, and 1 otherwise
•  L(k, ℓ) is the price paid for misclassifying an element from class Ck as belonging to class Cℓ → an L × L matrix (here the dimension L is the number of classes)


Expected prediction error (EPE)

•  Expected prediction error (EPE), with expectation taken w.r.t. the joint distribution Pr(C, X)
   –  Pr(C, X) = Pr(C | X) Pr(X)

$EPE(f) = E_{X,C}\big[L(C, f(X))\big] = E_X \sum_{k=1}^{L} L[C_k, f(X)]\,\Pr(C_k \mid X)$

(Slide annotation: "Consider sample population distribution")


Expected prediction error (EPE)

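•  Pointwise minimization suffices (K here is the number of classes, written L on the earlier slides):

$EPE(f) = E_{X,C}\big[L(C, f(X))\big] = E_X \sum_{k=1}^{K} L[C_k, f(X)]\,\Pr(C_k \mid X)$

$\hat{f}(x) = \arg\min_{g \in C} \sum_{k=1}^{K} L(C_k, g)\,\Pr(C_k \mid X = x)$

•  → With the 0-1 loss, the sum equals $1 - \Pr(g \mid X = x)$, so this is simply

$\hat{f}(x) = C_k \ \text{if} \ \Pr(C_k \mid X = x) = \max_{g \in C} \Pr(g \mid X = x)$   (Bayes classifier)

(Slide annotation: "Consider sample population distribution")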
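The pointwise argument can be checked numerically. The sketch below is an illustration only: the posterior vector is a hypothetical example, and classes are indexed 0, 1, 2 for convenience. It computes the expected loss of each candidate decision g at a fixed x and confirms that, under the 0-1 loss, the minimizer coincides with the class of largest posterior probability.

```python
def zero_one_loss(num_classes):
    """Loss matrix with L[k][g] = 0 if k == g, else 1 (rows: true class k)."""
    return [[0 if k == g else 1 for g in range(num_classes)]
            for k in range(num_classes)]


def expected_loss(posterior, loss, g):
    """Sum over true classes k of L(C_k, g) * Pr(C_k | X = x)."""
    return sum(loss[k][g] * posterior[k] for k in range(len(posterior)))


def bayes_decision(posterior, loss):
    """Decision g that minimizes the expected loss at this x."""
    return min(range(len(posterior)),
               key=lambda g: expected_loss(posterior, loss, g))


# Hypothetical posterior Pr(C_k | X = x) for three classes at one point x.
posterior = [0.2, 0.5, 0.3]
loss = zero_one_loss(3)

decision = bayes_decision(posterior, loss)
map_class = max(range(3), key=lambda k: posterior[k])
```

Passing a different loss matrix to `bayes_decision` (for example, one that penalizes a particular mistake more heavily) can move the decision away from the MAP class, which is why the equivalence is specific to the 0-1 loss.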


SUMMARY: WHEN EPE USES DIFFERENT LOSS

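Loss function → Estimator

•  L2 (squared error) → conditional mean, $\hat{f}(x) = E[Y \mid X = x]$
•  L1 (absolute error) → conditional median, $\hat{f}(x) = \mathrm{median}(Y \mid X = x)$
•  0-1 → most probable class, $\hat{f}(x) = \arg\max_{g} \Pr(g \mid X = x)$ (Bayes classifier / MAP)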


Today: Generative vs. Discriminative

✓  Why Bayes Classification – MAP Rule?
   §  Expected Prediction Error (EPE)
   §  0-1 Loss function for Bayes Classifier
✓  Logistic regression
✓  Generative vs. Discriminative


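Multivariate linear regression to Logistic Regression

•  Dependent variable (also called: predicted variable, response variable, outcome variable)
•  Independent variables (also called: predictor variables, explanatory variables, covariables)

$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

Logistic regression for binary classification:

$\ln\left[\frac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$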


$y \in \{0, 1\}$

(1) Linear decision boundary

(2) p(y|x)

$\ln\left[\frac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$


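The logistic function (1) -- is a common "S" shape function

$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$

[Figure: S-shaped logistic curve. Vertical axis: P(Y=1|X), from 0.0 to 1.0 (e.g., probability of disease). Horizontal axis: x.]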
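A minimal Python sketch of the logistic function follows. The parameter values `alpha = -4.0` and `beta = 0.1` are hypothetical and chosen only so that the curve crosses 0.5 at x = 40. The two branches compute the same quantity; splitting on the sign of the linear score is a common way to avoid overflow in `exp` and is an implementation choice, not something stated on the slide.

```python
import math


def logistic(x, alpha, beta):
    """P(y=1 | x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))."""
    z = alpha + beta * x
    if z >= 0:
        # Equivalent form 1 / (1 + exp(-z)); exp(-z) cannot overflow here.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is at most 1, so this form is safe.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical parameters: the probability is 0.5 where alpha + beta*x = 0.
alpha, beta = -4.0, 0.1
curve = [(x, logistic(x, alpha, beta)) for x in range(0, 101, 20)]
```

Evaluating `curve` gives values that rise monotonically from near 0 to near 1, which is the "S" shape in the figure.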


Logistic Regression: when?

Logistic regression models are appropriate for a target variable coded as 0/1.

We only observe “0” and “1” for the target variable, but we think of the target variable conceptually as a probability that “1” will occur.


This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter p = p(y=1|x) specified by the model. The main interest → predicting the probability that an event occurs, i.e., p(y=1|x).


Discriminative: logistic regression models for a binary target variable coded 0/1.

$P(y = 1 \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$   (logistic function)

[Figure: logistic curve. Vertical axis: P(y=1|X), from 0.0 to 1.0 (e.g., probability of disease). Horizontal axis: x.]


Logit function:

$\ln\left[\frac{P(y=1 \mid x)}{P(y=0 \mid x)}\right] = \ln\left[\frac{P(y=1 \mid x)}{1 - P(y=1 \mid x)}\right] = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

Decision boundary → the logit equals zero
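Setting the logit to zero means P(y=1|x) = P(y=0|x) = 0.5, so classifying by the sign of the linear score is the same as thresholding the probability at 0.5. The sketch below illustrates this with two features; the coefficients and the test points are hypothetical, and assigning the tie (score exactly zero) to class 1 is an arbitrary convention.

```python
import math


def linear_score(x, alpha, betas):
    """The logit: alpha + beta_1*x_1 + ... + beta_p*x_p."""
    return alpha + sum(b * xi for b, xi in zip(betas, x))


def prob_y1(x, alpha, betas):
    """P(y=1 | x) obtained by applying the logistic function to the logit."""
    return 1.0 / (1.0 + math.exp(-linear_score(x, alpha, betas)))


def predict(x, alpha, betas):
    """Class 1 on the non-negative side of the hyperplane, class 0 otherwise."""
    return 1 if linear_score(x, alpha, betas) >= 0 else 0


# Hypothetical coefficients; the boundary is the line -1 + 2*x1 - 0.5*x2 = 0.
alpha, betas = -1.0, [2.0, -0.5]
points = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.0), (2.0, 8.0)]
labels = [predict(x, alpha, betas) for x in points]
```

The point (0.5, 0.0) lies exactly on the boundary, where the predicted probability is 0.5.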


The logistic function (2)

Logit of P(y|x):

$\ln\left[\frac{P(y \mid x)}{1 - P(y \mid x)}\right] = \alpha + \beta x$

$P(y \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$


The logistic function (3)

•  Advantages of the logit
   –  Simple transformation of P(y|x)
   –  Linear relationship with x
   –  Can be continuous (logit between -infinity and +infinity)
   –  Directly related to the notion of log odds of the target event

$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta x, \qquad \frac{P}{1-P} = e^{\alpha + \beta x}$

$z = \log\left(\frac{p}{1-p}\right)$
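For completeness, the step from the logit back to the logistic function can be written out. This is a standard algebraic manipulation, shown here as a worked derivation with $z = \alpha + \beta x$ as shorthand:

```latex
\begin{aligned}
\ln\!\left(\frac{P}{1-P}\right) &= z \\
\frac{P}{1-P} &= e^{z} \\
P &= e^{z}\,(1-P) \\
P\,(1+e^{z}) &= e^{z} \\
P &= \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}}
\end{aligned}
```

So the logit maps a probability in (0, 1) to a real number, and the logistic function is its inverse.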


Logistic Regression Assumptions

•  Linearity in the logit: the regression equation should have a linear relationship with the logit form of the target variable.
•  There is no assumption about the feature variables / target predictors being linearly related to each other.


Binary Logistic Regression

In summary, logistic regression tells us two things at once:

•  Transformed, the “log odds” (logit) are linear. [Figure: ln[p/(1-p)] plotted against x is a straight line.]
•  Logistic distribution. [Figure: P(Y=1|x) plotted against x is an S-shaped curve.]

This means we use a Bernoulli distribution to model the target variable, with its Bernoulli parameter p = p(y=1|x) specified by the model: the target is 1 with probability p and 0 with probability 1-p.

Odds = p/(1-p)


Binary → Multinomial Logistic Regression Model

$\Pr(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^T x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}, \quad k = 1, \dots, K-1$

$\Pr(G = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^T x)}$

•  Directly models the posterior probabilities as the output of regression
•  Note that the class boundaries are linear
•  x is a p-dimensional input vector
•  $\beta_k$ is a p-dimensional vector for each k
•  Total number of parameters is (K-1)(p+1)
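The two formulas above can be sketched directly in Python. The function name, the argument layout, and the example coefficients are assumptions made for illustration; class K is treated as the reference class, as in the equations. No overflow protection is included, so this is suitable only for moderate scores.

```python
import math


def multinomial_probs(x, intercepts, betas):
    """Return [Pr(G=1|x), ..., Pr(G=K|x)], with class K as the reference class.

    intercepts: the K-1 values beta_{k0}
    betas:      the K-1 weight vectors beta_k, each of length p
    """
    scores = [b0 + sum(bj * xj for bj, xj in zip(bk, x))
              for b0, bk in zip(intercepts, betas)]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]   # classes 1 .. K-1
    probs.append(1.0 / denom)                       # reference class K
    return probs


# Hypothetical model with K = 3 classes and p = 2 features.
intercepts = [0.5, -1.0]
betas = [[1.0, -2.0], [0.3, 0.8]]
probs = multinomial_probs([1.0, 2.0], intercepts, betas)

# Parameter count (K-1)(p+1): one intercept plus p weights per non-reference class.
num_params = len(betas) * (len(betas[0]) + 1)
```

With K = 2 the list `betas` holds a single weight vector, and the first returned probability reduces to the binary logistic function.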


Today: Generative vs. Discriminative

✓  Why Bayes Classification – MAP Rule?
   §  Expected Prediction Error (EPE)
   §  0-1 Loss function for Bayes Classifier
✓  Logistic regression
   §  Parameter Estimation for LR
✓  Generative vs. Discriminative


Parameter Estimation for LR → MLE from the data

•  RECAP: Linear regression → Least squares

•  Logistic regression → Maximum likelihood estimation


MLE for Logistic Regression Training

Let's fit the logistic regression model for K = 2, i.e., the number of classes is 2.

Training set: (x_i, y_i), i = 1, …, N

For a Bernoulli distribution: $\Pr(Y = y \mid x) = p^{y} (1 - p)^{1 - y}$, with $p = p(y = 1 \mid x)$

(Conditional) log-likelihood:

$l(\beta) = \sum_{i=1}^{N} \log \Pr(Y = y_i \mid X = x_i)$

$\quad = \sum_{i=1}^{N} \Big[ y_i \log \Pr(Y = 1 \mid X = x_i) + (1 - y_i) \log \Pr(Y = 0 \mid X = x_i) \Big]$

$\quad = \sum_{i=1}^{N} \Big[ y_i \log \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} + (1 - y_i) \log \frac{1}{1 + \exp(\beta^T x_i)} \Big]$

$\quad = \sum_{i=1}^{N} \Big[ y_i \beta^T x_i - \log\big(1 + \exp(\beta^T x_i)\big) \Big]$

•  x_i are (p+1)-dimensional input vectors with leading entry 1
•  β is a (p+1)-dimensional vector

We want to maximize the log-likelihood in order to estimate β. How?
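The simplified last line of the log-likelihood and the Bernoulli form it came from can be compared numerically. The sketch below is illustrative only: the four training points are hypothetical, each x_i carries a leading 1 as described above, and no guard against overflow in `exp` is included.

```python
import math


def log_likelihood(beta, xs, ys):
    """l(beta) = sum_i [ y_i * beta^T x_i - log(1 + exp(beta^T x_i)) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(b * xj for b, xj in zip(beta, x))
        total += y * z - math.log1p(math.exp(z))
    return total


def log_likelihood_bernoulli(beta, xs, ys):
    """Same quantity written as sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(b * xj for b, xj in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-z))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical training set; the leading 1.0 in each x_i is the intercept entry.
xs = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]
ys = [0, 0, 1, 1]

ll_at_zero = log_likelihood([0.0, 0.0], xs, ys)   # every p_i is 0.5 here
```

At β = 0 every observation has probability 0.5, so the log-likelihood is N · log(0.5).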


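Newton-Raphson for LR (optional)

$\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^{N} \left( y_i - \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \right) x_i = 0$

(p+1) non-linear equations to solve for (p+1) unknowns.

Solve by the Newton-Raphson method:

$\beta^{new} \leftarrow \beta^{old} - \left[ \frac{\partial^2 l(\beta)}{\partial \beta\, \partial \beta^T} \right]^{-1} \frac{\partial l(\beta)}{\partial \beta}$,

where

$\frac{\partial^2 l(\beta)}{\partial \beta\, \partial \beta^T} = -\sum_{i=1}^{N} x_i x_i^T \underbrace{\left( \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \right)}_{p(x_i;\beta)} \underbrace{\left( \frac{1}{1 + \exp(\beta^T x_i)} \right)}_{1 - p(x_i;\beta)}$

Newton-Raphson minimizes a quadratic approximation to the function we are really interested in.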


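Newton-Raphson for LR…

$\frac{\partial l(\beta)}{\partial \beta} = \sum_{i=1}^{N} \left( y_i - \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \right) x_i = X^T (y - p)$

$\frac{\partial^2 l(\beta)}{\partial \beta\, \partial \beta^T} = -X^T W X$

So, the NR rule becomes:

$\beta^{new} \leftarrow \beta^{old} + (X^T W X)^{-1} X^T (y - p)$,

where

$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}_{N \times (p+1)}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}_{N \times 1}, \quad p = \begin{bmatrix} \exp(\beta^T x_1)/(1 + \exp(\beta^T x_1)) \\ \exp(\beta^T x_2)/(1 + \exp(\beta^T x_2)) \\ \vdots \\ \exp(\beta^T x_N)/(1 + \exp(\beta^T x_N)) \end{bmatrix}_{N \times 1}$

with β evaluated at β^old, and W diagonal with entries

$W_{ii} = \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \left( 1 - \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \right)$

•  X: N × (p+1) matrix of x_i
•  y: N × 1 matrix of y_i
•  p: N × 1 matrix of p(x_i; β^old)
•  W: N × N diagonal matrix of p(x_i; β^old)(1 − p(x_i; β^old))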


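Newton-Raphson for LR…

•  Newton-Raphson:

$\beta^{new} = \beta^{old} + (X^T W X)^{-1} X^T (y - p)$

$\quad = (X^T W X)^{-1} X^T W \big( X \beta^{old} + W^{-1} (y - p) \big)$

$\quad = (X^T W X)^{-1} X^T W z$

•  Adjusted response:

$z = X \beta^{old} + W^{-1} (y - p)$

•  Iteratively reweighted least squares (IRLS):

$\beta^{new} \leftarrow \arg\min_{\beta} (z - X\beta)^T W (z - X\beta)$

$\beta^{new} \leftarrow \arg\min_{\beta} (y - p)^T W^{-1} (y - p)$

Re-expressing the Newton step as a weighted least squares step.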
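The update β^new = β^old + (X^T W X)^{-1} X^T (y − p) can be sketched without a linear algebra library when there is one feature plus an intercept, since X^T W X is then a 2 × 2 matrix that can be inverted in closed form. Everything specific here is an assumption for illustration: the function names, the starting point β = 0, the stopping tolerance, and the small hypothetical data set (chosen to be non-separable so that a finite maximum likelihood estimate exists).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def irls_step(beta, xs, ys):
    """One Newton-Raphson step for x_i = (1, x_i1).

    Returns the updated beta and the gradient X^T (y - p) at the old beta.
    """
    b0, b1 = beta
    a00 = a01 = a11 = 0.0   # entries of the symmetric 2x2 matrix X^T W X
    g0 = g1 = 0.0           # entries of X^T (y - p)
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)   # diagonal entry of W
        a00 += w
        a01 += w * x
        a11 += w * x * x
        g0 += y - p
        g1 += (y - p) * x
    # Solve (X^T W X) d = X^T (y - p) with the closed-form 2x2 inverse.
    det = a00 * a11 - a01 * a01
    d0 = (a11 * g0 - a01 * g1) / det
    d1 = (a00 * g1 - a01 * g0) / det
    return (b0 + d0, b1 + d1), (g0, g1)


def fit(xs, ys, iters=25, tol=1e-10):
    """Iterate Newton steps from beta = 0 until the gradient is tiny."""
    beta = (0.0, 0.0)
    for _ in range(iters):
        beta, grad = irls_step(beta, xs, ys)
        if max(abs(grad[0]), abs(grad[1])) < tol:
            break
    return beta


# Hypothetical, non-separable data: the classes overlap along x.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat = fit(xs, ys)
```

At convergence the gradient X^T (y − p) is numerically zero, which is the score equation from the earlier slide. For general p, the same step is usually written with a matrix library and a linear solve rather than an explicit inverse.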


Logistic Regression

•  Task: classification
•  Representation: log-odds = linear function of X's
•  Score function: EPE, with conditional log-likelihood
•  Search/Optimization: iterative (Newton) method
•  Models, parameters: logistic weights

$P(c = 1 \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$


Today: Generative vs. Discriminative

✓  Why Bayes Classification – MAP Rule?
   §  Expected Prediction Error (EPE)
   §  0-1 Loss function for Bayes Classifier
✓  Logistic regression
✓  Generative vs. Discriminative


Discriminative vs. Generative

•  Generative approach: model the joint distribution p(X, C) using p(X | C = c_k) and the class prior p(C = c_k)
•  Discriminative approach: model the conditional distribution p(c | X) directly


Discriminative vs. Generative

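[Figure: generative (Gaussian) versus discriminative (logistic regression) modeling. Recoverable labels: "Height", "Gaussian", "Pr", "Logistic Regression".]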


LDA vs. Logistic Regression

[Slide content not recoverable from the extraction; the only surviving fragment is "Kp+".]


Discriminative vs. Generative

●  Definitions
   ○  h_gen and h_dis: generative and discriminative classifiers
   ○  h_gen,inf and h_dis,inf: the same classifiers but trained on the entire population (asymptotic classifiers)
   ○  n → infinity: h_gen → h_gen,inf and h_dis → h_dis,inf

Ng, Jordan. "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes." Advances in Neural Information Processing Systems 14 (2002): 841.


Discriminative vs. Generative

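Proposition 1: [statement not recoverable from the extraction]

Proposition 2: [statement not recoverable from the extraction]
-  p: number of dimensions
-  n: number of observations
-  ϵ: generalization error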


Logistic Regression vs. NBC

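Discriminative classifier (Logistic Regression)
-  Smaller asymptotic error
-  Slow convergence ~ O(p)

Generative classifier (Naive Bayes)
-  Larger asymptotic error
-  Can handle missing data (EM)
-  Fast convergence ~ O(lg(p))

Here convergence refers to the number of training examples needed to approach the asymptotic error, as a function of the number of dimensions p. In numerical analysis, the speed at which a convergent sequence approaches its limit is called the rate of convergence.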


[Figure: generalization error (vertical axis) versus size of training set (horizontal axis), with curves for Logistic Regression and Naive Bayes.]

Ng, Jordan. "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes." Advances in Neural Information Processing Systems 14 (2002): 841.


Xue, Jing-Hao, and D. Michael Titterington. "Comment on 'On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes'." Neural Processing Letters 28.3 (2008): 169-187.

[Figure: generalization error (vertical axis) versus size of training set (horizontal axis).]


Discriminative vs. Generative

●  Empirically, generative classifiers approach their asymptotic error faster than discriminative ones
   ○  Good for small training sets
   ○  Handle missing data well (EM)

●  Empirically, discriminative classifiers have lower asymptotic error than generative ones
   ○  Good for larger training sets


References

•  Prof. Tan, Steinbach, Kumar's “Introduction to Data Mining” slides
•  Prof. Andrew Moore's slides
•  Prof. Eric Xing's slides
•  Prof. Ke Chen's NB slides
•  Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2. No. 1. New York: Springer, 2009.
