
Logistic Regression
Lyle Ungar
CIS 520, Penn Engineering

Learning objectives
• Logistic model & loss
• Decision boundaries as hyperplanes
• Multi-class regression

What do you do with a binary y?
• Can you use linear regression?
  – $y = w^T x$
• How about a different link function?
  – $y = f(w^T x)$
• Or a different probability distribution?
  – $P(y = 1 \mid x) = f(w^T x)$

Logistic function

$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$
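As a quick illustration, here is the logistic function in NumPy (a minimal sketch; the function and variable names are mine, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real-valued score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# h_theta(x) = sigmoid(theta^T x); theta and x here are illustrative values.
theta = np.array([1.0, -2.0])
x = np.array([0.5, 0.25])
print(sigmoid(theta @ x))  # P(y = 1 | x) under the model
```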

Logistic Regression

• Log odds: $\log \frac{p(y=1 \mid x)}{p(y=-1 \mid x)} = \theta^T x$

• Log likelihood of data, with y = 1 or −1:
  $\ell(\theta) = -\sum_i \log\!\left(1 + e^{-y_i\,\theta^T x_i}\right)$
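A short sketch of this likelihood in NumPy, assuming the ±1 label coding above (names are illustrative):

```python
import numpy as np

def log_likelihood(theta, X, y):
    """Log likelihood of labels y in {-1, +1} under logistic regression.

    With p(y | x; theta) = 1 / (1 + exp(-y * theta^T x)), the log
    likelihood is -sum_i log(1 + exp(-y_i * theta^T x_i)).
    """
    margins = y * (X @ theta)                    # y_i * theta^T x_i
    return -np.sum(np.logaddexp(0.0, -margins))  # stable log(1 + e^{-m})
```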

Decision Boundary

Representing Hyperplanes
• How do we represent a line?
• In general, a hyperplane is defined by $0 = w^T x$.

[Figure: the red vector w (here w = [1, −1]) defines the green hyperplane orthogonal to it.]

Why bother with this weird representation?

Projections

Now classification is easy!

$h(x) = \mathrm{sgn}(w^T x)$
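In code, this classifier is just a dot product and a sign; a small sketch using the figure's w = [1, −1] (the data points are made up):

```python
import numpy as np

def classify(w, X):
    """Label each row of X by the side of the hyperplane w^T x = 0 it falls on."""
    return np.sign(X @ w)  # +1 or -1 (0 for points exactly on the boundary)

w = np.array([1.0, -1.0])                # the slide's example weight vector
X = np.array([[2.0, 1.0], [1.0, 2.0]])   # one point on each side
print(classify(w, X))                    # [ 1. -1.]
```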

Computing the MLE
• Use gradient ascent
• Loss function = log-likelihood

Computing the MAP
• Put a prior on the weights (e.g., a Gaussian prior, which acts as an L2 penalty)
• So solve $\hat{\theta} = \arg\max_\theta \big[\log p(y \mid X, \theta) + \log p(\theta)\big]$
• Again use gradient ascent (equivalently, gradient descent on the negative log posterior)

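A minimal sketch of both estimates, assuming labels in {−1, +1}; the Gaussian (L2) prior strength lam, learning rate, and step count are illustrative choices, not values from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lam=0.0, lr=0.1, n_steps=1000):
    """Gradient ascent on the log likelihood for y in {-1, +1}.

    lam = 0 gives the MLE; lam > 0 adds the gradient of a Gaussian
    log prior (-lam * theta), giving the MAP estimate.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # d/dtheta sum_i log sigmoid(y_i theta^T x_i)
        #   = sum_i sigmoid(-y_i theta^T x_i) * y_i * x_i
        grad = X.T @ (sigmoid(-y * (X @ theta)) * y) - lam * theta
        theta += lr * grad
    return theta
```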

Multi-Class Classification

• Disease diagnosis: healthy / cold / flu / pneumonia
• Object classification: desk / chair / monitor / bookcase

[Figures: two scatter plots over features x1 and x2 — one labeled "Binary classification", the other "Multi-class classification".]

Multi-Class Logistic Regression
• For 2 classes:

  $h_\theta(x) = \frac{1}{1 + \exp(-\theta^T x)} = \frac{\exp(\theta^T x)}{1 + \exp(\theta^T x)}$

  where, in the second form, the 1 in the denominator is the weight assigned to y = −1 and $\exp(\theta^T x)$ is the weight assigned to y = 1.

• For K classes:

  $p(y = k \mid x; \theta_1, \ldots, \theta_K) = \frac{\exp(\theta_k^T x)}{\sum_{j=1}^{K} \exp(\theta_j^T x)}$

  – Called the softmax function: it maps a vector of scores to a probability distribution.
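A minimal NumPy softmax, with the standard max-subtraction for numerical stability (a detail the slide does not cover):

```python
import numpy as np

def softmax(scores):
    """Map a vector of K scores theta_k^T x to a probability distribution."""
    shifted = scores - np.max(scores)  # stability: exp of large scores overflows
    weights = np.exp(shifted)
    return weights / weights.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # nonnegative, sums to 1
```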

Multi-Class Logistic Regression
• Train a logistic regression classifier for each class k to predict the probability that y = k with

  $h_k(x) = \frac{\exp(\theta_k^T x)}{\sum_{j=1}^{K} \exp(\theta_j^T x)}$

[Figure: a scatter plot over x1 and x2 split into One vs. Rest, with one parameter vector per class (θ1, θ2, θ3).]
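Stacking the per-class parameter vectors (like the figure's θ1, θ2, θ3) as rows of a matrix, every h_k(x) for one input comes from a single matrix-vector product; a hypothetical sketch with made-up values:

```python
import numpy as np

def class_probabilities(Theta, x):
    """h_k(x) for every class k: softmax over the K scores theta_k^T x."""
    scores = Theta @ x
    scores -= scores.max()  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

Theta = np.array([[1.0, 0.0],     # theta_1
                  [0.0, 1.0],     # theta_2
                  [-1.0, -1.0]])  # theta_3
print(class_probabilities(Theta, np.array([2.0, 1.0])))  # sums to 1
```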

Implementing Multi-Class Logistic Regression
• P(y = k | x) is estimated by:

  $h_k(x) = \frac{\exp(\theta_k^T x)}{\sum_{j=1}^{K} \exp(\theta_j^T x)}$

• Gradient descent simultaneously updates the parameters of all K models
  – Same derivative as before, just with the above $h_k(x)$
• Predict the class label as the most probable label: $\hat{y} = \arg\max_k h_k(x)$
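Putting the pieces together, a compact batch-gradient-descent sketch; the 0..K−1 label coding, learning rate, and step count are my assumptions, not from the lecture:

```python
import numpy as np

def fit_multiclass(X, y, K, lr=0.1, n_steps=1000):
    """Gradient descent on the softmax negative log likelihood.

    X: (n, d) features; y: (n,) integer labels in {0, ..., K-1}.
    Theta holds one parameter vector theta_k per class (rows).
    """
    n, d = X.shape
    Theta = np.zeros((K, d))
    Y = np.eye(K)[y]  # one-hot labels, shape (n, K)
    for _ in range(n_steps):
        scores = X @ Theta.T                         # theta_k^T x for all k, all points
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)            # h_k(x), shape (n, K)
        Theta -= lr * (P - Y).T @ X / n              # same derivative form as before
    return Theta

def predict(Theta, X):
    """Most probable label: argmax_k h_k(x)."""
    return np.argmax(X @ Theta.T, axis=1)
```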

You should know
• Logistic model & loss
  – Linear in log-odds
• Decision boundaries
  – Hyperplanes
• Softmax
  – Maps a vector to a probability distribution
