![Page 1: Logistic regression: a simple ANN · PDF fileOutline of the lecture This lecture describes the construction of binary classifiers using a technique called Logistic Regression. The](https://reader031.vdocuments.us/reader031/viewer/2022030405/5a7a902e7f8b9a05348cfb98/html5/thumbnails/1.jpg)
Logistic regression: a simple ANN
Nando de Freitas
Outline of the lecture
This lecture describes the construction of binary classifiers using a technique called Logistic Regression. The objective is for you to learn:
- How to apply logistic regression to discriminate between two classes.
- How to formulate the logistic regression likelihood.
- How to derive the gradient and Hessian of logistic regression.
- How to incorporate the gradient vector and Hessian matrix into Newton’s optimization algorithm so as to come up with an algorithm for logistic regression, which we call IRLS.
- How to do logistic regression with the softmax link.
McCulloch-Pitts model of a neuron
Sigmoid function
sigm(η) refers to the sigmoid function, also known as the logistic function (the inverse of the logit function):
$$\mathrm{sigm}(\eta) = \frac{1}{1 + e^{-\eta}} = \frac{e^{\eta}}{e^{\eta} + 1}$$
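The two equivalent forms of the sigmoid are useful numerically: choosing between them by the sign of the argument keeps the exponential from overflowing. The following is a minimal sketch, not part of the original slides; the function name `sigm` simply mirrors the notation used here.

```python
import math


def sigm(eta: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-eta)), evaluated without overflow."""
    if eta >= 0:
        # exp(-eta) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-eta))
    # For negative eta use the equivalent form e^eta / (e^eta + 1),
    # so exp() is again only called on a non-positive argument.
    z = math.exp(eta)
    return z / (z + 1.0)


print(sigm(0.0))                # the midpoint of the curve
print(sigm(2.0) + sigm(-2.0))   # symmetry: sigm(eta) + sigm(-eta) = 1
print(sigm(-1000.0))            # a naive 1 / (1 + exp(1000)) would raise OverflowError
```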
Linear separating hyper-plane
[Greg Shakhnarovich]
Bernoulli: a model for coins
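A Bernoulli random variable (r.v.) X takes values in {0,1}:
$$p(x \mid \theta) = \begin{cases} \theta & \text{if } x = 1 \\ 1 - \theta & \text{if } x = 0 \end{cases}$$
where θ ∈ (0,1). We can write this probability more succinctly as follows:
$$p(x \mid \theta) = \theta^{x} (1 - \theta)^{1 - x}$$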
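As a quick check, the standard compact form θ^x (1 − θ)^(1−x) reproduces both cases of the definition above. This is an illustrative sketch; the name `bernoulli_pmf` is an assumption made for the example.

```python
def bernoulli_pmf(x: int, theta: float) -> float:
    """Probability of x in {0, 1} under a Bernoulli with parameter theta."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in the open interval (0, 1)")
    # The exponent x selects theta, the exponent 1 - x selects 1 - theta.
    return theta ** x * (1.0 - theta) ** (1 - x)


theta = 0.3  # hypothetical coin bias, chosen only for illustration
print(bernoulli_pmf(1, theta))  # probability of x = 1, i.e. theta
print(bernoulli_pmf(0, theta))  # probability of x = 0, i.e. 1 - theta
```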
Entropy
In information theory, entropy H is a measure of the uncertainty associated with a random variable. It is defined as:
$$H(X) = -\sum_{x} p(x \mid \theta) \log p(x \mid \theta)$$
Example: For a Bernoulli variable X, the entropy is:
$$H(X) = -\theta \log \theta - (1 - \theta) \log (1 - \theta)$$
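For a Bernoulli variable the sum has only two terms, one for x = 1 and one for x = 0. The sketch below evaluates it; the function name `bernoulli_entropy`, the default base 2 (entropy in bits), and the convention 0 log 0 = 0 are assumptions made for this example, since the slide leaves the base of the logarithm unspecified.

```python
import math


def bernoulli_entropy(theta: float, base: float = 2.0) -> float:
    """Entropy -theta log(theta) - (1 - theta) log(1 - theta)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    h = 0.0
    for p in (theta, 1.0 - theta):
        # Skip zero-probability outcomes (convention: 0 * log 0 = 0).
        if p > 0.0:
            h -= p * math.log(p, base)
    return h


# Uncertainty is largest for a fair coin and vanishes for a deterministic one.
for theta in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(theta, bernoulli_entropy(theta))
```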
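Logistic regression
The logistic regression model specifies the probability of a binary output $y_i \in \{0, 1\}$ given the input $x_i$ as follows:
$$p(y \mid X, \theta) = \prod_{i=1}^{n} \mathrm{Ber}(y_i \mid \mathrm{sigm}(x_i \theta)) = \prod_{i=1}^{n} \left[ \frac{1}{1 + e^{-x_i \theta}} \right]^{y_i} \left[ 1 - \frac{1}{1 + e^{-x_i \theta}} \right]^{1 - y_i}$$
where $x_i \theta = \theta_0 + \sum_{j=1}^{d} \theta_j x_{ij}$.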
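In practice one works with the negative log of this likelihood rather than the product itself. The following is a minimal sketch under two assumptions not stated on the slide: each row of `X` starts with a constant 1 so that θ0 acts as the bias, and the per-example term is computed as log(1 + e^η) − y·η with η = x_i θ, which is algebraically equal to −log Ber(y_i | sigm(η)) but avoids taking the log of a value that rounds to zero. The helper names `softplus` and `nll` are hypothetical.

```python
import math


def softplus(a: float) -> float:
    """log(1 + exp(a)), computed without overflow."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def nll(theta, X, y) -> float:
    """Negative log-likelihood -log p(y | X, theta) for binary labels."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta = sum(t * v for t, v in zip(theta, x_i))  # x_i theta
        # -log sigm(eta) = softplus(-eta) and -log(1 - sigm(eta)) = softplus(eta),
        # so both cases y_i = 1 and y_i = 0 collapse to softplus(eta) - y_i * eta.
        total += softplus(eta) - y_i * eta
    return total


# Hypothetical toy data: a leading 1 for the bias, then one feature.
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
print(nll([0.0, 0.0], X, y))  # every prediction is 0.5 when theta = 0
print(nll([0.0, 1.0], X, y))  # a better fit gives a lower value
```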
Gradient and Hessian of binary logistic regression
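The gradient and Hessian of the negative log-likelihood, $J(\theta) = -\log p(y \mid X, \theta)$, are given by:
$$g(\theta) = \frac{d}{d\theta} J(\theta) = \sum_{i=1}^{n} x_i^T (\pi_i - y_i) = X^T (\pi - y)$$
$$H = \frac{d}{d\theta} g(\theta)^T = \sum_{i} \pi_i (1 - \pi_i) \, x_i^T x_i = X^T \mathrm{diag}(\pi_i (1 - \pi_i)) X$$
where $\pi_i = \mathrm{sigm}(x_i \theta)$ and $x_i$ denotes the i-th row of X.
One can show that H is positive definite (when X has full column rank); hence the NLL is convex and has a unique global minimum.
To find this minimum, we turn to batch optimization.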
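The two expressions X^T(π − y) and X^T diag(π_i(1 − π_i)) X can be accumulated one example at a time. The sketch below does this in plain Python; the function name `gradient_and_hessian` and the toy data are assumptions for illustration, and each row of `X` is assumed to start with a constant 1 for the bias θ0.

```python
import math


def sigm(eta: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (z + 1.0)


def gradient_and_hessian(theta, X, y):
    """Return g = X^T (pi - y) and H = X^T diag(pi_i (1 - pi_i)) X."""
    d = len(theta)
    g = [0.0] * d
    H = [[0.0] * d for _ in range(d)]
    for x_i, y_i in zip(X, y):
        pi_i = sigm(sum(t * v for t, v in zip(theta, x_i)))
        w_i = pi_i * (1.0 - pi_i)  # i-th diagonal entry of the weight matrix
        for a in range(d):
            g[a] += x_i[a] * (pi_i - y_i)
            for b in range(d):
                H[a][b] += w_i * x_i[a] * x_i[b]
    return g, H


# Hypothetical toy data: a leading 1 for the bias, then one feature.
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
g, H = gradient_and_hessian([0.0, 0.0], X, y)
print(g)
print(H)
```

At θ = 0 every π_i equals 0.5, so the gradient reduces to X^T(0.5 − y) and the Hessian to 0.25 X^T X, which makes the output easy to verify by hand.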
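Iteratively reweighted least squares (IRLS)
For binary logistic regression, recall that the gradient and Hessian of the negative log-likelihood are given by
$$g_k = X^T (\pi_k - y)$$
$$H_k = X^T S_k X$$
$$S_k := \mathrm{diag}(\pi_{1k}(1 - \pi_{1k}), \ldots, \pi_{nk}(1 - \pi_{nk}))$$
$$\pi_{ik} = \mathrm{sigm}(x_i \theta_k)$$
The Newton update at iteration k + 1 for this model is as follows (using $\eta_k = 1$, since the Hessian is exact):
$$\begin{aligned}
\theta_{k+1} &= \theta_k - H_k^{-1} g_k \\
&= \theta_k + (X^T S_k X)^{-1} X^T (y - \pi_k) \\
&= (X^T S_k X)^{-1} \left[ (X^T S_k X) \theta_k + X^T (y - \pi_k) \right] \\
&= (X^T S_k X)^{-1} X^T \left[ S_k X \theta_k + y - \pi_k \right]
\end{aligned}$$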
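The last line of the update can be turned into a short loop: at each iteration, form X^T S_k X and X^T [S_k X θ_k + y − π_k], then solve the resulting linear system. The following is a minimal sketch, not the lecture's own code. The helper names `solve` and `irls`, the stopping tolerance, and the toy data are assumptions; each row of `X` is assumed to start with a constant 1, and the data are assumed not to be linearly separable, so that a finite minimizer exists.

```python
import math


def sigm(eta: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (z + 1.0)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def irls(X, y, max_iter=25, tol=1e-10):
    """Newton / IRLS iterations for binary logistic regression."""
    d = len(X[0])
    theta = [0.0] * d
    for _ in range(max_iter):
        eta = [sum(t * v for t, v in zip(theta, x_i)) for x_i in X]  # X theta_k
        pi = [sigm(e) for e in eta]
        s = [p * (1.0 - p) for p in pi]  # diagonal of S_k
        # Left-hand side X^T S_k X.
        H = [[sum(s_i * x_i[a] * x_i[b] for s_i, x_i in zip(s, X))
              for b in range(d)] for a in range(d)]
        # Right-hand side X^T [S_k X theta_k + y - pi_k].
        rhs = [sum(x_i[a] * (s_i * e_i + y_i - p_i)
                   for x_i, s_i, e_i, y_i, p_i in zip(X, s, eta, y, pi))
               for a in range(d)]
        new_theta = solve(H, rhs)
        step = max(abs(n - o) for n, o in zip(new_theta, theta))
        theta = new_theta
        if step < tol:
            break
    return theta


# Hypothetical, non-separable toy data: bias column plus one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]
theta_hat = irls(X, y)
print(theta_hat)
```

Solving the linear system directly, rather than forming the inverse of X^T S_k X, is the usual design choice; the update is the same.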
Softmax formulation
Likelihood function
Negative log-likelihood criterion
Neural network representation of loss
Manual gradient computation
Manual gradient computation
Next lecture
In the next lecture, we develop an automatic layer-wise way of computing all the necessary derivatives, known as back-propagation.
This is the approach used in Torch. We will review the torch nn class.