TRANSCRIPT
Multivariate linear models for regression and classification
Outline: 1) multivariate linear regression 2) linear classification (perceptron) 3) logistic regression
Logistic regression (lecture 9 on amlbook.com)
Neuron analogy
Dot product wTx is a way of combining attributes into a scalar signal s. How signal is used defines the hypothesis set.
In logistic regression, the signal becomes the argument of the logistic function q(s) = 1/(1 + e^(-s)), which has the properties of a probability: its output lies in (0, 1)
Objective: find w such that the risk score wTx >> 0 for patients who have had a heart attack (q(s) ~ 1) and wTx << 0 for those who have not (q(s) ~ 0).
Application: risk of heart attack
More specifically (see text p91)
Dataset drawn from a distribution function P(y|x), which is related to hypothesis h(x) by
P(yn|xn) = h(xn) if yn = +1; P(yn|xn) = 1 - h(xn) if yn = -1
Logistic function has the property q(-s) = 1 – q(s)
Hence, with h(x) = q(wTx), both relationships are satisfied by the single expression P(yn|xn) = q(yn wTxn)
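A minimal Python sketch of this identity; the weights and attributes below are made-up numbers for illustration only:

```python
import math

def q(s):
    """Logistic function q(s) = 1 / (1 + e^(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

def p_y_given_x(y, w, x):
    """P(y|x) = q(y * wTx) for labels y = +1 or y = -1."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return q(y * s)

# The symmetry q(-s) = 1 - q(s) is what lets the two cases
# (y = +1 and y = -1) collapse into the single expression above.
assert abs(q(-2.0) - (1.0 - q(2.0))) < 1e-12

# Hypothetical weight and attribute vectors:
w, x = [0.5, -1.0], [2.0, 1.0]
# The two label probabilities sum to 1, as required.
assert abs(p_y_given_x(+1, w, x) + p_y_given_x(-1, w, x) - 1.0) < 1e-12
```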
Now use maximum likelihood estimation (MLE) to derive an error function that we minimize to find the optimum w
Recall that MLE is used to estimate the parameters of a probability distribution given a sample X drawn from that distribution
In logistic regression, parameters are the weights
Likelihood of w given the sample X: l(w|X) = p(X|w) = ∏_n p(xn|w)
Log likelihood: L(w|X) = log l(w|X) = ∑_n log p(xn|w)
In logistic regression, p(xn|w) = q(yn wTxn)
Since log is a monotone increasing function, maximizing log(likelihood) is equivalent to minimizing -log(likelihood)
Text also normalizes by dividing by N; hence the error function becomes Ein(w) = (1/N) ∑_n ln(1 + e^(-yn wTxn))
Error function of logistic regression (called cross entropy) has the desired properties.
If xn are the attributes of a person who has had a heart attack, wTxn >> 0 and yn = +1, so the contribution to Ein(w) is small.
If xn are the attributes of a person who has not had a heart attack, wTxn << 0 and yn = -1, so the contribution to Ein(w) is again small.
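A short sketch of the cross-entropy error that makes these two properties concrete; the weights and attribute vectors are fabricated for illustration:

```python
import math

def cross_entropy(w, X, Y):
    """Ein(w) = (1/N) * sum_n ln(1 + e^(-yn * wTxn))."""
    total = 0.0
    for x, y in zip(X, Y):
        s = sum(wi * xi for wi, xi in zip(w, x))
        total += math.log(1.0 + math.exp(-y * s))
    return total / len(X)

# Hypothetical weights producing strong signals of the right sign:
w = [3.0, 3.0]
x_attack = [1.0, 1.0]     # wTx = 6 >> 0, label +1
x_healthy = [-1.0, -1.0]  # wTx = -6 << 0, label -1
low = cross_entropy(w, [x_attack, x_healthy], [+1, -1])

# A point whose signal and label disagree costs much more:
high = cross_entropy(w, [x_attack], [-1])
assert low < high
```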
Error function of linear regression allows a "1-step" (closed-form) optimization.
Not true for the error function of logistic regression.
Optimization is iterative; the method is "steepest descent"
Method of steepest (gradient) descent, fixed step size η: w(1) = w(0) + η v̂
v̂ is the unit vector in the direction of steepest descent (the negative gradient)
Method of steepest (gradient) descent, fixed learning rate η: w(1) = w(0) + Δw with Δw = -η ∇Ein(w(0)); weights change fastest where the gradient is largest
For Ein = cross entropy, the gradient is analytical: ∇Ein(w) = -(1/N) ∑_n yn xn / (1 + e^(yn wTxn))
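A sketch of the analytical gradient -(1/N) ∑_n yn xn / (1 + e^(yn wTxn)), sanity-checked against a numerical (finite-difference) derivative on made-up data:

```python
import math

def gradient(w, X, Y):
    """Gradient of the cross-entropy error:
    grad Ein(w) = -(1/N) * sum_n yn * xn / (1 + e^(yn * wTxn))."""
    N, d = len(X), len(w)
    g = [0.0] * d
    for x, y in zip(X, Y):
        s = sum(wi * xi for wi, xi in zip(w, x))
        coef = -y / (N * (1.0 + math.exp(y * s)))
        for i in range(d):
            g[i] += coef * x[i]
    return g

def ein(w, X, Y):
    """Cross-entropy error, used here only to check the gradient."""
    return sum(math.log(1.0 + math.exp(-y * sum(wi * xi for wi, xi in zip(w, x))))
               for x, y in zip(X, Y)) / len(X)

# Fabricated weights and data for the check:
w, X, Y = [0.3, -0.2], [[1.0, 2.0], [-1.0, 0.5]], [+1, -1]
eps = 1e-6
g = gradient(w, X, Y)
for i in range(len(w)):
    wp = list(w); wp[i] += eps
    wm = list(w); wm[i] -= eps
    fd = (ein(wp, X, Y) - ein(wm, X, Y)) / (2 * eps)
    assert abs(g[i] - fd) < 1e-5
```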
Logistic regression algorithm
How to compute the gradient of Ein
How to know when to stop
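Putting the pieces together, a minimal sketch of the full algorithm. The learning rate, tolerance, and iteration cap are illustrative assumptions, not values fixed by the lecture; stopping when the gradient norm is near zero is one common choice of stopping rule:

```python
import math

def logistic_regression(X, Y, eta=0.1, tol=1e-6, max_iters=10000):
    """Gradient descent on the cross-entropy error.
    eta, tol, max_iters are assumed hyperparameters."""
    N, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(max_iters):
        # Analytical gradient: -(1/N) * sum_n yn*xn / (1 + e^(yn*wTxn))
        g = [0.0] * d
        for x, y in zip(X, Y):
            s = sum(wi * xi for wi, xi in zip(w, x))
            coef = -y / (N * (1.0 + math.exp(y * s)))
            for i in range(d):
                g[i] += coef * x[i]
        # Stop when the gradient is numerically zero.
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        # Fixed-learning-rate update: w <- w - eta * grad
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Made-up 1-D data: positive attribute -> +1, negative -> -1.
w = logistic_regression([[2.0], [-1.0]], [+1, -1])
assert w[0] > 0  # both training points get the right sign of wTx
```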
Assignment 6: Due 10-30-14