
Kernel Methods Part 2

Bing Han

June 26, 2008

Local Likelihood

Logistic Regression

Logistic Regression

Starting from the two-class logit model $\log\big[\Pr(G = 1 \mid X = x) / \Pr(G = 2 \mid X = x)\big] = \beta_0 + \beta^T x$, after a simple calculation we get

$$\Pr(G = 1 \mid X = x) = \frac{\exp(\beta_0 + \beta^T x)}{1 + \exp(\beta_0 + \beta^T x)}.$$

We denote the probabilities $p(x; \beta) = \Pr(G = 1 \mid X = x)$ and $1 - p(x; \beta) = \Pr(G = 2 \mid X = x)$.

Logistic regression models are usually fit by maximum likelihood
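The likelihood being maximized can be written explicitly in the two-class case (a standard form; the 0/1 class coding $y_i$ is our notation, not from the slides):

```latex
% Binary log-likelihood maximized by logistic regression,
% with y_i = 1 for class 1 and y_i = 0 for class 2
\ell(\beta_0, \beta) = \sum_{i=1}^{N}
  \Big[ y_i \log p(x_i;\beta) + (1 - y_i)\,\log\big(1 - p(x_i;\beta)\big) \Big]
```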

Local Likelihood

The data consist of features $x_i$ and classes $g_i \in \{1, 2, \ldots, J\}$. The linear model is

$$\log \frac{\Pr(G = j \mid X = x)}{\Pr(G = J \mid X = x)} = \beta_{j0} + \beta_j^T x, \qquad j = 1, \ldots, J - 1.$$

Local Likelihood

Local logistic regression

The local log-likelihood for this J-class model is

$$\ell\big(\beta(x_0)\big) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, \log \Pr\big(G = g_i \mid X = x_i\big),$$

where the probabilities come from the linear model above with parameters $\beta(x_0)$ fit locally at $x_0$.
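A minimal Python sketch of local logistic regression at a single query point $x_0$, assuming a Gaussian kernel and two classes; the function name, bandwidth, and use of scikit-learn's sample_weight are our choices, not from the slides (note that the default LogisticRegression also adds a small ridge penalty):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_logistic_predict(X, g, x0, lam=1.0):
    """Fit logistic regression with kernel weights K_lambda(x0, xi)
    and return the class probabilities at the query point x0."""
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * lam ** 2))  # Gaussian kernel weights
    model = LogisticRegression().fit(X, g, sample_weight=w)      # weighted max likelihood
    return model.predict_proba(x0.reshape(1, -1))[0]

# toy usage: two overlapping classes in one dimension
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-1, 1, 50), rng.normal(1, 1, 50)]).reshape(-1, 1)
g = np.array([0] * 50 + [1] * 50)
print(local_logistic_predict(X, g, np.array([0.25]), lam=0.5))
```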

Kernel Density Estimation

Kernel Density Estimation

We have a random sample $x_1, x_2, \ldots, x_N$ from which we want to estimate the probability density $f(x)$.

A natural local estimate is

$$\hat f(x_0) = \frac{\#\{x_i \in \mathcal{N}(x_0)\}}{N\lambda},$$

where $\mathcal{N}(x_0)$ is a small metric neighborhood around $x_0$ of width $\lambda$.

The smooth Parzen estimate replaces the counts with kernel weights:

$$\hat f(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i).$$

Kernel Density Estimation

A popular choice is the Gaussian kernel, $K_\lambda(x_0, x) = \phi\big(|x - x_0| / \lambda\big)$.

A natural generalization of the Gaussian density estimate to $\mathbb{R}^p$ uses the Gaussian product kernel:

$$\hat f_X(x_0) = \frac{1}{N (2 \lambda^2 \pi)^{p/2}} \sum_{i=1}^{N} e^{-\frac{1}{2} \left( \|x_i - x_0\| / \lambda \right)^2}.$$
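A small numpy sketch of this Gaussian product-kernel estimate (the helper name and bandwidth value are illustrative):

```python
import numpy as np

def gaussian_kde(X, x0, lam=0.5):
    """Gaussian product-kernel density estimate at x0.
    X: (N, p) sample; x0: (p,) query point; lam: bandwidth lambda."""
    N, p = X.shape
    sq_dist = np.sum((X - x0) ** 2, axis=1)
    norm = N * (2 * np.pi * lam ** 2) ** (p / 2)
    return np.sum(np.exp(-0.5 * sq_dist / lam ** 2)) / norm

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))        # standard bivariate normal sample
print(gaussian_kde(X, np.zeros(2)))  # true density at 0 is 1/(2*pi) ~ 0.159
```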

Kernel Density Classification

Fit a nonparametric density estimate $\hat f_j(X)$ separately in each class, and estimate the class priors $\hat\pi_j$ (usually the sample proportions).

By Bayes’ theorem

$$\hat{\Pr}(G = j \mid X = x_0) = \frac{\hat\pi_j \, \hat f_j(x_0)}{\sum_{k=1}^{J} \hat\pi_k \, \hat f_k(x_0)}$$
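Continuing the sketch, the Bayes rule above turns the per-class density estimates and priors into posterior probabilities (kde_classify and the reuse of the gaussian_kde helper above are ours):

```python
import numpy as np

def kde_classify(X, g, x0, lam=0.5):
    """Posterior class probabilities at x0 from per-class KDEs and sample priors."""
    classes = np.unique(g)
    priors = np.array([np.mean(g == c) for c in classes])                 # pi_hat_j
    dens = np.array([gaussian_kde(X[g == c], x0, lam) for c in classes])  # f_hat_j(x0)
    post = priors * dens
    return classes, post / post.sum()
```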

Kernel Density Classification

Naïve Bayes Classifier

Assume that, given a class $G = j$, the features $X_k$ are independent:

$$f_j(X) = \prod_{k=1}^{p} f_{jk}(X_k).$$

Naïve Bayes Classifier

The logit transform of the naive Bayes model has the form of a generalized additive model:

$$\log \frac{\Pr(G = \ell \mid X)}{\Pr(G = J \mid X)} = \alpha_\ell + \sum_{k=1}^{p} g_{\ell k}(X_k).$$

This is similar in form to logistic regression, although the two models are fit differently.
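One common instantiation of this model estimates each one-dimensional density $f_{jk}$ with a kernel estimate; a sketch reusing the gaussian_kde helper above (names are illustrative):

```python
import numpy as np

def naive_bayes_kde(X, g, x0, lam=0.5):
    """Naive Bayes posterior at x0 with independent 1-D Gaussian KDEs per feature."""
    classes = np.unique(g)
    post = []
    for c in classes:
        Xc, prior = X[g == c], np.mean(g == c)
        dens = 1.0
        for k in range(X.shape[1]):                       # product over features
            dens *= gaussian_kde(Xc[:, [k]], x0[[k]], lam)
        post.append(prior * dens)
    post = np.array(post)
    return classes, post / post.sum()
```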

Radial Basis Functions

Functions can be represented as expansions in basis functions: $f(x) = \sum_{j=1}^{M} \beta_j h_j(x)$.

Radial basis functions treat the kernel functions $K_\lambda(\xi, x)$ as basis functions, each with its own center $\xi_j$ and scale $\lambda_j$. This leads to the model

$$f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j.$$

Method of learning parameters

Optimize the sum of squares with respect to all the parameters $\{\lambda_j, \xi_j, \beta_j\}_{1}^{M}$:

$$\min_{\{\lambda_j, \xi_j, \beta_j\}} \; \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{M} \beta_j \exp\Big\{ -\frac{(x_i - \xi_j)^T (x_i - \xi_j)}{\lambda_j^2} \Big\} \Big)^2.$$
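This criterion is nonconvex in the centers and scales, so in practice they are often estimated separately (e.g., by clustering) and only the $\beta_j$ are fit by least squares. A sketch under that simplification (k-means centers and a single fixed scale are our assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(X, y, M=10, lam=1.0):
    """RBF expansion: centers from k-means, fixed scale, weights by least squares."""
    centers = KMeans(n_clusters=M, n_init=10, random_state=0).fit(X).cluster_centers_
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (N, M) distances
    H = np.hstack([np.ones((len(X), 1)), np.exp(-sq / lam ** 2)])   # basis matrix
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centers, beta
```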

Radial Basis Functions

Reducing the parameter set by assuming a constant scale $\lambda_j = \lambda$ can produce an undesirable effect: holes, i.e., regions of the input space where none of the kernels has appreciable support.

Renormalized radial basis functions

$$h_j(x) = \frac{D\big(\|x - \xi_j\| / \lambda\big)}{\sum_{k=1}^{M} D\big(\|x - \xi_k\| / \lambda\big)},$$

where $D$ is the kernel profile (e.g., the standard Gaussian density).

Radial Basis Functions

Mixture models

The Gaussian mixture model for density estimation is

$$f(x) = \sum_{m=1}^{M} \alpha_m \, \phi(x; \mu_m, \boldsymbol\Sigma_m),$$

with mixing proportions $\alpha_m$, $\sum_m \alpha_m = 1$, and Gaussian densities $\phi$.

In general, mixture models can use any component densities. The Gaussian mixture model is the most popular.
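A short sketch of Gaussian mixture density estimation with scikit-learn, which fits the model by EM (the toy data and component count are ours):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (100, 2)),     # two well-separated blobs
               rng.normal(2, 1, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM under the hood
log_density = gmm.score_samples(X)  # log f_hat(x_i) under the fitted mixture
resp = gmm.predict_proba(X)         # responsibilities: Pr(component m | x_i)
```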

Mixture models

If the covariance matrices are constrained to be scalar, $\boldsymbol\Sigma_m = \sigma_m^2 \mathbf{I}$, the mixture has the form of a radial basis expansion.

If in addition $\sigma_m = \sigma > 0$ is fixed and $M = N$, the maximum likelihood estimate approaches the kernel density estimate, where $\hat\alpha_m = 1/N$ and $\hat\mu_m = x_m$.

Mixture models

The parameters are usually fit by maximum likelihood, for example with the EM algorithm.

The mixture model also provides an estimate of the probability that observation $i$ belongs to component $m$:

$$\hat r_{im} = \frac{\hat\alpha_m \, \phi(x_i; \hat\mu_m, \hat{\boldsymbol\Sigma}_m)}{\sum_{k=1}^{M} \hat\alpha_k \, \phi(x_i; \hat\mu_k, \hat{\boldsymbol\Sigma}_k)}.$$
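A numpy sketch of this E-step computation for a spherical Gaussian mixture (equivalent to GaussianMixture.predict_proba above; parameter names are illustrative):

```python
import numpy as np

def responsibilities(X, alpha, mu, sigma):
    """r_hat[i, m]: probability that observation i belongs to component m.
    alpha: (M,) mixing proportions; mu: (M, p) means; sigma: (M,) scales."""
    p = X.shape[1]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, M)
    phi = np.exp(-0.5 * sq / sigma ** 2) / (2 * np.pi * sigma ** 2) ** (p / 2)
    r = alpha * phi                                           # alpha_m * phi_m(x_i)
    return r / r.sum(axis=1, keepdims=True)
```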

Questions?