Improve Naïve Bayesian Classifier by Discriminative Training
ICONIP 2005

Improve Naïve Bayesian Classifier by Discriminative Training

Kaizhu Huang, Zhangbing Zhou, Irwin King, Michael R. Lyu
Oct. 2005
Outline
– Background
  – Classifiers
    » Discriminative classifiers: Support Vector Machines
    » Generative classifiers: Naïve Bayesian Classifiers
– Motivation
– Discriminative Naïve Bayesian Classifier
– Experiments
– Discussions
– Conclusion
Background

Discriminative classifiers
– Directly maximize a discriminative function or a posterior function
– Example: Support Vector Machines

[Figure: SVM]
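Below is a minimal sketch of what "directly maximizing a discriminative function" looks like in code; scikit-learn stands in for the LIBSVM/Matlab setup actually used in the paper, and the toy data are invented for illustration.

```python
# Minimal discriminative-classifier sketch: an SVM learns a decision function
# f(x) = w.x + b directly, with no class-conditional density estimation.
# scikit-learn is an assumption here; the paper used LIBSVM under Matlab.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),   # toy class-0 samples
               rng.normal(+1.0, 1.0, (50, 2))])  # toy class-1 samples
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="linear").fit(X, y)
print(svm.decision_function(X[:3]))  # signed margins: the discriminative scores
print(svm.predict(X[:3]))            # thresholding the score gives the class
```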
Background

Generative classifiers
– Model the joint distribution for each class, P(x|C), and then use Bayes rule to construct the posterior classifier P(C|x); C: class label, x: features.
– Example: Naïve Bayesian Classifiers
  » Model the distribution of each class under the assumption that each feature of the data is independent of the other features, given the class label.
$$
\begin{aligned}
C &= \arg\max_{i} P(C_i \mid x) && \ldots (1) \\
  &= \arg\max_{i} \frac{P(x \mid C_i)\, P(C_i)}{p(x)} && \ldots (2) \\
  &= \arg\max_{i} P(x \mid C_i)\, P(C_i) && \ldots (3) \quad \text{($p(x)$ is constant w.r.t. $C$)} \\
  &= \arg\max_{i} \prod_{j=1}^{m} P(x_j \mid C_i)\, P(C_i) && \ldots (4)
\end{aligned}
$$

Step (4) combines the independence assumption:

$$P(x_i, x_j \mid C) = P(x_i \mid C)\, P(x_j \mid C), \qquad 1 \le i, j \le m.$$
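To make Eqs. (1)-(4) concrete, here is a minimal Naïve Bayes sketch for discrete features; the helper names and toy data are mine, not the authors' Matlab code.

```python
# Discrete Naïve Bayes following Eq. (4): choose the class maximizing
# P(C_i) * prod_j P(x_j | C_i), with probabilities estimated by counting.
import numpy as np

def train_nb(X, y, n_values):
    """Estimate P(C_i) and P(x_j = v | C_i) with add-one smoothing."""
    priors, cond = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        cond[c] = [(np.bincount(Xc[:, j], minlength=n_values) + 1.0)
                   / (len(Xc) + n_values) for j in range(X.shape[1])]
    return priors, cond

def predict_nb(x, priors, cond):
    """Eq. (4) evaluated in log space to avoid underflow."""
    scores = {c: np.log(priors[c])
                 + sum(np.log(cond[c][j][v]) for j, v in enumerate(x))
              for c in priors}
    return max(scores, key=scores.get)

X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([0, 0, 1, 1])
priors, cond = train_nb(X, y, n_values=2)
print(predict_nb([1, 1], priors, cond))  # -> 1
```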
Background

Comparison

[Figure] Example of missing information. From left to right: original digit, 50%-missing digit, 75%-missing digit, and occluded digit.
Background

Why are generative classifiers not as accurate as discriminative classifiers?

Scheme for generative classifiers in two-category classification tasks:
  Training set
  → subset D1, labeled as Class 1 → estimate distribution P1 to approximate D1
  → subset D2, labeled as Class 2 → estimate distribution P2 to approximate D2
  → construct the Bayes rule for classification

1. It is incomplete for generative classifiers to just approximate the inner-class information.
2. The inter-class discriminative information between the classes is discarded, yet it is needed!
Background

Why are generative classifiers superior to discriminative classifiers in handling missing-information problems?
– SVM lacks the ability to decide under such uncertainty.
– NB can conduct uncertainty inference under the estimated distribution. With A the feature set, T the subset of A that is missing, and A − T thus the known features, NB marginalizes T out of the estimated joint distribution and infers from A − T alone.
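As a sketch of this uncertainty inference (reusing the hypothetical train_nb/predict_nb helpers from the earlier sketch): under the independence assumption, marginalizing a missing feature out of the joint simply drops its factor, so classification uses only A − T.

```python
# NB inference with missing features: score classes with the known
# features A - T only; each missing feature's factor integrates to 1
# under the NB factorization and therefore just disappears.
import numpy as np

def predict_nb_missing(x, priors, cond):
    """x marks missing features (the subset T) with None."""
    scores = {}
    for c in priors:
        s = np.log(priors[c])
        for j, v in enumerate(x):
            if v is not None:               # known feature in A - T
                s += np.log(cond[c][j][v])  # missing ones contribute no factor
        scores[c] = s
    return max(scores, key=scores.get)

# e.g. predict_nb_missing([None, 1], priors, cond) classifies with x_0 missing
```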
Motivation

– It seems that a good classifier should combine the strategies of discriminative classifiers and generative classifiers.
– Our work trains one of the generative classifiers, the Naïve Bayesian Classifier, in a discriminative way.
Discriminative Naïve Bayesian Classifier

Working scheme of the Naïve Bayesian Classifier:
  Training set
  → sub-set D1, labeled as Class 1 → estimate the distribution P1 to approximate D1
  → sub-set D2, labeled as Class 2 → estimate the distribution P2 to approximate D2
  → use the Bayes rule for classification

Interaction is needed!!

Mathematical explanation of the Naïve Bayesian Classifier: the maximum-likelihood estimation is easily solved by the Lagrange multiplier method.
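As a supporting sketch of the estimation the slide alludes to (notation mine, assuming discrete features): with $N_{jk}$ the count of feature value $x_j = k$ among class $C_i$'s data and $\theta_{jk} = P(x_j = k \mid C_i)$, NB maximum likelihood solves

$$\max_{\theta} \sum_{k} N_{jk} \log \theta_{jk} \quad \text{s.t.} \quad \sum_{k} \theta_{jk} = 1,$$

and setting the derivative of the Lagrangian $\sum_k N_{jk}\log\theta_{jk} + \lambda\,(1 - \sum_k \theta_{jk})$ to zero gives $N_{jk}/\theta_{jk} = \lambda$, hence $\theta_{jk} = N_{jk} / \sum_{k'} N_{jk'}$: the empirical frequencies.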
Discriminative Naïve Bayesian Classifier (DNB)

Optimization function of DNB: a data-approximation term plus a divergence item.
• On one hand, minimizing this function tries to approximate the dataset as accurately as possible.
• On the other hand, the optimization also tries to enlarge the divergence between the classes.
• Optimizing the joint distribution directly inherits the ability of NB to handle missing-information problems.
Discriminative Naïve Bayesian Classifier (DNB)

Complete optimization problem
– A nonlinear optimization problem under linear constraints.
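The slide's formula was an image and did not survive the transcript. Purely as an illustration of the shape such an objective can take (not necessarily the paper's exact function):

$$\min_{P^1,\,P^2} \Big[ -\sum_{x \in D_1} \log P^1(x) \;-\; \sum_{x \in D_2} \log P^2(x) \Big] \;-\; \lambda\, W(P^1, P^2),$$

where the bracketed likelihood terms fit each class's data, minimizing the negated divergence item $W(P^1, P^2)$ enlarges the separation between classes, and $\lambda$ trades data fit against separation. With the NB parameters as variables, the normalization and non-negativity constraints are linear, matching the description above.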
Discriminative Naïve Bayesian Classifier (DNB)

Solving the optimization problem
– Use the Rosen gradient projection method.
Discriminative Naïve Bayesian Classifier (DNB)

Gradient and projection matrix
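The gradient and projection-matrix formulas were images and are not recoverable here; below is a sketch of the standard Rosen step for linear equality constraints $Ax = b$ (the generic method, not the paper's specific derivation).

```python
# One Rosen gradient-projection step for min f(x) s.t. A x = b: project the
# gradient onto the null space of A so the iterate stays feasible.
import numpy as np

def rosen_step(x, grad_f, A, step=0.1):
    g = grad_f(x)
    # Projection matrix P = I - A^T (A A^T)^{-1} A
    P = np.eye(len(x)) - A.T @ np.linalg.solve(A @ A.T, A)
    return x + step * (-P @ g)   # move along the projected descent direction

# Toy use: minimize ||x||^2 subject to x_0 + x_1 = 1
A = np.array([[1.0, 1.0]])
x = np.array([0.9, 0.1])
for _ in range(50):
    x = rosen_step(x, lambda z: 2 * z, A)
print(x)  # -> approximately [0.5, 0.5]
```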
Extension to Multi-category Classification Problems
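The body of this slide did not survive extraction. A natural generalization, offered only as a guess and not confirmed by the transcript, keeps one joint distribution $P^c$ per class and sums the divergence item over class pairs:

$$\min_{\{P^c\}} \sum_{c} \Big[ -\sum_{x \in D_c} \log P^c(x) \Big] \;-\; \lambda \sum_{c < c'} W(P^c, P^{c'}).$$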
Experimental Results

Experimental setup
– Datasets
  » 4 benchmark datasets from the UCI machine learning repository
– Experimental environment
  » Platform: Windows 2000
  » Developing tool: Matlab 6.5
Without information missing

Observations
– DNB outperforms NB on every dataset.
– Compared with SVM, DNB wins on 2 datasets and loses on the other 2.
– SVM outperforms DNB on Segment and Satimage.
With information missing

Scheme
– DNB uses

$$P(C_i \mid A - T) \propto P(C_i) \prod_{x_j \in A - T} P(x_j \mid C_i) \qquad \ldots (5)$$

  to conduct inference when there is information missing: the missing subset T is marginalized out, which under the NB factorization simply drops those factors.
– SVM sets 0 values for the missing features (the default way to process unknown features in LIBSVM).
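Putting the two schemes side by side in code (an illustrative toy only, reusing the hypothetical svm, priors, and cond objects from the earlier sketches):

```python
# The two missing-feature strategies compared in the experiments:
# NB-style marginalization (Eq. 5) vs. zero-filling for the SVM,
# LIBSVM's default treatment of unknown features.
x_missing = [None, 1]                                    # feature 0 is in T

label_nb = predict_nb_missing(x_missing, priors, cond)   # marginalizes T out
x_filled = [0 if v is None else v for v in x_missing]    # SVM: substitute 0
label_svm = svm.predict([x_filled])[0]
print(label_nb, label_svm)
```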
With information missing

Setup: randomly discard features, gradually from a small percentage to a big percentage.

[Figure] Error rate on Iris with missing information
[Figure] Error rate on Vote with missing information
With information missing

[Figure] Error rate on Satimage with missing information
[Figure] Error rate on DNA with missing information
Summary of Experimental Results

Observations
– NB demonstrates a robust ability in handling missing-information problems.
– DNB inherits the ability of NB in handling missing-information problems, while achieving higher classification accuracy than NB.
– SVM cannot deal with missing-information problems easily.
Discussion

Can DNB be extended to a general Bayesian Network (BN) classifier?
– A structure-learning problem will be involved: direct application of DNB will encounter difficulties, since the structure is not fixed in unrestricted BNs.
– Finding optimal general Bayesian Network classifiers is an NP-complete problem.
– Discriminative training on a constrained Bayesian Network classifier is possible…
Conclusion

We develop a novel model, the Discriminative Naïve Bayesian Classifier (DNB):
– It outperforms the Naïve Bayesian Classifier when no information is missing.
– It outperforms SVMs in handling missing-information problems.