Improve Naïve Bayesian Classifier by Discriminative Training
ICONIP 2005

Improve Naïve Bayesian Classifier by Discriminative Training

Kaizhu Huang, Zhangbing Zhou, Irwin King, Michael R. Lyu
Oct. 2005
Outline
– Background
  – Classifiers
    » Discriminative classifiers: Support Vector Machines
    » Generative classifiers: Naïve Bayesian Classifiers
– Motivation
– Discriminative Naïve Bayesian Classifier
– Experiments
– Discussions
– Conclusion
Background

Discriminative classifiers
– Directly maximize a discriminative function or a posterior function
– Example: Support Vector Machines

[Figure: SVM]
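Below is a minimal sketch of what "directly maximizing a discriminative function" looks like in code; scikit-learn stands in for the LIBSVM/Matlab setup actually used in the paper, and the toy data are invented for illustration.

```python
# Minimal discriminative-classifier sketch: an SVM learns a decision function
# f(x) = w.x + b directly, with no class-conditional density estimation.
# scikit-learn is an assumption here; the paper used LIBSVM under Matlab.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),   # toy class-0 samples
               rng.normal(+1.0, 1.0, (50, 2))])  # toy class-1 samples
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="linear").fit(X, y)
print(svm.decision_function(X[:3]))  # signed margins: the discriminative scores
print(svm.predict(X[:3]))            # thresholding the score gives the class
```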
Background

Generative classifiers
– Model the joint distribution for each class, P(x|C), and then use Bayes rule to construct the posterior classifier P(C|x); C: class label, x: features.
– Example: Naïve Bayesian Classifiers
  » Model the distribution of each class under the assumption that each feature of the data is independent of the other features, given the class label.
$$
\begin{aligned}
C &= \arg\max_{i} P(C_i \mid x) && \ldots (1) \\
  &= \arg\max_{i} \frac{P(x \mid C_i)\, P(C_i)}{p(x)} && \ldots (2) \\
  &= \arg\max_{i} P(x \mid C_i)\, P(C_i) && \ldots (3) \quad \text{($p(x)$ is constant w.r.t. $C$)} \\
  &= \arg\max_{i} \prod_{j=1}^{m} P(x_j \mid C_i)\, P(C_i) && \ldots (4)
\end{aligned}
$$

Step (4) combines the independence assumption:

$$P(x_i, x_j \mid C) = P(x_i \mid C)\, P(x_j \mid C), \qquad 1 \le i, j \le m.$$
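To make Eqs. (1)-(4) concrete, here is a minimal Naïve Bayes sketch for discrete features; the helper names and toy data are mine, not the authors' Matlab code.

```python
# Discrete Naïve Bayes following Eq. (4): choose the class maximizing
# P(C_i) * prod_j P(x_j | C_i), with probabilities estimated by counting.
import numpy as np

def train_nb(X, y, n_values):
    """Estimate P(C_i) and P(x_j = v | C_i) with add-one smoothing."""
    priors, cond = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        cond[c] = [(np.bincount(Xc[:, j], minlength=n_values) + 1.0)
                   / (len(Xc) + n_values) for j in range(X.shape[1])]
    return priors, cond

def predict_nb(x, priors, cond):
    """Eq. (4) evaluated in log space to avoid underflow."""
    scores = {c: np.log(priors[c])
                 + sum(np.log(cond[c][j][v]) for j, v in enumerate(x))
              for c in priors}
    return max(scores, key=scores.get)

X = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([0, 0, 1, 1])
priors, cond = train_nb(X, y, n_values=2)
print(predict_nb([1, 1], priors, cond))  # -> 1
```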
Background

Comparison

[Figure] Example of missing information. From left to right: original digit, 50%-missing digit, 75%-missing digit, and occluded digit.
Background

Why are generative classifiers not as accurate as discriminative classifiers?

Scheme for generative classifiers in two-category classification tasks:
  Training set
  → subset D1, labeled as Class 1 → estimate distribution P1 to approximate D1
  → subset D2, labeled as Class 2 → estimate distribution P2 to approximate D2
  → construct the Bayes rule for classification

1. It is incomplete for generative classifiers to just approximate the inner-class information.
2. The inter-class discriminative information between the classes is discarded, yet it is needed!
Background

Why are generative classifiers superior to discriminative classifiers in handling missing-information problems?
– SVM lacks the ability to decide under such uncertainty.
– NB can conduct uncertainty inference under the estimated distribution. With A the feature set, T the subset of A that is missing, and A − T thus the known features, NB marginalizes T out of the estimated joint distribution and infers from A − T alone.
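As a sketch of this uncertainty inference (reusing the hypothetical train_nb/predict_nb helpers from the earlier sketch): under the independence assumption, marginalizing a missing feature out of the joint simply drops its factor, so classification uses only A − T.

```python
# NB inference with missing features: score classes with the known
# features A - T only; each missing feature's factor integrates to 1
# under the NB factorization and therefore just disappears.
import numpy as np

def predict_nb_missing(x, priors, cond):
    """x marks missing features (the subset T) with None."""
    scores = {}
    for c in priors:
        s = np.log(priors[c])
        for j, v in enumerate(x):
            if v is not None:               # known feature in A - T
                s += np.log(cond[c][j][v])  # missing ones contribute no factor
        scores[c] = s
    return max(scores, key=scores.get)

# e.g. predict_nb_missing([None, 1], priors, cond) classifies with x_0 missing
```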
Motivation

– It seems that a good classifier should combine the strategies of discriminative classifiers and generative classifiers.
– Our work trains one of the generative classifiers, the Naïve Bayesian Classifier, in a discriminative way.
Discriminative Naïve Bayesian Classifier

Working scheme of the Naïve Bayesian Classifier:
  Training set
  → sub-set D1, labeled as Class 1 → estimate the distribution P1 to approximate D1
  → sub-set D2, labeled as Class 2 → estimate the distribution P2 to approximate D2
  → use the Bayes rule for classification

Interaction is needed!!

Mathematical explanation of the Naïve Bayesian Classifier: the maximum-likelihood estimation is easily solved by the Lagrange multiplier method.
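As a supporting sketch of the estimation the slide alludes to (notation mine, assuming discrete features): with $N_{jk}$ the count of feature value $x_j = k$ among class $C_i$'s data and $\theta_{jk} = P(x_j = k \mid C_i)$, NB maximum likelihood solves

$$\max_{\theta} \sum_{k} N_{jk} \log \theta_{jk} \quad \text{s.t.} \quad \sum_{k} \theta_{jk} = 1,$$

and setting the derivative of the Lagrangian $\sum_k N_{jk}\log\theta_{jk} + \lambda\,(1 - \sum_k \theta_{jk})$ to zero gives $N_{jk}/\theta_{jk} = \lambda$, hence $\theta_{jk} = N_{jk} / \sum_{k'} N_{jk'}$: the empirical frequencies.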
Discriminative Naïve Bayesian Classifier (DNB)

Optimization function of DNB: a data-approximation term plus a divergence item.
• On one hand, minimizing this function tries to approximate the dataset as accurately as possible.
• On the other hand, the optimization also tries to enlarge the divergence between the classes.
• Optimizing the joint distribution directly inherits the ability of NB to handle missing-information problems.
Discriminative Naïve Bayesian Classifier (DNB)

Complete optimization problem
– A nonlinear optimization problem under linear constraints.
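The slide's formula was an image and did not survive the transcript. Purely as an illustration of the shape such an objective can take (not necessarily the paper's exact function):

$$\min_{P^1,\,P^2} \Big[ -\sum_{x \in D_1} \log P^1(x) \;-\; \sum_{x \in D_2} \log P^2(x) \Big] \;-\; \lambda\, W(P^1, P^2),$$

where the bracketed likelihood terms fit each class's data, minimizing the negated divergence item $W(P^1, P^2)$ enlarges the separation between classes, and $\lambda$ trades data fit against separation. With the NB parameters as variables, the normalization and non-negativity constraints are linear, matching the description above.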
Discriminative Naïve Bayesian Classifier (DNB)

Solving the optimization problem
– Use the Rosen gradient projection method.
Discriminative Naïve Bayesian Classifier (DNB)

Gradient and projection matrix
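The gradient and projection-matrix formulas were images and are not recoverable here; below is a sketch of the standard Rosen step for linear equality constraints $Ax = b$ (the generic method, not the paper's specific derivation).

```python
# One Rosen gradient-projection step for min f(x) s.t. A x = b: project the
# gradient onto the null space of A so the iterate stays feasible.
import numpy as np

def rosen_step(x, grad_f, A, step=0.1):
    g = grad_f(x)
    # Projection matrix P = I - A^T (A A^T)^{-1} A
    P = np.eye(len(x)) - A.T @ np.linalg.solve(A @ A.T, A)
    return x + step * (-P @ g)   # move along the projected descent direction

# Toy use: minimize ||x||^2 subject to x_0 + x_1 = 1
A = np.array([[1.0, 1.0]])
x = np.array([0.9, 0.1])
for _ in range(50):
    x = rosen_step(x, lambda z: 2 * z, A)
print(x)  # -> approximately [0.5, 0.5]
```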
Extension to Multi-category Classification Problems
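The body of this slide did not survive extraction. A natural generalization, offered only as a guess and not confirmed by the transcript, keeps one joint distribution $P^c$ per class and sums the divergence item over class pairs:

$$\min_{\{P^c\}} \sum_{c} \Big[ -\sum_{x \in D_c} \log P^c(x) \Big] \;-\; \lambda \sum_{c < c'} W(P^c, P^{c'}).$$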
Experimental Results

Experimental setup
– Datasets
  » 4 benchmark datasets from the UCI machine learning repository
– Experimental environment
  » Platform: Windows 2000
  » Developing tool: Matlab 6.5
Without information missing

Observations
– DNB outperforms NB on every dataset.
– Compared with SVM, DNB wins on 2 datasets and loses on the other 2.
– SVM outperforms DNB on Segment and Satimage.
With information missing

Scheme
– DNB uses

$$P(C_i \mid A - T) \propto P(C_i) \prod_{x_j \in A - T} P(x_j \mid C_i) \qquad \ldots (5)$$

  to conduct inference when there is information missing: the missing subset T is marginalized out, which under the NB factorization simply drops those factors.
– SVM sets 0 values for the missing features (the default way to process unknown features in LIBSVM).
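Putting the two schemes side by side in code (an illustrative toy only, reusing the hypothetical svm, priors, and cond objects from the earlier sketches):

```python
# The two missing-feature strategies compared in the experiments:
# NB-style marginalization (Eq. 5) vs. zero-filling for the SVM,
# LIBSVM's default treatment of unknown features.
x_missing = [None, 1]                                    # feature 0 is in T

label_nb = predict_nb_missing(x_missing, priors, cond)   # marginalizes T out
x_filled = [0 if v is None else v for v in x_missing]    # SVM: substitute 0
label_svm = svm.predict([x_filled])[0]
print(label_nb, label_svm)
```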
With information missing

Setup: randomly discard features, gradually from a small percentage to a big percentage.

[Figure] Error rate on Iris with missing information
[Figure] Error rate on Vote with missing information
With information missing

[Figure] Error rate on Satimage with missing information
[Figure] Error rate on DNA with missing information
Summary of Experimental Results

Observations
– NB demonstrates a robust ability in handling missing-information problems.
– DNB inherits the ability of NB in handling missing-information problems, while achieving higher classification accuracy than NB.
– SVM cannot deal with missing-information problems easily.
Discussion

Can DNB be extended to a general Bayesian Network (BN) classifier?
– A structure-learning problem will be involved: direct application of DNB will encounter difficulties, since the structure is not fixed in unrestricted BNs.
– Finding optimal general Bayesian Network classifiers is an NP-complete problem.
– Discriminative training on a constrained Bayesian Network classifier is possible…
Conclusion

We develop a novel model, the Discriminative Naïve Bayesian Classifier (DNB):
– It outperforms the Naïve Bayesian Classifier when no information is missing.
– It outperforms SVMs in handling missing-information problems.