Data Dependence in Combining Classifiers
Mohamed Kamel, PAMI Lab
University of Waterloo
MCS 2003
Outline
- Introduction
- Data Dependence
  - Implicit Dependence
  - Explicit Dependence
- Feature Based Architecture
- Training Algorithm
- Results
- Conclusions
Introduction
- Pattern recognition systems: best possible classification rates; increase efficiency and accuracy.
- Multiple classifier systems:
  - Empirical observation
  - The problem decomposes naturally when using various sensors
  - Avoid making commitments to arbitrary initial conditions or parameters
- "Patterns misclassified by different classifiers are not necessarily the same" [Kittler et al., 98]
Categorization of MCS
- Architecture
- Input/Output Mapping
- Representation
- Specialized classifiers
Categorization of MCS (cont'd)
Architecture
- Parallel [Dasarathy, 94]
- Serial [Dasarathy, 94]
[Figures: a parallel architecture, in which classifiers 1..N each receive an input and feed a fusion stage that produces the output, and a serial architecture, in which classifiers 1..N are chained in sequence from input to output.]
Categorization of MCS (cont'd)
Input/Output Mapping
- Linear mapping: Sum Rule, Weighted Average [Hashem 97]
- Non-linear mapping: Maximum, Majority, Hierarchical Mixture of Experts [Jordan and Jacobs 94], Stacked Generalization [Wolpert 92]
Categorization of MCS (cont'd)
Representation
- Similar representations: the classifiers need to be different
- Different representations: use of different sensors, or different features extracted from the same data set
Categorization of MCS (cont'd)
Specialized Classifiers
- Specialized classifiers:
  - Encourage specialization in areas of the feature space
  - All classifiers must contribute to achieve a final decision
  - Hierarchical Mixture of Experts [Jordan and Jacobs 94], Co-operative Modular Neural Networks [Auda and Kamel 98]
- Ensemble of classifiers: a set of redundant classifiers
Categorization of MCS (cont'd)
Data Dependence
- Classifiers are inherently dependent on the data.
- Describes how the final aggregation uses the information present in the input pattern.
- Describes the relationship between the final output Q(x) and the pattern under classification x.
Data Dependence
- Data independent
- Implicitly dependent
- Explicitly dependent
Data Independence
- Rely solely on the output of the classifiers to determine the final classification output.
- Q(x) is the final class assigned to pattern x
- C_j is a vector composed of the outputs of the various classifiers in the ensemble {c_1j, c_2j, ..., c_Nj} for a given class y_j
- c_ij is the confidence classifier i has in pattern x belonging to class y_j
- The mapping F_j can be linear or non-linear
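The combining rule itself did not survive extraction; a plausible reconstruction from the definitions above (an assumption, not a quote from the slides):

$$ Q(x) = y_k, \qquad k = \arg\max_{j} F_j(C_j), \qquad C_j = \{c_{1j}, c_{2j}, \ldots, c_{Nj}\} $$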
Data Independence (cont'd)
Example: Average Vote
- The aggregation result relies only on the output confidences of the classifiers
- The operator F_j is the summation operation
- The result is skewed if the individual confidences contain bias
- The aggregation has no means of correcting this bias
Data Independence (cont'd)
- Simple voting techniques are data independent: Average, Maximum, Majority
- Susceptible to incorrect estimates of the confidence (see the sketch below)
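To make the taxonomy concrete, here is a minimal Python sketch (not from the talk) of these three data-independent rules; `probs` is a hypothetical N x M array of confidences c_ij for N classifiers and M classes:

```python
import numpy as np

def average_vote(probs):
    """Average rule: F_j is summation over the classifiers' confidences."""
    return int(np.argmax(probs.mean(axis=0)))

def maximum_vote(probs):
    """Maximum rule: pick the class holding the single highest confidence."""
    return int(np.argmax(probs.max(axis=0)))

def majority_vote(probs):
    """Majority rule: each classifier casts one vote for its top class."""
    votes = np.bincount(np.argmax(probs, axis=1), minlength=probs.shape[1])
    return int(np.argmax(votes))

# Example: 3 classifiers, 2 classes. A biased first classifier skews the
# average and the maximum; nothing here can correct it, since the combiner
# sees only the outputs.
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]])
print(average_vote(probs), maximum_vote(probs), majority_vote(probs))  # 0 0 1
```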
Implicit Data Dependence
- Train the combiner on the global performance of the data
- W(C(x)) is the weighting matrix composed of elements w_ij
- w_ij is the weight assigned to class j in classifier i
Implicit Data Dependence (cont'd)
Example: Weighted Average
- Based on the error correlation matrix, the individual weights are assigned (the formula is reconstructed below)
- The weights are dependent on the behavior of the classifiers amongst themselves
- The weights can be represented as the function W(C_j(x))
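The weight formula on this slide was lost in extraction. A standard assignment consistent with the description, minimizing the error of a linear combination of the classifiers (a reconstruction, not a quote from the slides):

$$ w_i = \frac{\sum_{k=1}^{N} (\Sigma^{-1})_{ik}}{\sum_{l=1}^{N} \sum_{k=1}^{N} (\Sigma^{-1})_{lk}} $$

where $\Sigma$ is the N x N error correlation matrix of the classifiers.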
Implicit Data Dependence (cont'd)
Example: Weighted Average (cont'd)
- The mapping F_j is the summation operator
- Hence the weighted average fits in this representation (see the sketch below)
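A minimal sketch of the weighted average as an implicitly data-dependent combiner, assuming the error-correlation weights reconstructed above; all names are illustrative, not from the talk:

```python
import numpy as np

def weights_from_error_correlation(errors):
    """Weights from the error correlation matrix (MSE-optimal linear
    combination; a standard choice, assumed here).
    errors: T x N matrix of residuals for N classifiers on T samples."""
    sigma = errors.T @ errors / errors.shape[0]  # error correlation matrix
    inv = np.linalg.inv(sigma)
    return inv.sum(axis=1) / inv.sum()           # fixed: independent of x

def weighted_average(probs, w):
    """Q(x): F_j sums w_i * c_ij. The weights W(C_j(x)) reflect global
    classifier behaviour, not the pattern x itself."""
    return int(np.argmax(w @ probs))             # w: (N,), probs: (N, M)
```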
Implicit Data Dependence (cont'd)
Implicitly data dependent approaches include:
- Weighted average [Hashem 97]
- Fuzzy measures [Gader et al., 96]
- Belief theory [Xu et al., 92]
- Behavior Knowledge Space (BKS) [Huang et al., 95]
- Decision Templates [Kuncheva et al., 01]
- Modular approaches [Auda and Kamel, 98]
- Stacked Generalization [Wolpert 92]
- Boosting [Schapire, 90]
These approaches lack consideration for the local superiority of classifiers.
Explicit Data Dependence
- Classifier selection or combining is performed based on the sub-space to which the input pattern belongs.
- The final classification is dependent on the pattern being classified.
Explicit Data Dependence (cont'd)
Example: Dynamic Classifier Selection (DCS)
- Estimates the accuracy of each classifier in local regions of the feature space
- The estimate is determined by observing the input pattern
- Once the superiority of a classifier is identified, its output is used as the final decision, i.e. binary weights are assigned based on the local superiority of the classifiers
- Since the weights are dependent on the input feature space, they can be represented as W(x)
- DCS can therefore be considered explicitly data dependent, with the mapping F_j being the maximum operator (see the sketch below)
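A minimal sketch of DCS with local accuracy, in the spirit of [Woods et al., 97]; it assumes scikit-learn-style classifiers with a `predict` method, and the k-NN neighbourhood and tie handling are simplifications:

```python
import numpy as np

def dcs_local_accuracy(x, classifiers, X_val, y_val, k=10):
    """Pick the classifier most accurate in the local region around x and
    use its decision: binary weights W(x), mapping F_j = maximum."""
    # Local region: the k validation points nearest to the input pattern x.
    idx = np.argsort(np.linalg.norm(X_val - x, axis=1))[:k]
    local_acc = [np.mean(clf.predict(X_val[idx]) == y_val[idx])
                 for clf in classifiers]
    best = int(np.argmax(local_acc))          # locally superior classifier
    return classifiers[best].predict(x[None, :])[0]
```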
Explicit Data Dependence (cont'd)
Explicitly data dependent approaches include:
- Dynamic Classifier Selection (DCS)
  - DCS with Local Accuracy (DCS_LA) [Woods et al., 97]
  - DCS based on Multiple Classifier Behavior (DCS_MCB) [Giacinto and Roli, 01]
- Hierarchical Mixture of Experts [Jordan and Jacobs 94]
- Feature-based approach [Wanas et al., 99]
The weights demonstrate dependence on the input pattern; intuitively, such methods will perform better than the other methods.
Feature Based Architectures
- A methodology to incorporate multiple classifiers in a dynamically adapting system
- The aggregation adapts to the behavior of the ensemble:
  - Detectors generate weights for each classifier that reflect the degree of confidence in each classifier for a given input
  - A trained aggregation learns to combine the different decisions
Feature Based Architectures (cont'd)
Architecture I
[Figure: classifiers 1..N and a detector all receive the input; the detector's weights and the classifier outputs feed a fusion classifier that produces the final decision.]
Feature Based Architectures (cont'd)
Classifiers
- Each individual classifier C_i produces some output representing its interpretation of the input x
- Utilizes sub-optimal classifiers
- The collection of classifier outputs for class y_j is represented as C_j(x)
Detector
- Detector D_l is a classifier that uses the input features to extract useful information for aggregation
- It does not aim to solve the classification problem
- The detector output d_lg(x) is the probability that the input pattern x is categorized to group g
- The output of all the detectors is represented by D(x)
(A sketch of Architecture I follows.)
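A minimal sketch of Architecture I under the notation above; the `detector` and `fusion` interfaces are assumptions for illustration, with the detector reduced to a single weight per classifier rather than full group probabilities d_lg(x):

```python
import numpy as np

def architecture_one(x, classifiers, detector, fusion):
    """Architecture I: the detector sees only the input features and
    produces per-classifier weights W(x); a trained fusion classifier
    combines the weighted ensemble outputs into the final decision."""
    C = np.stack([clf.predict_proba(x[None, :])[0] for clf in classifiers])
    w = detector(x)                # confidence weights from input, shape (N,)
    return fusion(w[:, None] * C)  # trained aggregation of weighted outputs
```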
Feature Based Architectures (cont'd)
Aggregation
- Fusion layer for all the classifiers
- Trained to adapt to the behavior of the various modules
- Explicitly data dependent: the weights depend on the input pattern being classified
Feature Based Architectures (cont'd)
Architecture II
[Figure: as in Architecture I, but the detector receives the classifier outputs along with the input before producing its weights for the fusion classifier.]
Feature Based Architectures (cont'd)
Classifiers
- Each individual classifier C_i produces some output representing its interpretation of the input x
- Utilizes sub-optimal classifiers
- The collection of classifier outputs for class y_j is represented as C_j(x)
Detector
- Appends the input to the output of the classifier ensemble
- Produces a weighting factor w_ij for each class in a classifier output
- The dependence of the weights on both the classifier output and the input pattern is represented by W(x, C_j(x))
Feature Based Architectures (cont'd)
Aggregation
- Fusion layer for all the classifiers
- Trained to adapt to the behavior of the various modules
- Combines implicit and explicit data dependence: the weights depend on the input pattern and on the performance of the classifiers (see the sketch below)
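Architecture II differs only in what the detector sees: the input pattern is appended to the ensemble outputs, so the weights become W(x, C_j(x)). A sketch under the same assumptions as the Architecture I sketch:

```python
import numpy as np

def architecture_two(x, classifiers, detector, fusion):
    """Architecture II: the input is appended to the ensemble outputs, so
    the detector's weights combine implicit and explicit data dependence."""
    C = np.stack([clf.predict_proba(x[None, :])[0] for clf in classifiers])
    w = detector(np.concatenate([x, C.ravel()]))  # W(x, C_j(x)), shape (N,)
    return fusion(w[:, None] * C)
```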
Results
- Five one-hidden-layer BP classifiers
- Training used partially disjoint data sets
- No optimization is performed for the trained networks
- The parameters of all the networks are maintained for all the classifiers that are trained
- Three data sets: 20 Class Gaussian, Satimages, Clouds
Results (cont'd)
Classification error, mean ± std (%):

Data Set                         20 Class        Clouds          Satimages
Singlenet                        13.82 ± 1.16    10.92 ± 0.08    14.06 ± 1.33
Oracle                            7.29 ± 1.06     7.41 ± 0.16     7.20 ± 0.36
Data Independent Approaches
Maximum                          12.92 ± 0.35    10.68 ± 0.04    13.61 ± 0.21
Majority                         13.13 ± 0.36    10.71 ± 0.02    13.40 ± 0.16
Average                          12.83 ± 0.26    10.66 ± 0.04    13.23 ± 0.22
Borda                            13.04 ± 0.30    10.71 ± 0.02    13.77 ± 0.20
Implicitly Data Dependent Approaches
Weighted Avg.                    12.57 ± 0.20    10.59 ± 0.05    13.14 ± 0.21
Bayesian                         12.48 ± 0.21    10.71 ± 0.02    13.51 ± 0.16
Fuzzy Integral                   12.95 ± 0.34    10.67 ± 0.05    13.71 ± 0.19
Explicitly Data Dependent Approaches
Feature-based                     8.64 ± 0.60    10.28 ± 0.10    12.48 ± 0.19
Training
Training each component independently:
- Optimizing individual components may not lead to overall improvement
- Collinearity: high correlation between classifiers
- Components may be under-trained or over-trained
Training (cont'd)
Adaptive training is:
- Selective: reduces correlation between components
- Focused: re-training focuses on misclassified patterns
- Efficient: determines the duration of training
Adaptive Training: Main Loop
- Increase diversity among the ensemble
- Incremental learning
- Evaluation of training to determine the re-training set
[Flowchart: Initialize; repeat (Train; Evaluate and compose training) until DONE = TRUE; End. A sketch follows.]
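A sketch of the main loop's control flow; the module interface and the helper `make_training_set` (standing for the evaluation/composition stage sketched two slides below) are hypothetical:

```python
def adaptive_training(modules, train_sets, make_training_set):
    """Main loop: incrementally train each module, then evaluate the whole
    system to compose the next round of (re-)training data, increasing
    diversity among the ensemble."""
    done = False
    while not done:
        for module, data in zip(modules, train_sets):
            module.train(data)                 # incremental learning step
        done, train_sets = make_training_set(modules, train_sets)
```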
Adaptive Training: Training
- Save a classifier if it performs well on the evaluation set
- Determine when to terminate training for each module
[Flowchart: for each module i of the k modules with DONE_i not yet TRUE: train C_i and evaluate C_i; if CF_i > CF_i_best, save C_i and set CF_i_best = CF_i; if the improvement CF_i - CF_{i-1} falls below a threshold (value lost in extraction), set DONE_i = TRUE; then i = i + 1. A sketch follows.]
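A sketch of the per-module step; the module interface (`train_one_epoch`, `evaluate`, `save_snapshot`, `cf_prev`, `cf_best`) is hypothetical, and `eps` stands in for the threshold lost in extraction:

```python
def train_modules_one_pass(modules, eval_set, eps=1e-3):
    """One pass over the k modules: train and evaluate each unfinished
    classifier C_i, snapshot it whenever its evaluation score CF_i improves
    on the best so far, and stop training it once the gain stalls."""
    for m in modules:                          # i = 1 .. k
        if m.done:
            continue
        m.train_one_epoch()                    # train C_i
        cf = m.evaluate(eval_set)              # evaluate C_i -> CF_i
        if cf > m.cf_best:
            m.save_snapshot()                  # save C_i
            m.cf_best = cf
        if cf - m.cf_prev < eps:               # improvement below threshold
            m.done = True
        m.cf_prev = cf
```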
Adaptive Training: Evaluation
- Train the aggregation modules
- Evaluate the training sets for each classifier
- Compose new training data
[Flowchart: for each classifier i of the k classifiers, evaluate the system on Train_i; once DONE_i = TRUE for all i, set DONE = TRUE; train the aggregation; select new training data. A sketch follows.]
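A sketch of the evaluation stage; `compose_training_data` is sketched on the next slide, and the remaining interfaces are hypothetical:

```python
def evaluate_and_compose(modules, aggregator, train_sets, agg_data):
    """Evaluation stage: train the aggregation module, re-evaluate each
    classifier on its own training set Train_i, and compose new training
    sets; DONE once every module has converged."""
    aggregator.train(agg_data)
    new_sets = [compose_training_data(m, data)
                for m, data in zip(modules, train_sets)]
    done = all(m.done for m in modules)
    return done, new_sets
```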
Adaptive Training: Data Selection
New training data are composed by concatenating:
- Error_i: misclassified entries of the training data for classifier i
- Correct_i: a random choice of R*(P*δ_i) correctly classified entries of the training data for classifier i
(A sketch follows.)
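A sketch of the data-selection rule, interpreting P as the training-set size, δ_i as classifier i's error rate, and R as a sampling-ratio parameter; these readings, and `module.R`, are assumptions:

```python
import numpy as np

def compose_training_data(module, data):
    """Concatenate Error_i with a random draw of correctly classified
    patterns, sized per the slide's R*(P*delta_i)."""
    X, y = data
    wrong = module.predict(X) != y
    correct_idx = np.flatnonzero(~wrong)
    n_draw = int(module.R * len(X) * wrong.mean())   # R * (P * delta_i)
    pick = np.random.choice(correct_idx,
                            size=min(n_draw, len(correct_idx)),
                            replace=False)
    return (np.concatenate([X[wrong], X[pick]]),
            np.concatenate([y[wrong], y[pick]]))
```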
Results (cont'd)
Classification error, mean ± std (%):

Data Set                         20 Class        Clouds          Satimages
Singlenet                        13.82 ± 1.16    10.92 ± 0.08    14.06 ± 1.33
Normal Training
Best Classifier                  14.03 ± 0.64    11.00 ± 0.09    14.72 ± 0.43
Oracle                            7.29 ± 1.06     7.41 ± 0.16     7.20 ± 0.36
Feature Based                     8.64 ± 0.60    10.28 ± 0.10    12.48 ± 0.19
Ensemble Trained Adaptively using WA as the Evaluation Function
Best Classifier                  14.75 ± 1.06    12.03 ± 0.52    17.13 ± 1.03
Oracle                            6.79 ± 2.30     5.73 ± 0.11     5.58 ± 0.17
Feature Based                     8.62 ± 0.25    10.24 ± 0.17    12.40 ± 0.12
Feature Based Architecture Trained Adaptively
Best Classifier                  14.80 ± 1.32    11.97 ± 0.59    16.96 ± 0.87
Oracle                            5.42 ± 1.30     5.43 ± 0.11     5.48 ± 0.18
Feature Based                     8.01 ± 0.19    10.06 ± 0.13    12.33 ± 0.14
Conclusions
Categorization of various combining approaches based on data dependence:
- Independent: vulnerable to incorrect confidence estimates
- Implicitly dependent: does not take into account the local superiority of classifiers
- Explicitly dependent: the literature focuses on selection rather than combining
Conclusions (cont'd)
Feature-based approach:
- Combines implicit and explicit data dependence
- Uses an evolving training algorithm to enhance diversity amongst classifiers:
  - Reduces harmful correlation
  - Determines the duration of training
- Improved classification accuracy
References
[Kittler et al., 98] J. Kittler, M. Hatef, R. Duin, and J. Matas, "On Combining Classifiers", IEEE Trans. PAMI, 20(3), 226-239, 1998.
[Dasarathy, 94] B. Dasarathy, "Decision Fusion", IEEE Computer Soc. Press, 1994.
[Hashem, 97] S. Hashem, "Algorithms for Optimal Linear Combinations of Neural Networks", Int. Conf. on Neural Networks, Vol. 1, 242-247, 1997.
[Jordan and Jacobs, 94] M. Jordan and R. Jacobs, "Hierarchical Mixtures of Experts and the EM Algorithm", Neural Computation, 181-214, 1994.
[Wolpert, 92] D. Wolpert, "Stacked Generalization", Neural Networks, Vol. 5, 241-259, 1992.
[Auda and Kamel, 98] G. Auda and M. Kamel, "Modular Neural Network Classifiers: A Comparative Study", J. Int. Rob. Sys., Vol. 21, 117-129, 1998.
[Gader et al., 96] P. Gader, M. Mohamed, and J. Keller, "Fusion of Handwritten Word Classifiers", Patt. Reco. Lett., 17(6), 577-584, 1996.
[Xu et al., 92] L. Xu, A. Krzyzak, and C. Suen, "Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition", IEEE Trans. Sys. Man and Cyb., 22(3), 418-435, 1992.
[Huang et al., 95] Y. Huang, K. Liu, and C. Suen, "The Combination of Multiple Classifiers by a Neural Network Approach", Int. J. Patt. Reco. and Art. Int., Vol. 9, 579-597, 1995.
[Kuncheva et al., 01] L. Kuncheva, J. Bezdek, and R. Duin, "Decision Templates for Multiple Classifier Fusion: An Experimental Comparison", Patt. Reco., Vol. 34, 299-314, 2001.
[Schapire, 90] R. Schapire, "The Strength of Weak Learnability", Mach. Learn., Vol. 5, 197-227, 1990.
[Giacinto and Roli, 01] G. Giacinto and F. Roli, "Dynamic Classifier Selection Based on Multiple Classifier Behaviour", Patt. Reco., Vol. 34, 1879-1881, 2001.
[Woods et al., 97] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of Multiple Classifiers Using Local Accuracy Estimates", IEEE Trans. PAMI, 19(4), 405-410, 1997.
[Wanas et al., 99] N. Wanas, M. Kamel, G. Auda, and F. Karray, "Feature-Based Decision Aggregation in Modular Neural Network Classifiers", Patt. Reco. Lett., 20(11-13), 1353-1359, 1999.