![Page 1: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/1.jpg)
A Similarity Evaluation Technique for Data Mining with Ensemble of
Classifiers
Seppo Puuronen, Vagan Terziyan
International Workshop on Similarity Search
1-2 September 1999, Florence (Italy)
![Page 2: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/2.jpg)
Authors
Seppo Puuronen
Department of Computer Science and Information Systems
University of Jyväskylä, FINLAND
Vagan Terziyan
Department of Artificial Intelligence
Kharkov State Technical University of Radioelectronics, UKRAINE
![Page 3: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/3.jpg)
Contents
• The Research Problem and Goal
• Basic Concepts
• External Similarity Evaluation
• Evaluation of Classifiers Competence
• An Example
• Internal Similarity Evaluation
• Conclusions
![Page 4: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/4.jpg)
The Research Problem
During the past several years, researchers in machine learning, computational learning theory, pattern recognition, and statistics, working in a variety of application domains, have joined efforts to learn how to create and combine an ensemble of classifiers.
The primary goal of combining several classifiers is to obtain a more accurate prediction than can be obtained from any single classifier alone.
![Page 5: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/5.jpg)
Goal
The goal of this research is to develop a simple similarity evaluation technique for classification problems that are solved with an ensemble of classifiers.
Classification here means finding an appropriate class, among the available ones, for a given instance, based on the classifications produced by an ensemble of classifiers.
![Page 6: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/6.jpg)
Basic Concepts: Training Set (TS)
The TS of an ensemble of classifiers is a quadruple <D,C,S,P>:
• D is the set of instances D1, D2,..., Dn to be classified;
• C is the set of classes C1, C2,..., Cm that are used to classify the instances;
• S is the set of classifiers S1, S2,..., Sr, which select classes to classify the instances;
• P is the set of semantic predicates that define relationships between D, C, and S.
![Page 7: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/7.jpg)
Basic Concepts: Semantic Predicate P
$$
P(D_i, C_j, S_k) =
\begin{cases}
1, & \text{if the classifier } S_k \text{ uses class } C_j \text{ to classify instance } D_i; \\
-1, & \text{if } S_k \text{ refuses to use } C_j \text{ to classify } D_i; \\
0, & \text{if } S_k \text{ does not use or refuse to use } C_j \text{ to classify } D_i.
\end{cases}
$$
![Page 8: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/8.jpg)
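Problem 1: Deriving External Similarity Values
[Figure: the sets D (Instances), C (Classes), and S (Classifiers), with the external relations DCi,j between Di and Cj, SCk,j between Sk and Cj, and SDk,i between Sk and Di.]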
![Page 9: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/9.jpg)
External Similarity Values
External Similarity Values (ESV) are the binary relations DC, SC, and SD between the elements of (sub)sets of D and C, of S and C, and of S and D, respectively.
ESV are based on the total support among all the classifiers for voting for the appropriate classification (or refusing to vote).
![Page 10: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/10.jpg)
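Problem 2: Deriving Internal Similarity Values
[Figure: the sets D (Instances), C (Classes), and S (Classifiers), with the internal relations DDi’,i’’ between Di’ and Di’’, CCj’,j’’ between Cj’ and Cj’’, and SSk’,k’’ between Sk’ and Sk’’.]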
![Page 11: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/11.jpg)
Internal Similarity Values
Internal Similarity Values (ISV) are binary relations between two subsets of D, between two subsets of C, and between two subsets of S.
ISV are based on the total support among all the classifiers for voting for the appropriate connection (or refusing to vote).
![Page 12: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/12.jpg)
Why Do We Need Similarity Values (or a Distance Measure)?
• The distance between instances is used by classifiers to recognize the nearest neighbors of any classified instance.
• The distance between classes is necessary to define the misclassification error during the learning phase.
• The distance between classifiers is useful for evaluating the weights of all classifiers so that they can be integrated by weighted voting.
![Page 13: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/13.jpg)
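Deriving External Relation DC: How Well a Class Fits the Instance
$$
DC_{i,j} = CD_{j,i} = \sum_{k=1}^{r} P(D_i, C_j, S_k), \quad \forall D_i \in D,\ \forall C_j \in C
$$
[Figure: instance Di (Instances) and class Cj (Classes) linked through three classifiers Sk1, Sk2, and Sk3 (Classifiers), giving DCi,j = 3.]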
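The DC sum can be sketched in a few lines of Python. This is only an illustration: the nested-list layout `P[i][j][k]` and the helper name `derive_dc` are assumptions made here, and the toy votes are hypothetical (they mimic the figure, where three classifiers all choose the same class).

```python
# P[i][j][k] is the vote of classifier S_k on class C_j for instance D_i:
# 1 = uses the class, -1 = refuses it, 0 = neither (layout assumed for this sketch).

def derive_dc(P):
    """Return DC[i][j] = sum over all classifiers k of P[i][j][k]."""
    return [[sum(votes) for votes in instance] for instance in P]

# Hypothetical toy data: one instance, two classes, three classifiers.
P = [
    [
        [1, 1, 1],    # all three classifiers choose C1 for D1
        [-1, 0, 1],   # one refuses, one abstains, one chooses C2
    ]
]

DC = derive_dc(P)
print(DC)  # [[3, 0]]
```

A value of 3 means unanimous support from three classifiers, while 0 means that support and refusal cancel out.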
![Page 14: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/14.jpg)
Deriving External Relation SC: Measures Classifiers’ Competence in the Area of Classes
The value of the relation (Sk,Cj) represents the total support that the classifier Sk obtains by selecting (or refusing to select) the class Cj to classify all the instances.
$$
SC_{k,j} = CS_{j,k} = \sum_{i=1}^{n} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall C_j \in C
$$
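As a rough sketch of the SC relation, each vote of a classifier can be weighted by the group total DC for the same instance and class, so a classifier gains support when it agrees with the ensemble and loses support when it disagrees. The data layout `P[i][j][k]`, the function names, and the toy votes below are assumptions for illustration only.

```python
def derive_dc(P):
    """DC[i][j]: total vote of all classifiers on class j for instance i."""
    return [[sum(votes) for votes in instance] for instance in P]

def derive_sc(P):
    """SC[k][j]: sum over instances i of DC[i][j] * P[i][j][k]."""
    DC = derive_dc(P)
    n, m, r = len(P), len(P[0]), len(P[0][0])
    return [
        [sum(DC[i][j] * P[i][j][k] for i in range(n)) for j in range(m)]
        for k in range(r)
    ]

# Hypothetical toy data, indexed P[instance][class][classifier].
P = [
    [[1, 1, -1], [0, -1, -1]],   # votes on C1 and C2 for D1
    [[1, 0, 1], [-1, -1, 1]],    # votes on C1 and C2 for D2
]

SC = derive_sc(P)
print(SC)  # [[3, 1], [1, 3], [1, 1]]
```

In this toy case the first classifier is best supported on C1 and the second on C2, while the third often votes against the group and scores low on both.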
![Page 15: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/15.jpg)
Example of SC Relation
[Figure: example of the SC relation between the sets of Classifiers, Instances, and Classes.]
![Page 16: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/16.jpg)
Deriving External Relation SD: Measures “Competence” of Classifiers in the Area of Instances
The value of the relation (Sk,Di) represents the total support that the classifier Sk receives by selecting (or refusing to select) all the classes to classify the instance Di.
$$
SD_{k,i} = DS_{i,k} = \sum_{j=1}^{m} DC_{i,j} \cdot P(D_i, C_j, S_k), \quad \forall S_k \in S,\ \forall D_i \in D
$$
![Page 17: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/17.jpg)
Example of SD Relation
[Figure: classifier Sk (Classifiers), instance Di (Instances), and classes C1 and C2 (Classes), with CD1,i = -3, CD2,i = 5, and SDk,i = 2.]
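The value in this example can be checked by hand with the definition SD_{k,i} = Σ_j DC_{i,j} · P(D_i, C_j, S_k). The worked line below assumes that C1 and C2 are the only classes involved and that S_k selects both of them for D_i (P = 1 in both cases); under those assumptions this is the only combination of votes from {1, -1, 0} that yields 2.

```latex
SD_{k,i} = CD_{1,i} \cdot P(D_i, C_1, S_k) + CD_{2,i} \cdot P(D_i, C_2, S_k)
         = (-3) \cdot 1 + 5 \cdot 1 = 2
```

The classifier loses support for choosing C1 against the group and gains more for choosing C2 with the group.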
![Page 18: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/18.jpg)
Standardizing External Relations to the Interval [0,1]
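$$
\text{standardized value} = \frac{\text{value} - \min(\text{value})}{\max(\text{value}) - \min(\text{value})}
$$
In the formulas below the left-hand side is the standardized value and the right-hand side uses the raw value:
$$
DC_{i,j} = CD_{j,i} = \frac{DC_{i,j} + r}{2r}
$$
$$
SC_{k,j} = CS_{j,k} = \frac{SC_{k,j} + n(r-2)}{2n(r-1)}
$$
$$
SD_{k,i} = DS_{i,k} = \frac{SD_{k,i} + m(r-2)}{2m(r-1)}
$$
n is the number of instances
m is the number of classes
r is the number of classifiers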
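A minimal sketch of the three standardization formulas follows. The function names are made up here, and the extreme values used in the checks (-r and r for DC, -n(r-2) and n·r for SC, -m(r-2) and m·r for SD) are what the formulas themselves imply as minimum and maximum; r must be greater than 1.

```python
def standardize_dc(dc, r):
    """Map a raw DC value from [-r, r] to [0, 1]."""
    return (dc + r) / (2 * r)

def standardize_sc(sc, n, r):
    """Map a raw SC value from [-n(r-2), n*r] to [0, 1]."""
    return (sc + n * (r - 2)) / (2 * n * (r - 1))

def standardize_sd(sd, m, r):
    """Map a raw SD value from [-m(r-2), m*r] to [0, 1]."""
    return (sd + m * (r - 2)) / (2 * m * (r - 1))

# Sizes chosen for illustration: 3 instances, 5 classes, 4 classifiers.
n, m, r = 3, 5, 4
print(standardize_dc(-r, r), standardize_dc(r, r))                        # 0.0 1.0
print(standardize_sc(-n * (r - 2), n, r), standardize_sc(n * r, n, r))    # 0.0 1.0
print(standardize_sd(-m * (r - 2), m, r), standardize_sd(m * r, m, r))    # 0.0 1.0
```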
![Page 19: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/19.jpg)
Competence of a Classifier
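[Figure: a Classifier placed between the instance Di (conceptual pattern of features, Instances) and the class Cj (conceptual pattern of class definition, Classes); its link to the instance side is its competence in the instance area, and its link to the class side is its competence in the area of classes.]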
![Page 20: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/20.jpg)
Classifier’s Evaluation: Competence Quality in an Instance Area
$$
Q^{D}(S_k) = \frac{1}{n} \sum_{i=1}^{n} SD_{k,i}
$$
- a measure of the “classification abilities” of a classifier relative to instances, from the support point of view
![Page 21: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/21.jpg)
Classifier’s Evaluation: Competence Quality in the Area of Classes
- a measure of the “classification abilities” of a classifier in the correct use of classes, from the support point of view
$$
Q^{C}(S_k) = \frac{1}{m} \sum_{j=1}^{m} SC_{k,j}
$$
![Page 22: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/22.jpg)
Quality Balance Theorem
$$
Q^{D}(S_k) = Q^{C}(S_k)
$$
The evaluation of a classifier’s competence (ranking, weighting, quality evaluation) does not depend on the competence area, whether the “real world of instances” or the “conceptual world of classes”, because both competence values, computed from the standardized SD and SC values, are always equal.
![Page 23: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/23.jpg)
Proof
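In the first and last sums, SD and SC are the standardized values; inside the fractions they are the raw values.
$$
\begin{aligned}
Q^{D}(S_k) &= \frac{1}{n} \sum_{i=1}^{n} SD_{k,i}
 = \frac{1}{n} \sum_{i=1}^{n} \frac{SD_{k,i} + m(r-2)}{2m(r-1)} \\
&= \frac{1}{n} \sum_{i=1}^{n} \frac{\sum_{j=1}^{m} \bigl( DC_{i,j} \cdot P(D_i, C_j, S_k) \bigr) + m(r-2)}{2m(r-1)} \\
&= \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} \bigl( DC_{i,j} \cdot P(D_i, C_j, S_k) \bigr) + nm(r-2)}{2nm(r-1)} \\
&= \frac{1}{m} \sum_{j=1}^{m} \frac{\sum_{i=1}^{n} \bigl( DC_{i,j} \cdot P(D_i, C_j, S_k) \bigr) + n(r-2)}{2n(r-1)} \\
&= \frac{1}{m} \sum_{j=1}^{m} \frac{SC_{k,j} + n(r-2)}{2n(r-1)}
 = \frac{1}{m} \sum_{j=1}^{m} SC_{k,j} = Q^{C}(S_k)
\end{aligned}
$$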
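The equality Q^D(S_k) = Q^C(S_k) can also be checked numerically. The sketch below is only an illustration under assumptions: it draws random votes, uses a data layout `P[i][j][k]` and a helper name `quality_measures` that are invented here, and standardizes with the denominators 2m(r-1) and 2n(r-1). Exact fractions avoid floating-point noise in the comparison.

```python
import random
from fractions import Fraction

def quality_measures(P, k):
    """Return (Q_D, Q_C) for classifier k from votes P[i][j][k]."""
    n, m, r = len(P), len(P[0]), len(P[0][0])
    DC = [[sum(P[i][j]) for j in range(m)] for i in range(n)]
    # Raw support of classifier k per instance (SD) and per class (SC).
    SD = [sum(DC[i][j] * P[i][j][k] for j in range(m)) for i in range(n)]
    SC = [sum(DC[i][j] * P[i][j][k] for i in range(n)) for j in range(m)]
    # Averages of the standardized values.
    q_d = sum(Fraction(sd + m * (r - 2), 2 * m * (r - 1)) for sd in SD) / n
    q_c = sum(Fraction(sc + n * (r - 2), 2 * n * (r - 1)) for sc in SC) / m
    return q_d, q_c

random.seed(0)
n, m, r = 3, 5, 4  # sizes chosen for illustration
P = [[[random.choice((-1, 0, 1)) for _ in range(r)] for _ in range(m)]
     for _ in range(n)]

for k in range(r):
    q_d, q_c = quality_measures(P, k)
    assert q_d == q_c  # both averages coincide for every classifier
    print(k, q_d)
```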
![Page 24: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/24.jpg)
An Example
Let us suppose that four classifiers have to classify three papers submitted to a conference with five conference topics.
The classifiers should state their selection of appropriate conference topics for every paper.
The final goal is to obtain a cooperative result from all the classifiers concerning the “paper - topic” relation.
![Page 25: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/25.jpg)
C (classes) Set in the Example
| Classes - Conference Topics | Notation |
|---|---|
| AI and Intelligent Systems | C1 |
| Analytical Technique | C2 |
| Real-Time Systems | C3 |
| Virtual Reality | C4 |
| Formal Methods | C5 |
![Page 26: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/26.jpg)
S (classifiers) Set in the Example
| Classifiers - “Referees” | Notation |
|---|---|
| A.B. | S1 |
| H.R. | S2 |
| M.L. | S3 |
| R.S. | S4 |
![Page 27: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/27.jpg)
D (instances) Set in the Example
| Instances | Notation |
|---|---|
| Paper 1 | D1 |
| Paper 2 | D2 |
| Paper 3 | D3 |
![Page 28: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/28.jpg)
Selections Made for the Instance “Paper 1”
D1

| P(D,C,S) | C1 | C2 | C3 | C4 | C5 |
|---|---|---|---|---|---|
| S1 | 1 | -1 | -1 | 0 | -1 |
| S2 | 0+ | -1** | 0++ | 1* | -1*** |
| S3 | 0 | 0 | -1 | 1 | 0 |
| S4 | 1 | -1 | 0 | 0 | 1 |

Classifier H.R. considers “Paper 1” to fit the topic Virtual Reality* and refuses to assign it to Analytical Technique** or Formal Methods***. H.R. neither chooses nor refuses the AI and Intelligent Systems+ or Real-Time Systems++ topics to classify “Paper 1”.
![Page 29: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/29.jpg)
Selections Made for the Instance “Paper 2”
D2

| P | C1 | C2 | C3 | C4 | C5 |
|---|---|---|---|---|---|
| S1 | -1 | 0 | -1 | 0 | 1 |
| S2 | 1 | -1 | -1 | 0 | 0 |
| S3 | 1 | -1 | 0 | 1 | 1 |
| S4 | -1 | 0 | 0 | 1 | 0 |
![Page 30: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/30.jpg)
Selections Made for the Instance “Paper 3”
D3

| P | C1 | C2 | C3 | C4 | C5 |
|---|---|---|---|---|---|
| S1 | 1 | 0 | 1 | -1 | 0 |
| S2 | 0 | 1 | 0 | -1 | 1 |
| S3 | -1 | -1 | 1 | -1 | 1 |
| S4 | -1 | -1 | 1 | -1 | 1 |
![Page 31: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/31.jpg)
Result of Cooperative Paper Classification Based on DC Relation
| Instance | Cooperative classification |
|---|---|
| Paper 1 | AI and Intelligent Systems, Virtual Reality, NOT Analytical Technique, NOT Real-Time Systems |
| Paper 2 | Virtual Reality, Formal Methods, NOT Analytical Technique, NOT Real-Time Systems |
| Paper 3 | Real-Time Systems, Formal Methods, NOT Virtual Reality |
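The cooperative result can be reproduced from the three selection tables by summing each topic column over the four referees, which is the DC relation. The slides do not state the decision rule, so the cut-off used below (accept a topic when DC is at least 2, reject it when DC is at most -2) is an assumption that happens to match the listed result; the names `VOTES`, `THRESHOLD`, and `cooperative_result` are likewise invented for this sketch.

```python
TOPICS = [
    "AI and Intelligent Systems", "Analytical Technique",
    "Real-Time Systems", "Virtual Reality", "Formal Methods",
]

# VOTES[paper][k][j]: vote of referee S_(k+1) on topic C_(j+1),
# copied from the selection tables for Papers 1 to 3.
VOTES = {
    "Paper 1": [[1, -1, -1, 0, -1], [0, -1, 0, 1, -1],
                [0, 0, -1, 1, 0], [1, -1, 0, 0, 1]],
    "Paper 2": [[-1, 0, -1, 0, 1], [1, -1, -1, 0, 0],
                [1, -1, 0, 1, 1], [-1, 0, 0, 1, 0]],
    "Paper 3": [[1, 0, 1, -1, 0], [0, 1, 0, -1, 1],
                [-1, -1, 1, -1, 1], [-1, -1, 1, -1, 1]],
}

THRESHOLD = 2  # assumed cut-off, not given in the slides

def cooperative_result(rows, threshold=THRESHOLD):
    """Return (DC values, accepted topics, rejected topics) for one paper."""
    dc = [sum(column) for column in zip(*rows)]
    accepted = [t for t, v in zip(TOPICS, dc) if v >= threshold]
    rejected = [t for t, v in zip(TOPICS, dc) if v <= -threshold]
    return dc, accepted, rejected

for paper, rows in VOTES.items():
    dc, accepted, rejected = cooperative_result(rows)
    print(paper, dc, accepted, ["NOT " + t for t in rejected])
```

The raw DC rows come out as [2, -3, -2, 2, -1], [0, -2, -2, 2, 2], and [-1, -1, 3, -4, 3] for Papers 1, 2, and 3.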
![Page 32: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/32.jpg)
Results of Classifiers’ Competence Evaluation (based on SC and SD sets)
… Proposals obtained from the classifier A.B. should be accepted if they concern the topics Real-Time Systems and Virtual Reality or the instances “Paper 1” and “Paper 3”, and these proposals should be rejected if they concern AI and Intelligent Systems or “Paper 2”. In some cases it seems possible to accept classification proposals from the classifier A.B. if they concern Analytical Technique and Formal Methods. All four classifiers are expected to give acceptable proposals concerning “Paper 3”, and only the suggestion of the classifier M.L. can be accepted if it concerns “Paper 2” ...
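These statements can be traced back to the data with a short computation of the standardized SC and SD values, where SC_{k,j} = Σ_i DC_{i,j} · P(D_i, C_j, S_k) and SD_{k,i} = Σ_j DC_{i,j} · P(D_i, C_j, S_k). The sketch assumes the array layout `P[i][k][j]` (paper, referee, topic) and standardizes with (SC + n(r-2)) / (2n(r-1)) and (SD + m(r-2)) / (2m(r-1)); reading 0.5 as the boundary between accepting and rejecting is an assumption, not something the slides state.

```python
from fractions import Fraction

# P[i][k][j]: vote of referee S_(k+1) on topic C_(j+1) for paper D_(i+1),
# copied from the three selection tables.
P = [
    [[1, -1, -1, 0, -1], [0, -1, 0, 1, -1], [0, 0, -1, 1, 0], [1, -1, 0, 0, 1]],
    [[-1, 0, -1, 0, 1], [1, -1, -1, 0, 0], [1, -1, 0, 1, 1], [-1, 0, 0, 1, 0]],
    [[1, 0, 1, -1, 0], [0, 1, 0, -1, 1], [-1, -1, 1, -1, 1], [-1, -1, 1, -1, 1]],
]
n, r, m = len(P), len(P[0]), len(P[0][0])  # 3 papers, 4 referees, 5 topics

# Group total per paper and topic, then support per referee.
DC = [[sum(P[i][k][j] for k in range(r)) for j in range(m)] for i in range(n)]
SC = [[sum(DC[i][j] * P[i][k][j] for i in range(n)) for j in range(m)]
      for k in range(r)]
SD = [[sum(DC[i][j] * P[i][k][j] for j in range(m)) for i in range(n)]
      for k in range(r)]

# Standardize to [0, 1] with exact fractions.
SC_STD = [[Fraction(v + n * (r - 2), 2 * n * (r - 1)) for v in row] for row in SC]
SD_STD = [[Fraction(v + m * (r - 2), 2 * m * (r - 1)) for v in row] for row in SD]

print("A.B. by topic:", [str(v) for v in SC_STD[0]])
print("A.B. by paper:", [str(v) for v in SD_STD[0]])
print("Paper 2 by referee:", [str(row[1]) for row in SD_STD])
print("Paper 3 by referee:", [str(row[2]) for row in SD_STD])
```

For A.B. the topic values are 7/18, 1/2, 13/18, 5/9, and 1/2, and the paper values are 3/5, 7/15, and 8/15. With 0.5 as the boundary, Real-Time Systems, Virtual Reality, “Paper 1”, and “Paper 3” fall above it, AI and Intelligent Systems and “Paper 2” fall below it, and Analytical Technique and Formal Methods sit exactly on it. For “Paper 2” only M.L. exceeds 0.5, and for “Paper 3” all four referees do.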
![Page 33: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/33.jpg)
Deriving Internal Similarity Values
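[Figure: deriving the internal relation A’A” between two subsets A’ and A” of Set A. a) Via one intermediate set: the relations A’I and IA” through Set I give A’A”I. b) Via two intermediate sets: the relations A’I, IJ, and JA” through Set I and Set J give A’A”IJ.]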
![Page 34: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/34.jpg)
Internal Similarity for Classifiers: Instance-Based Similarity
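$$
\forall S' \subseteq S,\ \forall S'' \subseteq S: \quad {S'S''}^{D} = S'D \times DS''
$$
[Figure: two subsets S’ and S’’ of Classifiers connected through the Instances set D by the relations S’D and DS’’, giving S’S’’D.]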
![Page 35: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/35.jpg)
Internal Similarity for Classifiers: Class-Based Similarity
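$$
\forall S' \subseteq S,\ \forall S'' \subseteq S: \quad {S'S''}^{C} = S'C \times CS''
$$
[Figure: two subsets S’ and S’’ of Classifiers connected through the Classes set C by the relations S’C and CS’’, giving S’S’’C.]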
![Page 36: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/36.jpg)
Internal Similarity for Classifiers: Class-Instance-Based Similarity
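$$
\forall S' \subseteq S,\ \forall S'' \subseteq S: \quad {S'S''}^{CD} = S'C \times CD \times DS''
$$
[Figure: two subsets S’ and S’’ of Classifiers connected through the Classes set C and the Instances set D by the relations S’C, CD, and DS’’, giving S’S’’CD.]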
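One possible reading of these internal relations is a matrix product of the external relations, for example an instance-based classifier similarity obtained as SD times its transpose DS. The slides do not spell out the operator, so treating it as ordinary matrix multiplication is an assumption here; the helper names and the toy SD values are hypothetical as well.

```python
def transpose(A):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Hypothetical raw SD values: 3 classifiers (rows) by 2 instances (columns).
SD = [
    [1, 3],
    [3, 1],
    [1, 1],
]
DS = transpose(SD)  # DS[i][k] = SD[k][i]

# Instance-based internal similarity between classifiers (assumed product).
SS_D = matmul(SD, DS)
for row in SS_D:
    print(row)
```

Under this reading, restricting the rows of SD to the classifiers in S’ and the columns of DS to those in S’’ gives the relation between the two subsets, and the class-based and class-instance-based variants would chain SC, CS, and CD in the same way.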
![Page 37: A Similarity Evaluation Technique for Data Mining with Ensemble of Classifiers Seppo Puuronen, Vagan Terziyan International Workshop on Similarity Search](https://reader031.vdocuments.us/reader031/viewer/2022032201/56649d455503460f94a21f35/html5/thumbnails/37.jpg)
Conclusion
We discussed methods for deriving the total support of each binary similarity relation. This can be used, for example, to derive the most supported classification result and to evaluate the classifiers according to their competence.
We also discussed relations between elements taken from the same set: instances, classes, or classifiers. This can be used, for example, to divide classifiers into groups of similar competence relative to the instance-class environment.