Efficient Process for Constructing a Hierarchical Classification System
DESCRIPTION
Efficient Process for Constructing a Hierarchical Classification System. Yong-wook Yoon, Dec 22, 2003, NLP Lab., POSTECH. Contents: Introduction, Related Work, Measure for Hierarchical Classifier, Hierarchical Classification, Demo, Experiment, Contribution, Future Work.
TRANSCRIPT
Efficient Process for Constructing a Hierarchical Classification System
Yong-wook Yoon, Dec 22, 2003
NLP Lab., POSTECH
2
Contents
Introduction
Related Work
Measure for Hierarchical Classifier
Hierarchical Classification
Demo
Experiment
Contribution
Future Work
3
Introduction
New trends in text categorization:
- A massive amount of documents is produced every day
- Often requires on-line classification
Flat vs. hierarchical classification:
- The hierarchical method is feasible for a large collection of documents with many levels of hierarchy
Advantages of hierarchical classification:
- Fits a large number of categories well
- Efficient in training time
- Better performance than a flat classifier
4
Issues in Hierarchical Classification
- There is no appropriate measure to evaluate the performance of a hierarchical classifier
- There is no systematic process to construct a large-scale hierarchical classification system
Our suggestions:
- A new evaluation scheme that fits hierarchical classification systems well
- An efficient process to construct an optimal hierarchical classification system
5
Flat vs. Hierarchical Classification
[Figure: in flat classification, all categories C1 … Cn hang directly under Root; in hierarchical classification, intermediate categories (e.g. Business → Grain, Oil) group the leaf categories C1 … Ci, Cj, Cj+1 … Cn]
6
Variations in Hierarchical Classification
Virtual category tree vs. category tree:
- Categories are organized as trees (cf. DAG)
- In a virtual category tree, documents can be assigned to leaf categories only (cf. category tree)
Two methods of hierarchical classification:
- Big-Bang approach: by only one classification, a document is assigned to a leaf-node class or an internal-node class
- Top-down level-based approach: a classifier at each node of the hierarchy tree; a document is classified by applying a sequence of classifiers from the root node to a leaf node
7
Virtual Category Tree with Top-down Level-based Classification
[Figure: a document Doc enters at the root, where there exist k classifiers, k being the number of child nodes (e.g. comp., talk., alt.atheism); each classifier Class_1 … Class_N answers yes/no]
- Each classifier determines whether to pass the document down to the lower level according to the sign of its SVM score → called a 'Pachinko Machine'
- Finally, at the leaf nodes, the correctness of the prediction is examined
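The top-down descent can be sketched as follows. This is a minimal illustration, not the authors' code: the `Node` class and the toy linear scorers stand in for trained SVMs, and a child is descended into exactly when its score is positive, pachinko-machine style.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node of the virtual category tree; leaves carry a category label."""
    name: str
    w: tuple = ()          # linear scorer standing in for a trained SVM
    b: float = 0.0
    children: list = field(default_factory=list)

    def score(self, x):
        # SVM-style signed score: w . x + b
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

def classify(root, x):
    """Pachinko-machine descent: at each node, pass the document to every
    child whose score is positive; report the leaf categories reached."""
    labels = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child in node.children:
            if child.score(x) > 0:      # positive sign -> descend
                if child.children:
                    frontier.append(child)
                else:
                    labels.append(child.name)
    return sorted(labels)

# Toy 2-feature hierarchy: root -> comp -> {graphics, os}, root -> talk
graphics = Node("comp.graphics", w=(1.0, 0.0), b=-0.5)
os_ = Node("comp.os", w=(-1.0, 0.0), b=0.5)
comp = Node("comp", w=(0.0, 1.0), b=0.0, children=[graphics, os_])
talk = Node("talk", w=(0.0, -1.0), b=0.0)
root = Node("root", children=[comp, talk])

print(classify(root, (1.0, 1.0)))   # -> ['comp.graphics']
```

Because each document only visits the subtrees whose classifiers accept it, most leaf classifiers are never invoked, which is where the training- and run-time savings of the top-down method come from.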
9
Previous Evaluations in Hierarchical Classification
Dumais and Chen (SIGIR-2000):
- Traditional precision and recall of each leaf-node classifier
- The probability of the leaf-node classifier (L1: internal-node, L2: leaf-node classifier)
  - Boolean scoring function: P(L1) && P(L2)
  - Multiplicative scoring function: P(L1) * P(L2)
Limitations in Dumais and Chen:
- May be feasible for simple cases such as a 2-level hierarchy (but what about a large hierarchy?)
- No concern about internal-node performance
10
Previous Evaluations in Hierarchical Classification (2)
Aixin Sun et al. (JASIST'03): "Expanded Precision and Recall"
- Considers category similarity and the contributions of misclassified documents
Limitations:
- Difficult to compare with flat methods directly
- Too complex to calculate
- No concern about internal-node performance
11
SVM in Text Categorization
- First suggested by Joachims (1997); shows the superiority of SVMs over other methods with experiments on Reuters-21578 (flat method)
- A theoretical learning model of SVMs in TC (SIGIR'01)
SVM with the hierarchical method:
- Dumais and Chen (SIGIR'00): LookSmart Web directory (www.looksmart.com), 17,173 categories organized into a 7-level hierarchy
- Tao Li et al. (SIGIR'03): 20 Newsgroups, optimally clustered 2-level hierarchy; measures only the accuracy of a classifier
13
New Evaluation of Hierarchical Classification
- Intermediate precision and recall: for an internal-node classifier; used to select the classifier with the optimal performance at an intermediate level
- Approximate precision and recall: performance of the entire system in the middle of the construction process
- Overall P and R of the hierarchical system: applicable to a hierarchical classifier and compatible with the traditional P and R of flat classification
14
Evaluation of Multi-labeled Hierarchical Classification
Given 4 categories and 10 documents for test, # of predictions: 4 x 10 = 40
Flat categories: A, B, C, D; in the hierarchy, B and C hang under a common internal node BC.
(Ac: actual class, Pr: predicted class; each cell below reads "category: Ac Pr (outcome)")

Flat evaluation:
Doc_1: A: - - (TN), B: + - (FN), C: + + (TP), D: - + (FP)

Delayed Evaluation:
Doc_1: A: - - (TN), BC: + + (TP), D: - + (FP)
Doc_2: B: + - (FN), C: + + (TP)

Pre-expanded Evaluation:
Doc_1: A: - - (TN), BC: + - (FN), D: - + (FP)
Doc_2: B: + - (FN), C: + - (FN)
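The per-cell outcomes in the example reduce to ordinary contingency counting over every (category, document) pair; a minimal sketch, where the document and category names are illustrative:

```python
def confusion_counts(actual, predicted, categories, docs):
    """Count TP/FP/FN/TN over every (category, document) pair, as in the
    4 categories x 10 documents = 40 predictions of the example."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for d in docs:
        for c in categories:
            ac = c in actual.get(d, set())      # actual class ('+' or '-')
            pr = c in predicted.get(d, set())   # predicted class
            if ac and pr:
                counts["TP"] += 1
            elif pr:
                counts["FP"] += 1
            elif ac:
                counts["FN"] += 1
            else:
                counts["TN"] += 1
    return counts

# Doc_1 from the flat table: A(-,-)=TN, B(+,-)=FN, C(+,+)=TP, D(-,+)=FP
actual = {"Doc_1": {"B", "C"}}
predicted = {"Doc_1": {"C", "D"}}
print(confusion_counts(actual, predicted, ["A", "B", "C", "D"], ["Doc_1"]))
# -> {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1}
```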
15
Intermediate Recall of an Internal Classifier

P_j = TP_j / (TP_j + FP_j),   R_j = TP_j / (TP_j + FN_j × NLC_j)

NLC_j is the weighting factor: the number of all leaf-node classifiers that are descendants of node j. This weighting is reasonable in micro-averaged evaluation.
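A minimal sketch of the intermediate measure, with illustrative counts that are not from the slides:

```python
def intermediate_pr(tp, fp, fn, nlc):
    """Intermediate precision/recall of an internal classifier j.
    Each miss is weighted by nlc, the number of descendant leaf
    classifiers, since a document dropped here is lost to all of them."""
    p = tp / (tp + fp)
    r = tp / (tp + fn * nlc)
    return p, r

# e.g. an internal node with 5 descendant leaf classifiers (NLC_j = 5)
p, r = intermediate_pr(tp=90, fp=10, fn=2, nlc=5)
print(round(p, 3), round(r, 3))   # -> 0.9 0.9
```

Note how the weighting penalizes internal-node misses much more heavily than a leaf-level FN: the same 2 missed documents cost as much as 10 leaf-level misses here.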
16
Internal Node Classifier vs. Leaf Node Classifier
[Figure: a hierarchy Business → Grain, Oil, Meat; Meat → Pork, …; leaf categories C1 … Ci, Cj, Cj+1 … Cn]
Ex) NLC_j(Meat) = 5
17
Approximate Precision and Recall at Level-k

P_k = ( Σ_{i∈LC_k} TP_i + Σ_{j∈IC_k} TP_j ) / ( Σ_{i∈LC_k} (TP_i + FP_i) + Σ_{j∈IC_k} (TP_j + FP_j) )

R_k = ( Σ_{i∈LC_k} TP_i + Σ_{j∈IC_k} TP_j ) / ( Σ_{i∈LC_k} (TP_i + FN_i) + Σ_{j∈IC_k} (TP_j + WFN_j) + Σ_{m∈IC_{<k}} WFN_m )

where WFN_j = FN_j × NLC_j and WFN_m = FN_m × NLC_m.
TP_i: # of true positives at leaf classifier i; TP_j, FP_j, FN_j: counts at the internal classifier j; LC_k denotes the leaf classifiers completed by level k, IC_k the internal classifiers at level k, and IC_{<k} the internal classifiers above level k.
18
Overall Recall in HTC, R_h
Definition:

R_h = ( Σ_{i=1}^{m} TP_i ) / ( Σ_{i=1}^{m} TP_i + Σ_{j∈Leaf nodes} FN_j + Σ_{k∈Internal nodes} WFN_k )

WFN_k = FN_k × NLC_k, where NLC_k is the number of all lower leaf classifiers of classifier k.

TP_i: # of true positives at leaf classifier i
FN_j: # of false negatives at leaf classifier j
WFN_k: weighted FN_k at internal classifier k
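R_h can be computed directly from the per-classifier counts; a minimal sketch with illustrative numbers, not values from the slides:

```python
def overall_recall(leaf_tp, leaf_fn, internal_fn, nlc):
    """Overall recall R_h of the hierarchical system: a miss at an internal
    classifier k is weighted by NLC_k, its number of lower leaf classifiers."""
    tp = sum(leaf_tp)
    wfn = sum(fn_k * nlc_k for fn_k, nlc_k in zip(internal_fn, nlc))
    return tp / (tp + sum(leaf_fn) + wfn)

# Toy system: 2 leaf classifiers, 1 internal node covering 5 leaves
r_h = overall_recall(leaf_tp=[40, 40], leaf_fn=[5, 5],
                     internal_fn=[2], nlc=[5])
print(round(r_h, 2))   # -> 0.8, i.e. 80 / (80 + 10 + 10)
```

Because the denominator reduces to the usual TP + FN when there are no internal nodes, R_h coincides with flat recall, which is what makes the measure compatible with the traditional one.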
21
20 Newsgroups Dataset
- Usenet news article collection: 19,997 documents in 20 newsgroups
- Each document consists of two parts: header and body
- Makes it possible to consider the intrinsic hierarchy
- 4.5% of the articles have been posted to more than one newsgroup, which is the cause of 'multi-classes' (e.g. 'alt.atheism' and 'talk.religion.misc')
22
[Figure: three-level tree over the 20 newsgroups. root → alt, comp, misc, rec, sci, soc, talk; alt → atheism; comp → graphics, os (ms-windows, misc), sys (ibm hardware, mac hardware), windows (x); misc → forsale; rec → autos, motor-cycles, sport (baseball, hockey); sci → crypt, electronics, med, space; soc → religion (christian); talk → politics (guns, mideast, misc), religion (misc)]
A total of 8 classifiers are needed.
23
Classification Result
20 Newsgroups in a three-level tree

Method                      BEP   Accuracy
flat  baseline              75.9  89.2
flat  SIGIR-01              88.6  91.0
hier  without evaluation    86.0  90.1
hier  with evaluation       89.0  94.3
hier  SIGIR-03              -     96.3
24
Selection of Optimal Internal Classifier
using Intermediate P and R

COST TN FP WFN WTN P Rj BEP
comp.
700 1241 127 80 18080 90.7 93.9 92.3
500 1242 125 75 18090 90.9 94.3 92.6
100 1244 108 65 18175 92.0 95.0 93.5
70 1243 105 70 18190 92.2 94.7 93.4
50 1245 107 60 18180 92.1 95.4 93.7
30 1242 104 75 18195 92.3 94.3 93.3
sci.
300 985 230 80 15060 81.1 92.5 86.8
200 985 195 80 15200 83.5 92.5 88.0
150 983 177 88 15272 84.7 91.8 88.3
100 981 158 96 15348 86.1 91.1 88.6
80 977 136 112 15436 87.8 89.7 88.7
50 967 96 152 15596 91.0 86.4 88.7
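The cost selection above can be sketched as follows, taking BEP as the mean of P and R (which matches the BEP column of the table); the rows below are the 'sci.' values. Note that costs 80 and 50 are essentially tied at BEP 88.7 after rounding.

```python
def select_cost(rows):
    """Pick the SVM cost whose break-even point (mean of P and R,
    as in the BEP column of the table) is highest."""
    return max(rows, key=lambda row: (row["P"] + row["R"]) / 2)["cost"]

# 'sci.' rows from the table above: (COST, P, Rj)
sci = [
    {"cost": 300, "P": 81.1, "R": 92.5},
    {"cost": 200, "P": 83.5, "R": 92.5},
    {"cost": 150, "P": 84.7, "R": 91.8},
    {"cost": 100, "P": 86.1, "R": 91.1},
    {"cost": 80,  "P": 87.8, "R": 89.7},
    {"cost": 50,  "P": 91.0, "R": 86.4},
]
print(select_cost(sci))   # -> 80 (88.75 vs 88.70 for cost 50, before rounding)
```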
25
Approximate P and R
For the three-level hierarchy:

Level-k  TP    FP   FN   Pk    Rk    BEP
0        5043  543  299  90.3  94.4  92.3
1        4884  594  579  89.2  89.4  89.3
2        4817  485  713  90.9  87.1  89.0

Results for other hierarchies (the 2-level tree shows better performance than the 3-level tree; the clustered hierarchy is comparable to ours):

Tree type  TP    FP   accFN  Ph    Rh    BEP
Two-level  4820  481  708    90.9  87.1  89.1
Clustered  4845  529  719    87.1  90.2  88.6
27
Contribution
Evaluation measure for a hierarchical classification system:
- Final performance in terms of P and R
- Fully compatible with the previous measures
- Makes it possible to compare performance between flat and hierarchical classifiers, and between different hierarchical classifiers
Algorithm to efficiently construct a hierarchical classification system with good performance:
- Intermediate evaluation in the middle of the construction process
- Maintains the original benefits of the hierarchical method: savings in training time and the good performance of SVMs
- Easily applicable in an on-line execution environment
28
Future Work
Further research is required on:
- The appropriate number of subclasses (the 2-level tree performs better than the 3-level tree)
- The criterion for selecting the optimal internal-node classifier: recall, BEP, or interpolation?
- Expanding the test document collections: Reuters news articles, WebKB, and real Web documents

The End
Thank you.
30
Approximate Precision and Recall
- Another helpful measure for constructing a hierarchical classification system
- Given a category tree of height K, compute the approximate P and R at each level k
- Helpful to recognize how close the approximate performance is to the final performance of the entire system
31
Selection Criteria of Optimal Internal Classifier

Leaf-level results under two SVM cost settings of the upper node:

SVM cost (upper node) = 80:
                 TP   FP  FN  TN    P     R     BEP
sci.crypt        237  8   10  858   96.7  96.0  96.3
sci.electronics  219  48  23  855   82.0  90.5  86.3
sci.med          232  9   11  782   96.3  95.5  95.9
sci.space        240  19  6   848   92.7  97.6  95.1
total            928  84  50  3390  91.7  94.9  93.3
Combined total        162           85.1  88.4

SVM cost (upper node) = 150:
                 TP   FP  FN  TN    P     R     BEP
sci.crypt        238  8   10  904   96.7  96.0  96.4
sci.electronics  224  53  21  862   80.9  91.4  86.2
sci.med          234  9   11  906   96.3  95.5  95.9
sci.space        240  23  6   891   91.3  97.6  94.4
total            936  93  48  3563  91.0  95.1  93.0
Combined total        136           87.3  89.2

The performance of cost 150 is superior to that of cost 80!

Intermediate P and R for the 'sci.' internal classifier:

COST TN FP WFN WTN P Rj BEP
300 985 230 80 15060 81.1 92.5 86.8
200 985 195 80 15200 83.5 92.5 88.0
150 983 177 88 15272 84.7 91.8 88.3
100 981 158 96 15348 86.1 91.1 88.6
80 977 136 112 15436 87.8 89.7 88.7
50 967 96 152 15596 91.0 86.4 88.7
32
Clustered 2-level Hierarchy
[Figure: the 20 newsgroups regrouped into 8 clusters under root, e.g. alt.atheism with talk.religion.misc; talk.politics (guns, mideast, misc); sci (electronics, space, med); comp (.graphics, .os.ms-windows.misc, .sys.ibm.pc.hardware, .sys.mac.hardware, .windows.x); rec (motor-cycles, sport.baseball, sport.hockey); soc.religion.christian; misc.forsale; sci.crypt; rec.autos]
33
Support Vector Machines
- Widely used in text categorization recently
- Show good performance in classification tasks with large amounts of data and high dimensionality
- SVM training involves solving a quadratic program (for the α_i and b); the optimal solution gives rise to a decision function which we use in the prediction phase
- Given l data points {(x_1, y_1), …, (x_l, y_l)}:

f(x) = sgn( Σ_{i=1}^{l} α_i y_i (x_i · x) + b )
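Once the QP has produced the α_i and b, the decision function can be evaluated directly; a minimal sketch with toy support vectors that are illustrative, not from the slides:

```python
def svm_predict(alphas, ys, xs, b, x):
    """Decision function f(x) = sgn(sum_i alpha_i * y_i * (x_i . x) + b),
    built from the support vectors found by the QP training step."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    s = sum(a * y * dot(xi, x) for a, y, xi in zip(alphas, ys, xs)) + b
    return 1 if s >= 0 else -1

# Two toy support vectors separating x[0] > 0 from x[0] < 0
alphas = [1.0, 1.0]
ys     = [+1, -1]
xs     = [(1.0, 0.0), (-1.0, 0.0)]
print(svm_predict(alphas, ys, xs, b=0.0, x=(0.7, 2.0)))   # -> 1
```

In the top-down hierarchical setting, the sign of this score at each internal node is exactly what decides whether a document descends further.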
34
Focus in Our Paper
Suggestion of a new measuring scheme:
- Well fit to hierarchical classification
- Supports efficient construction of a hierarchical classifier
- Compatible with the previous measures, enabling easy comparison between flat and hierarchical classifiers
An efficient hierarchical classification model:
- Virtual category structure + SVM
- Evaluation by intermediate precision and recall