Post on 21-Dec-2015

Building Hierarchical Classifiers Using Class Proximity

Ke Wang

Senqiang Zhou

Shiang Chen Liew

National University of Singapore

VLDB99, Sept 6-10, Edinburgh

Hierarchical classification

• Given
– a class hierarchy
– a collection of pre-classified documents
– a document is a set of terms

• Build
– a classifier that assigns a relevant class to a new document

• Key
– extract features of classes

Yahoo classes

Yahoo
– recreation
  – automotive
  – sports
    – skating
    – cycling
– science

ACM classes

Level 1: Hardware
Level 2: General, Memory_structure
Level 3: General, Design_style
Level 4: Cache_memories

Existing local approaches

• build one classifier at each split of the class hierarchy

• determine features locally at each node

• classify a document by going through a path of classifiers starting from the root
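
The local, top-down scheme described above can be sketched roughly as follows. This is only an illustration: the hierarchy fragment, the per-node feature sets, and the term-overlap scoring are all assumptions made for the sketch, not the classifiers used in the work the slides refer to.

```python
# Hypothetical fragment of a class hierarchy: parent -> children.
HIERARCHY = {
    "Yahoo": ["recreation", "science"],
    "recreation": ["automotive", "sports"],
    "sports": ["skating", "cycling"],
}

# Hypothetical features, chosen locally for each child of a split.
LOCAL_FEATURES = {
    "recreation": {"hobby", "outdoor"},
    "science": {"physics", "research"},
    "automotive": {"car", "engine"},
    "sports": {"team", "race"},
    "skating": {"ice", "skate"},
    "cycling": {"bike", "mountain"},
}


def classify_top_down(doc_terms):
    """Walk from the root, picking at each split the child whose
    local features overlap most with the document's terms."""
    node = "Yahoo"
    path = [node]
    while node in HIERARCHY:
        # Ties are broken by child order (an arbitrary choice for this sketch).
        node = max(HIERARCHY[node],
                   key=lambda c: len(LOCAL_FEATURES[c] & doc_terms))
        path.append(node)
    return path


good = classify_top_down({"outdoor", "team", "bike"})
misrouted = classify_top_down({"car", "engine", "research"})
```

In this toy setup the second document is sent to science at the root, because "car" and "engine" are not features at the top split, so the automotive classifier is never consulted. A wrong decision at a high level cannot be repaired further down the path.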

Diminishing of high level structure

• local approaches rely on correct classification at high levels

• but high-level structure is usually weak, i.e., topics diverge

• e.g., “car” is a feature at Recreation:Automotive, but not at Recreation

Bias of misclassification

• sibling classes vs. nephew classes

• misclassification at high levels vs. at low levels

• specialisation vs. generalisation

Features should be

• determined wrt the target class

• determined at all concept levels

• correlated

The solution: generalised association rules (SA95, HF95)

{sql, IO} → DB

{language, performance} → CS

Our approach

• class proximity

• global classifier

• term hierarchy

• use the “best” generalised association rule T → C to determine the class
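
One way to picture a global classifier that applies the best matching rule T → C is the sketch below. The term hierarchy, the rule scores, and the helper names (`generalise`, `classify`) are hypothetical, and the ranking is a plain numeric score rather than the biased measures named in these slides.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Optional, Set


class Rule(NamedTuple):
    terms: FrozenSet[str]  # antecedent T (a set of terms)
    cls: str               # consequent C (a class at any level)
    score: float           # ranking score; values below are made up


# Hypothetical term hierarchy: child term -> parent term.
TERM_PARENT = {"select_query": "sql", "disk": "IO"}


def generalise(terms: Iterable[str]) -> Set[str]:
    """Return the terms together with all their ancestors in TERM_PARENT."""
    out = set(terms)
    for t in list(out):
        while t in TERM_PARENT:
            t = TERM_PARENT[t]
            out.add(t)
    return out


def classify(doc_terms: Iterable[str], rules: List[Rule]) -> Optional[str]:
    """Pick the class of the highest-scoring rule whose antecedent is
    contained in the generalised document; None if no rule matches."""
    covered = generalise(doc_terms)
    matching = [r for r in rules if r.terms <= covered]
    if not matching:
        return None
    return max(matching, key=lambda r: r.score).cls


RULES = [
    Rule(frozenset({"sql", "IO"}), "DB", 0.9),
    Rule(frozenset({"language", "performance"}), "CS", 0.6),
]

print(classify({"select_query", "disk"}, RULES))
```

Because all rules compete in a single ranked list, a document is assigned to a class in one step instead of passing through a chain of per-node decisions.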

Rank association rules

• Biased confidence

• Biased J-measure
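
For orientation, the standard (unbiased) confidence and J-measure of a rule T → C can be computed as in the sketch below. How the biased versions fold in class proximity is not spelled out in these slides, so the code deliberately covers only the textbook definitions; the function and parameter names are assumptions.

```python
import math


def confidence(n_t_and_c: int, n_t: int) -> float:
    """Standard confidence of T -> C: the fraction of documents
    containing T that belong to class C."""
    return n_t_and_c / n_t


def j_measure(p_t: float, p_c: float, p_c_given_t: float) -> float:
    """Standard J-measure of T -> C (in bits).

    p_t         : P(T), probability that a document contains T
    p_c         : P(C), prior probability of class C (must be in (0, 1))
    p_c_given_t : P(C | T), i.e. the rule's confidence
    """
    def part(p: float, q: float) -> float:
        # Convention: 0 * log(0 / q) = 0.
        return 0.0 if p == 0 else p * math.log2(p / q)

    return p_t * (part(p_c_given_t, p_c) + part(1 - p_c_given_t, 1 - p_c))
```

A rule whose confidence equals the class prior carries no information and scores 0. A rule that holds with certainty for a class with prior 0.5, and whose antecedent appears in half the documents, scores 0.5 bits.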

An example

[Figure: term hierarchy (author, story; writer, editor, fiction, poem) and class hierarchy (Arts; A_Music, A_Literature; Music, Literature; ...)]

Term hierarchy (T) = Yes, Class proximity (B) = Yes

• R0: author, story → Literature (ConfB=1, Clist=d6,d7)

• R1: author → Literature (ConfB=1)

• R2: story → Literature (ConfB=0.67, Wlist=d5(1))

• R4: hall → Music (ConfB=0.4, Clist=d1,d2, Wlist=d3(1))

• R3: States → A_Literature (ConfB=0.33, Clist=d4,d5)
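
The order of the rules above is consistent with ranking by ConfB, highest first. A small sketch under that assumption: the tuple layout, the tie-breaking by input order, and the first-match lookup are illustrative choices, not details given in the slides.

```python
# (rule id, antecedent terms, class, ConfB) as listed on the slide.
RULES = [
    ("R0", ("author", "story"), "Literature", 1.0),
    ("R1", ("author",), "Literature", 1.0),
    ("R2", ("story",), "Literature", 0.67),
    ("R3", ("States",), "A_Literature", 0.33),
    ("R4", ("hall",), "Music", 0.4),
]


def rank(rules):
    """Sort by ConfB, highest first. sorted() is stable, so rules with
    equal ConfB keep their input order (an assumed tie-break)."""
    return sorted(rules, key=lambda r: r[3], reverse=True)


def first_match(doc_terms, rules):
    """Return the class of the top-ranked rule whose terms all occur
    in the document, or None if no rule applies."""
    for _rule_id, terms, cls, _conf in rank(rules):
        if set(terms) <= set(doc_terms):
            return cls
    return None


order = [r[0] for r in rank(RULES)]
print(order)
print(first_match({"story", "hall"}, RULES))
```

A document containing both "story" and "hall" is assigned to Literature here, because R2 outranks R4.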

Experiment I

• http://www.acm.org/dl/toc.html/

• 26,515 papers, 78 classes, 14,754 terms

• class hierarchy = level-1 and level-2 categories

• term hierarchy = level-3 and level-4 categories

• document = title and level-4 categories

Best rules found by (B,T)

• CSO:
– vector, stream, processor, parallel → Processor_Architectures
– multiple_instruction_stream → Processor_Architectures
– data_flow, architectur → Processor_Architectures
– internet, architectur → Computer_Communication_Networks
– mode, atm → Computer_Communication_Networks
– network, circuit_switching → Computer_Communication_Networks
– tecniqu, model, attribut → Performance_of_Systems

• Software:
– program, function, application → Programming_Techniques
– object_oriented_programming → Programming_Techniques
– reusable_software → Software_Engineering
– software, methodologie → Software_Engineering
– organization, distributed_system → Operating_Systems

[Figure: chart comparing the settings (), (T), (B), (B,T), (CDAR97,T) and (CDAR97)]

Experiment II

• http://dir.yahoo.com/recreation/sports

• 7,550 documents

• 367 classes, 7 levels

• 10,747 terms

• 90% of the terms occur in no more than 10 documents and many documents contain only such terms

Best rules found by (B,T)

• Sports:Cycling:
– page, mountain → Mountain_Biking
– product, bike → Mountain_Biking
– mtb, mountain → Mountain_Biking
– held, bicycl → Races
– classic, bicycl → Races
– trip, tour → Travelogues
– trip, canada → Travelogues
– bicycl, alaska → Travelogues

• Sports:Auto_Racing:
– team, result, driver → Formula_one
– model, featur → Tracks_and_Speedways
– oval → Tracks_and_Speedways
– raceway → Tracks_and_Speedways
