text classification without supervision: incorporating world...
TRANSCRIPT
![Page 1: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/1.jpg)
Text Classification without Supervision: Incorporating World Knowledge and
Domain Adaptation
Yangqiu SongLane Department of CSEE
West Virginia University
1
Much of the work was done at UIUC
![Page 2: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/2.jpg)
Collaborators
Dan Roth Haixun Wang Shusen Wang Weizhu Chen
2
![Page 3: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/3.jpg)
Text Categorization
• Traditional machine learning approach:
Label data
Train a classifier
Make prediction
3
![Page 4: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/4.jpg)
Challenges• Domain expert annotation
– Large scale problems
• Diverse domains and tasks• Topics
• Languages
• …
• Short and noisy texts
– Tweets,
– Queries,
– …4
![Page 5: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/5.jpg)
Many diverse and fast changing domains
Domain specific task:entertainment or sports?
Reduce Labeling Efforts
Semi-supervised learning
A more general way?
Transfer learningZero-shot learning
Search engineSocial media…
5
![Page 6: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/6.jpg)
Our Solution• Knowledge enabled learning
– Millions of entities and concepts
– Billions of relationships
• Labels carry a lot of information!
– Traditional models treat labels as “numbers or IDs”
6
![Page 7: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/7.jpg)
Example:Knowledge Enabled Text Classification
Dong Nguyen announced that he would be removing his hit game Flappy Bird from both the iOS and Android app stores, saying that the success of the game is something he never wanted. Some fans of the game took it personally, replying that they would either kill Nguyen or kill themselves if he followed through with his decision.
Pick a label:
Class1 or Class2 ?Mobile Game or Sports
7
![Page 8: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/8.jpg)
Dataless Text Categorization: Classification on the Fly
Documents
(Good) Labelnames Map
labels/documentsto the same space
M.-W. Chang, L.-A. Ratinov, D. Roth, V. Srikumar: Importance of Semantic Representation: Dataless Classification. AAAI 2008.Y. Song, D. Roth: On dataless hierarchical text classification. (AAAI). 2014.
Mobile Game or Sports?
World knowledge
Compute document and label similarities
Choose labels
8
![Page 9: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/9.jpg)
Challenges of Using Knowledge
Representation
Inference Learning
Data vs. knowledge representation
Knowledge specification;Disambiguation
Compare different representations
Show some interesting examples
Scalability;Domain adaptation;
Open domain classes
9
![Page 10: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/10.jpg)
Outline of the TalkDataless Text Classification:
Classify Documents on the Fly
Documents
Labelnames Map
labels/documentsto the same space
World knowledge
Compute document and label similarities
Choose labels
10
![Page 11: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/11.jpg)
Difficulty of Text Representation
• Polysemy and Synonym
Language
Meaning
Ambiguity
Variability
apple
company fruit treecat
cat feline kitty moggy
Basic level concepts
Typicality scores
Rosch, E. et al. Basic objects in natural categories. Cognitive Psychology. 1976. Rosch, E. Principles of categorization. In Rosch, E., and Lloyd, B., eds., Cognition and Categorization. 1978. 11
![Page 12: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/12.jpg)
Typicality of Entities
bird
12
![Page 13: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/13.jpg)
Basic Level Concepts
pet mammalpug dog
What do we usually call it?
animal
13
![Page 14: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/14.jpg)
pug bulldog
We use the right level of concepts to describe things!
14
![Page 15: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/15.jpg)
Probase: A Probabilistic Knowledge Base
Web Document Cleaning
Information Extraction
Semantic Cleaning
Knowledge Integration
Hearst patterns“Animals such as dogs and cats.”
1.68 billions documents
Mutual exclusive concepts
FreebaseWikipedia
M. A. Hearst. Automatic acquisition of hyponyms from large text corpora. Int. Conf. on Comp. Ling. (COLING).1992.W. Wu, et al. Probase: A probabilistic taxonomy for text understanding. In ACM SIG on Management of Data (SIGMOD). 2012. (Data released http://probase.msra.cn) 15
![Page 16: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/16.jpg)
citycountrydisease
magazinebank
… local schoolJava toolbig bank
BI product…
Distribution of Concepts
Concept Distribution
16
![Page 17: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/17.jpg)
Typicality
• Animal • Dog0 0.05 0.1 0.15
dog
cat
horse
bird
rabbit
deer
0 0.05 0.1 0.15
german shepherd
poodle
rottweiler
chihuahua
golden retriever
boxer
(entity, )(entity
conceptconc| )
(ept
conc )ept
nP
n
17
![Page 18: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/18.jpg)
Basic Level Concepts
• Robin • Penguin
0 0.2 0.4 0.6
bird
species
character
songbird
common bird
small bird
0 0.1 0.2 0.3 0.4
animal
bird
species
flightless bird
seabird
diving bird
entit( ,concept)(concept | )
yentity
en( )tity
nP
n
18
![Page 19: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/19.jpg)
Obama’s real-estate policy
president, politician investment, property, asset, plan
president, politician, investment, property, asset, plan
Concepts of Multiple Entities
+ + +w1 w2 wn =
E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. J. of Art. Intell. Res. (JAIR). 2009.
Explicit Semantic Analysis (ESA)
19
![Page 20: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/20.jpg)
apple adobe
software company, brand, fruit brand, software company
software company, brand, fruitsoftware company, brand
Multiple Related Entities
Intersection instead of union!
20
![Page 21: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/21.jpg)
Probabilistic Conceptualization
P(concept | related entities)
TypicalityBasic Level Concept
P(adobe | fruit) = 0P(fruit | adobe, apple) = 0
Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2011.
1
( | ) ( )( | ) ( ) ( | )
( )
Mk k
k k i k
i
P E c P cP c E P c P e c
P E
( , )( | )
( )
i ki k
k
P e cP e c
P c{ | 1,..., }iE e i M
21
( )kP c
![Page 22: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/22.jpg)
Given “China, India, Russia, Brazil”
emerging market
emerging economy
economy
emerging country
country
emerging power
emerging nation
bric country
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
22
![Page 23: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/23.jpg)
Given “China, India, Japan, Singapore”
asian country
country
economy
asian nation
asian market
asian economy
asia pacific region
east asian country
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
23
![Page 24: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/24.jpg)
Outline of the TalkDataless Text Classification:
Classify Documents on the Fly
Documents
Labelnames Map
labels/documentsto the same space
World knowledge
Compute document and label similarities
Choose labels
24
![Page 25: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/25.jpg)
Generic Short Text Conceptualization
P(concept | short text)
1. Grounding to knowledge base
2. Clustering entities
3. Inside clusters: intersection
4. Between clusters: union
25
![Page 26: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/26.jpg)
Markov Random Field Model
Entity type:instance or attribute
Entity clique:intersection
Parameter estimation: concept distribution
26
![Page 27: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/27.jpg)
developed country
economy
country
fruit
fruit crop
fruit juice
news channel
news publication
news website0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
Given “U.S.”, “Japan”, “U.K.”;“apple”, “pear”; “BBC”, ”New York Time”
27
![Page 28: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/28.jpg)
Tweet Clustering
0 0.2 0.4 0.6 0.8 1
Topic Model
WordNet
Freebase
WikiCategory
Wiki (ESA)
Probase
Clustering Normalized Mutual Information (NMI)
companies, animals, countries 4 region-related countries
Better access the right level of concepts
Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2011. 28
![Page 29: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/29.jpg)
Web Search Relevance• Evaluation data:
– 300K Web queries
– 19M query-URL pairs
• Historical data:– 8M URLs
– 8B query-URL clicks
33
34
35
36
37
NDCG@1 NDCG@2 NDCG@3 NDCG@4 NDCG@5Content Ranker Probase
Song et al., Int. Conf. on Inf. and Knowl. Man. (CIKM). 2014. 29
![Page 30: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/30.jpg)
• World knowledge bases
– General purpose
– Information bias
• Domain dependent tasks
– E.g., classification/clustering of entertainment vs. sports
– Knowledge about science/technology is useless
Domain Adaptation
30
![Page 31: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/31.jpg)
Domain Adaptation for Corpus
Entity type:instance or attribute
Entity clique:intersection
Parameter estimation: concept distribution
Hyper-parameter estimation: domain adaptation
Complexity: 2( )O NM D
31
![Page 32: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/32.jpg)
Domain Adaptation Results
0.5
0.6
0.7
0.8
0.9
Tweets News titles
Clu
ster
ing
NM
I
Conceptualization Domain Adaptation
Song et al., Int. Joint Conf. on Artif. Intell. (IJCAI). 2015. 32
![Page 33: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/33.jpg)
Similarity and Relatedness
• Similarity– a specific type of relatedness– synonyms, hyponyms/hypernyms, and siblings are highly
similar• doctor vs. surgeon, bike vs. bicycle
• Relatedness– topically related or based on any other semantic relation
• heart vs. surgeon, tire vs. car
• In the following, we focus on Wikipedia!– The methodologies apply
• Entity relatedness• Domain adaptation
33
![Page 34: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/34.jpg)
Dataless Text Classification: Classify Documents on the Fly
Documents
Labelnames Map
labels/documentsto the same space
World knowledge
Compute document and label similarities
Choose labels
34
![Page 35: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/35.jpg)
Mobile Game or Sports?
Classification in the Same Semantic Space
arg min ( ( ), ( ))il l il Dist x l
35
+ + +w1 w2 wn =
E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. J. of Art. Intell. Res. (JAIR). 2009.
Explicit Semantic Analysis (ESA)
![Page 36: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/36.jpg)
0.60.52
0.68
Classification F1
F1
OHLDA Topics (#topic=20, #doc/topic=100)Word2Vec (window=5, dim=500)ESA with Wiki (#concept=500)
Classification of 20 Newsgroups Documents: Cosine Similarity
V.Ha-Thuc, and J.-M. Renders, Large-scale hierarchical text classification without labelled data. In WSDM 2011.Blei et al., Latent Dirichlet Allocation. J. of Mach. Learn. Res. (JMLR). 2003.Mikolov et al. Efficient Estimation of Word Representations in Vector Space. NIPS. 2013. 36
• 20 newsgroups
• L1: 6 classes
• L2: 20 classes
• OHLDA:
• Same hierarchy
• Word2vec
• Trained on wiki
• Skipgram
![Page 37: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/37.jpg)
Two Factors in Dataless Classification
• Length of document • Number of labels
1326
52104
209
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0 100 200
F1
# words/document
Random Guess
Balanced binary classification Multi-class classification
37
2
20
103 6110.3
0.4
0.5
0.6
0.7
0.8
0.9
0 500
F1
# labels
2-newsgroups
20-newsgroups
103 RCV1 Topics
611 Wiki Cates
![Page 38: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/38.jpg)
Similarity• Cosine
ChampaignPoliceMakeArrestArmed
RobberyCasesTwo
ArrestedUI
Campus…
1111111
11
1111
Text 1 Text 2
38
![Page 39: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/39.jpg)
Representation Densification
Vector x Vector y
1.0
0.7
Cosine
Average
Max matching
Hungarian matching
39
![Page 40: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/40.jpg)
rec.autos vs. sci.electronics(1/16 document: 13 words per text)
0.1
0.2
0.3
0.4
0.5
0.6
0.7
50 100 200 500 1000
Acc
ura
cy
# concepts in ESA (Wiki)
Concept (Cosine) Concept (Average)
Concept (Hungarian) Word2vec (200)
Song and Roth. North Amer. Chap. Assoc. Comp. Ling. (NAACL). 2015. 40
![Page 41: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/41.jpg)
Dataless Text Classification: Classify Documents on the Fly
Documents
Labelnames Map
labels/documentsto the same space
World knowledge
Compute document and label similarities
Choose labels
41
![Page 42: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/42.jpg)
0.60.52
0.68
Classification F1
F1
Topic (#topic=20, #doc/topic=100)Word2Vec (window=5, dim=500)ESA (#concept=500)
0.52
0.64
0.770.83
0.87
Classification F1
100 200 500 1,000 2,000
Blei et al., Latent Dirichlet Allocation. J. of Mach. Learn. Res. (JMLR). 2003.Mikolov et al. Efficient Estimation of Word Representations in Vector Space. Adv. Neur. Info. Proc. Sys. (NIPS). 2013.
Supervised Classification
Cosine Similarity
Classification of 20 Newsgroups Documents
42
![Page 43: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/43.jpg)
Bootstrapping with Unlabeled Data
• Initialize N documents for each label
– Pure similarity based classifications
• Train a classifier to label N more documents
– Continue to label more data until no unlabeled document exists
Application of world knowledge of label meaning
Domain adaptation
Mobile gamesSports
43
![Page 44: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/44.jpg)
0.52
0.64
0.770.83
0.87
Classification F1100 200 500 1,000 2,000
Bootstrapped: 0.84
Pure Similarity: 0.68
Song and Roth. Assoc. Adv. Artif. Intell. (AAAI). 2014
Classification of 20 Newsgroups Documents
44
![Page 45: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/45.jpg)
2
Hierarchical Classification: Considering Label Dependency
Root
A B … N
1 2 M 1 …… 1 2 …
• Top-down classification
• Bottom-up classification (flat classification)
45
![Page 46: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/46.jpg)
Top-down vs. Bottom-up
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
20-newsgroups RCV1
Mic
roF1
Top-down Botton-up
Song and Roth. Assoc. Adv. Artif. Intell. (AAAI). 2014
RCV1• 804,414 documents• 82 categories in 4 levels• 103 nodes in hierarchy• 3.24 labels/document
46
![Page 47: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/47.jpg)
Dataless Text Classification: Classify Documents on the Fly
Documents
Labelnames Map
labels/documentsto the same space
World knowledge
Compute document and label similarities
Choose labels
47
![Page 48: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/48.jpg)
Labeled data in training
Unlabeled data in training
Label names in training
I.I.D. between training and testing
Supervised learning
Yes No No Yes
Unsupervisedlearning
No Yes No Yes
Semi-supervisedlearning
Yes Yes No Yes
Transfer learning Yes Yes No No
Zero-shot learning Yes No Yes No
DatalessClassification (pure similarity)
No No Yes No
DatalessClassification (bootstrapping)
No Yes Yes Yes
48
![Page 49: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/49.jpg)
Conclusions• Dataless classification
– Reduce labeling work for thousands of documents
• Compared semantic representation using world knowledge– Probabilistic conceptualization (PC)– Explicit semantic analysis (ESA)– Word embedding (word2vec)– Topic model (LDA)– Combination of ESA and word2vec
• Unified PC and ESA– Markov random field model
• Domain adaptation– Hyper-parameter estimation– Boostrapping – refining the classifier
Thank You!
Advertisement:Using knowledge as
structured information instead of flat features!
Session 7B, DM835
49
![Page 50: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/50.jpg)
![Page 51: Text Classification without Supervision: Incorporating World …home.cse.ust.hk/~yqsong/papers/2015-ICDM-workshop-super... · 2016. 7. 14. · North Amer. Chap. Assoc. Comp. Ling](https://reader036.vdocuments.us/reader036/viewer/2022071608/614625068f9ff8125420133c/html5/thumbnails/51.jpg)
Correlation with Human Annotation of IS-A Relationships
0.057
0.233
0.350.422
0.619
Spea
rman
’s C
orr
elat
ion
Random Guess SemEval'12 Best NN-Vector Lexical Pattern Probase
Combining Heterogeneous Models for Measuring Relational Similarity. A. Zhila, W. Yih, C. Meek, G. Zweig & T. Mikolov. In NAACL-HLT-13.
Gigaword corpus
The Web
51