![Page 1: CLASSIFYING ENTITIES INTO AN INCOMPLETE ONTOLOGY Bhavana Dalvi, William W. Cohen, Jamie Callan School of Computer Science, Carnegie Mellon University](https://reader037.vdocuments.us/reader037/viewer/2022110206/56649cef5503460f949bdbb7/html5/thumbnails/1.jpg)
CLASSIFYING ENTITIES INTO AN INCOMPLETE ONTOLOGY
Bhavana Dalvi, William W. Cohen, Jamie Callan
School of Computer Science, Carnegie Mellon University
Motivation
Existing techniques:
- Semi-supervised hierarchical classification: Carlson WSDM’10
- Extending knowledge bases, i.e. finding new relations or attributes of existing concepts: Mohamed et al. EMNLP’11
- Unsupervised ontology discovery: Adams et al. NIPS’10, Blei et al. JACM’10, Reisinger et al. ACL’09
Evolving Web-scale datasets:
- Billions of entities and hundreds of thousands of concepts
- Difficult to create a complete ontology
- Hierarchical classification of entities into incomplete ontologies is needed
Contributions
Hierarchical Exploratory EM:
- Adds new instances to the existing classes
- Discovers new classes and adds them at appropriate places in the ontology
Class constraints:
- Inclusion: every entity that is a “Mammal” is also an “Animal”
- Mutual exclusion: if an entity is an “Electronic Device”, then it is not a “Mammal”
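The two constraint types can be made concrete as a consistency check over a set of labels. This is a minimal sketch: the dictionary encoding and the `violations` helper are hypothetical illustrations, not taken from the paper.

```python
# Hypothetical encoding of the slide's two constraint types.
INCLUSION = {"Mammal": "Animal"}                      # child -> required parent
MUTEX = {frozenset({"Electronic Device", "Mammal"})}  # labels that cannot co-occur


def violations(labels: set[str]) -> list[str]:
    """Describe every class constraint that a set of labels breaks."""
    problems = []
    for child, parent in INCLUSION.items():
        if child in labels and parent not in labels:
            problems.append(f"{child} requires {parent}")
    for pair in MUTEX:
        if pair <= labels:  # both labels present
            a, b = sorted(pair)
            problems.append(f"{a} and {b} are mutually exclusive")
    return problems


print(violations({"Mammal", "Animal"}))  # []
print(violations({"Mammal"}))            # ['Mammal requires Animal']
```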
Problem Definition
Input:
- Large set of data points
- Some known classes
- Class constraints between classes
- Small number of seeds per known class: n
Output:
- Labels for all data points
- New classes discovered from the data: k
- Updated class constraints
Review: Exploratory EM [Dalvi et al. ECML 2013]
- Initialize the model with a few seeds per class
- Iterate until convergence (data likelihood and number of classes):
  - E step: predict labels for unlabeled points. If P(Cj | Xi), j = 1 to k, is nearly uniform for a data point Xi, create a new class Ck+1 and assign Xi to it.
  - M step: recompute model parameters using the seeds plus the predicted labels for unlabeled points. The number of classes might increase in each iteration.
  - Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1.
- Classification/clustering model: KMeans, NBayes, VMF …
- Class creation criterion (nearly-uniform test): Max/Min ratio, JS divergence
- Model selection criterion: AIC, BIC, AICc …
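A minimal, illustrative sketch of the flat loop above, assuming a k-means-style model with hard assignments, P(Cj | Xi) ∝ exp(-||Xi - μj||²), and the Max/Min ratio as the nearly-uniform test. The threshold, the function name `exploratory_em`, and the class names `C3`, … are hypothetical, and the model-selection/revert step is omitted for brevity.

```python
import math

Point = tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean(points: list[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def exploratory_em(seeds: dict[str, list[Point]], unlabeled: list[Point],
                   max_min_ratio: float = 2.0, max_iter: int = 10):
    """Flat Exploratory EM sketch (hard EM, k-means-style model).

    With P(Cj | x) proportional to exp(-d_j), d_j = ||x - mu_j||^2, the ratio
    max_j P / min_j P equals exp(max_j d_j - min_j d_j), so the test is done on
    distances to avoid underflow. max_min_ratio=2.0 is a hypothetical threshold.
    Assumes at least two seed classes.
    """
    centroids = {c: mean(pts) for c, pts in seeds.items()}
    labels: list[str] = []
    for _ in range(max_iter):
        # E step: label each unlabeled point; nearly uniform points open a new class.
        new_labels = []
        for x in unlabeled:
            d = {c: sq_dist(x, mu) for c, mu in centroids.items()}
            log_ratio = max(d.values()) - min(d.values())
            if log_ratio < math.log(max_min_ratio):
                name = f"C{len(centroids) + 1}"  # k classes so far -> new class C(k+1)
                centroids[name] = x
                new_labels.append(name)
            else:
                new_labels.append(min(d, key=d.get))
        # M step: refit each centroid on its seeds plus the points predicted for it.
        for c in centroids:
            members = seeds.get(c, []) + [x for x, lab in zip(unlabeled, new_labels) if lab == c]
            if members:
                centroids[c] = mean(members)
        # Stop once the labels (and hence the number of classes) stop changing.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


labels, centroids = exploratory_em(
    seeds={"A": [(0.0, 0.0)], "B": [(10.0, 0.0)]},
    unlabeled=[(0.5, 0.0), (9.5, 0.0), (5.0, 8.0), (5.0, 9.0)],
)
print(labels)           # ['A', 'B', 'C3', 'C3']
print(centroids["C3"])  # (5.0, 8.5)
```

The point (5.0, 8.0) is equidistant from both seed centroids, so its posterior is exactly uniform and it opens C3; the next point then lands in C3 instead of opening another class.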
Hierarchical Exploratory EM
- Initialize the model with a few seeds per class
- Iterate until convergence (data likelihood and number of classes):
  - E step: predict labels for unlabeled points by assigning each unlabeled data point a bit vector of labels that is consistent with the class constraints. If the class distribution for a data point is nearly uniform, create a new class, assign the point to it, and update the class constraints accordingly.
  - M step: recompute model parameters using the seeds plus the predicted labels for unlabeled points. The number of classes might increase in each iteration. Since the E step already respects the class constraints, this step need not be modified.
  - Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1.
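To see why the M step needs no change, here is a sketch assuming a centroid (k-means-style) model: each class's parameters are simply the mean of every point whose bit for that class is 1. The name `m_step` and the toy data are illustrative only.

```python
def m_step(points, bits, classes):
    """Refit one centroid per class from a 0/1 label matrix.

    bits[i][j] == 1 means point i carries the label classes[j]. Because the E step
    only emits constraint-consistent bit vectors, a point labeled State is also
    labeled Location and Root, so it contributes to all three centroids.
    """
    dim = len(points[0])
    centroids = {}
    for j, c in enumerate(classes):
        members = [p for p, row in zip(points, bits) if row[j] == 1]
        if members:  # a class with no members gets no centroid in this sketch
            centroids[c] = tuple(sum(p[k] for p in members) / len(members) for k in range(dim))
    return centroids


classes = ["Root", "Location", "State", "Food"]
points = [(1.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
bits = [
    [1, 1, 1, 0],  # a State, hence also Location and Root
    [1, 1, 0, 0],  # a Location whose Level 3 class is left open
    [1, 0, 0, 1],  # a Food
]
print(m_step(points, bits, classes))
```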
Divide-And-Conquer Exploratory EM
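[Figure: example class hierarchy. Level 1: Root. Level 2: Food, Location. Level 3: Vegetable and Condiment (under Food), Country and State (under Location); example entities: Spinach, Potato, Pepper… Child-to-parent edges are inclusion constraints; classes on the same level are linked by mutual-exclusion constraints.]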
Assumptions: Classes are arranged in a tree-structured hierarchy. Classes at any level of the hierarchy are mutually exclusive.
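Under these two assumptions, both kinds of constraints can be read off the tree. A sketch using the example hierarchy; the child-to-parent encoding and the helpers `level_of` and `derive_constraints` are hypothetical.

```python
# Example hierarchy from the slide, encoded child -> parent (encoding is an assumption).
PARENT = {
    "Food": "Root", "Location": "Root",
    "Vegetable": "Food", "Condiment": "Food",
    "Country": "Location", "State": "Location",
}


def level_of(c, parent):
    """Root is Level 1, as on the slide."""
    level = 1
    while c in parent:
        c = parent[c]
        level += 1
    return level


def derive_constraints(parent):
    """Inclusion: each child implies its parent.
    Mutual exclusion: any two classes on the same level exclude each other."""
    inclusion = set(parent.items())
    by_level = {}
    for c in set(parent) | set(parent.values()):
        by_level.setdefault(level_of(c, parent), []).append(c)
    mutex = {(a, b) for peers in by_level.values() for a in peers for b in peers if a < b}
    return inclusion, mutex


inclusion, mutex = derive_constraints(PARENT)
print(sorted(mutex))
```

Level 2 yields one mutual-exclusion pair (Food, Location) and Level 3 yields six, for seven in total.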
[Figure: worked example, California. Classification proceeds top-down: California enters at Root with probability 1.0, goes to Location at Level 2 (0.9 vs 0.1 for Food) and to State at Level 3 (0.8 vs 0.2 for Country). Its label is a 7-bit vector, one bit per class, with 1s for Root, Location and State.]
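The top-down pass can be sketched as follows, assuming each internal node supplies a distribution over its children (hard-coded here to the California scores). The helper names are hypothetical, and the class order used for the bit vector is an arbitrary choice that need not match the slide's.

```python
# Hypothetical child lists for the example tree and an arbitrary class order for bits.
CHILDREN = {
    "Root": ["Food", "Location"],
    "Food": ["Vegetable", "Condiment"],
    "Location": ["Country", "State"],
}
CLASSES = ["Root", "Food", "Location", "Country", "State", "Vegetable", "Condiment"]


def classify_top_down(scores_at, children=CHILDREN):
    """Descend from Root, taking the most probable child at each level.

    scores_at(node) returns {child: P(child | x)} for the children of node."""
    path = ["Root"]
    node = "Root"
    while node in children:
        probs = scores_at(node)
        node = max(probs, key=probs.get)
        path.append(node)
    return path


def to_bits(path, classes=CLASSES):
    """One bit per class; one child per level keeps siblings mutually exclusive."""
    on = set(path)
    return [1 if c in on else 0 for c in classes]


# California: 0.9 vs 0.1 at Level 2, then 0.8 vs 0.2 at Level 3.
california = {
    "Root": {"Location": 0.9, "Food": 0.1},
    "Location": {"State": 0.8, "Country": 0.2},
}
path = classify_top_down(california.get)
print(path)           # ['Root', 'Location', 'State']
print(to_bits(path))  # [1, 0, 1, 0, 1, 0, 0]
```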
[Figure: worked example, Coke. Coke enters at Root with probability 1.0 and goes to Food at Level 2 (0.9 vs 0.1 for Location). At Level 3 its scores for Vegetable and Condiment (0.55 and 0.45) are nearly uniform, so a new class C8 is created under Food and Coke is assigned to it. Its label becomes an 8-bit vector (one bit per class, now including C8) with 1s for Root, Food and C8, and C8 is added to the class constraints.]
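Class creation fits into the same top-down pass: when the distribution over a node's children is nearly uniform, a new child is created there and the descent stops. This sketch assumes the Max/Min ratio test with a hypothetical threshold of 2 and names new classes by count (`classify_with_creation` is likewise hypothetical), which yields C8 for Coke and then C9 for Cat.

```python
import math

CHILDREN = {
    "Root": ["Food", "Location"],
    "Food": ["Vegetable", "Condiment"],
    "Location": ["Country", "State"],
}


def all_classes(children):
    return {"Root"} | {c for kids in children.values() for c in kids}


def classify_with_creation(scores_at, children, ratio_threshold=2.0):
    """Top-down pass that may create a class; mutates `children` when it does.

    If max/min of the child probabilities at a node is below the (hypothetical)
    threshold, the distribution is treated as nearly uniform: a new child is
    created under that node and the point is assigned to it.
    Assumes every internal node has at least two children."""
    path = ["Root"]
    node = "Root"
    while children.get(node):
        probs = scores_at(node)
        hi, lo = max(probs.values()), min(probs.values())
        ratio = math.inf if lo == 0 else hi / lo
        if ratio < ratio_threshold:
            new = f"C{len(all_classes(children)) + 1}"
            children[node].append(new)
            path.append(new)
            return path
        node = max(probs, key=probs.get)
        path.append(node)
    return path


children = {k: list(v) for k, v in CHILDREN.items()}  # copy: the ontology grows
coke = {"Root": {"Food": 0.9, "Location": 0.1}, "Food": {"Vegetable": 0.55, "Condiment": 0.45}}
cat = {"Root": {"Food": 0.45, "Location": 0.55}}
print(classify_with_creation(coke.get, children))  # ['Root', 'Food', 'C8']
print(classify_with_creation(cat.get, children))   # ['Root', 'C9']
```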
[Figure: worked example, Cat. The hierarchy now also contains C8 under Food. Cat enters at Root with probability 1.0, but its Level 2 scores for Food and Location (0.45 and 0.55) are nearly uniform, so a new class C9 is created directly under Root and Cat is assigned to it. Its label becomes a 9-bit vector with 1s for Root and C9, and C9 is added to the class constraints.]
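When a class is created, the constraint set grows with it: the new class gets an inclusion edge to its parent and mutual-exclusion edges to every class already on its level. A sketch with hypothetical helpers `level_of` and `add_new_class`, replaying C8 (under Food) and then C9 (under Root):

```python
# Example hierarchy, child -> parent (encoding is an assumption).
PARENT = {
    "Food": "Root", "Location": "Root",
    "Vegetable": "Food", "Condiment": "Food",
    "Country": "Location", "State": "Location",
}


def level_of(c, parent):
    level = 1  # Root is Level 1
    while c in parent:
        c = parent[c]
        level += 1
    return level


def add_new_class(new, under, parent, mutex):
    """Add `new` as a child of `under` and extend the constraints in place.

    Inclusion: new -> under. Mutual exclusion: new vs. every class already on
    the new class's level. Returns those peers, sorted."""
    level = level_of(under, parent) + 1
    existing = set(parent) | set(parent.values())
    peers = sorted(c for c in existing if level_of(c, parent) == level)
    parent[new] = under
    for c in peers:
        mutex.add(tuple(sorted((c, new))))
    return peers


parent, mutex = dict(PARENT), set()
print(add_new_class("C8", "Food", parent, mutex))  # ['Condiment', 'Country', 'State', 'Vegetable']
print(add_new_class("C9", "Root", parent, mutex))  # ['Food', 'Location']
```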
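What are we trying to optimize?
Objective function:

$$\max_{m,\ \mathrm{Params}\{C_1,\dots,C_m\}} \Big\{ \text{Log Data Likelihood} - \text{Model Penalty} \Big\} \quad \text{subject to class constraints } Z_m$$

where m is the number of clusters and Params{C1 … Cm} are the parameters of the m clusters.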
Datasets
Data: ClueWeb09 corpus + subsets of NELL (Ontology 1, Ontology 2)

| Dataset | #Classes | #Levels | #NELL entities | #Contexts |
|---|---|---|---|---|
| DS-1 | 11 | 3 | 2.5K | 3.4M |
| DS-2 | 39 | 4 | 12.9K | 6.7M |
Results
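Macro-averaged seed class F1:

| Dataset | #Train/Test points | Level | #Seed/#Ideal classes | FLAT SemisupEM | FLAT ExploratoryEM | DAC SemisupEM | DAC ExploratoryEM |
|---|---|---|---|---|---|---|---|
| DS-1 | 335/2.2K | 2 | 2/3 | 43.2 | 78.7 * | 69.5 | 77.2 * |
| DS-1 | 335/2.2K | 3 | 4/7 | 34.4 | 42.6 * | 31.3 | 44.4 * |
| DS-2 | 1.5K/11.4K | 2 | 3.9/4 | 64.3 | 53.4 | 65.4 | 68.9 * |
| DS-2 | 1.5K/11.4K | 3 | 9.4/24 | 31.3 | 33.7 * | 34.9 | 41.7 * |
| DS-2 | 1.5K/11.4K | 4 | 2.4/10 | 27.5 | 38.9 * | 43.2 | 42.4 |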
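Macro-averaged seed class F1 can be read as the unweighted mean of per-class F1 over the seed classes only, computed per level. This sketch assumes one gold and one predicted label per test point at the level being scored; treating points predicted into newly created classes as misses for their gold seed class is an assumption about the evaluation, and `macro_f1` is a hypothetical helper.

```python
def macro_f1(gold, pred, seed_classes):
    """Unweighted mean of per-class F1 (in %) over the seed classes only.

    gold/pred hold one label per test point at a single level. A point predicted
    into a newly created class (e.g. "C8") is a false negative for its gold class."""
    scores = []
    for c in seed_classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return 100.0 * sum(scores) / len(scores)


gold = ["Food", "Food", "Location", "Location"]
pred = ["Food", "C8", "Location", "Food"]
print(round(macro_f1(gold, pred, ["Food", "Location"]), 1))  # 58.3
```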
Conclusions
Hierarchical Exploratory EM works with an incomplete class hierarchy and a few seed instances to extend an existing knowledge base.
Encouraging preliminary results: in most settings, hierarchical classification outperforms flat classification, and exploratory learning outperforms semi-supervised learning.
Future work: incorporate arbitrary class constraints; evaluate the newly added clusters.
Thank You
Questions?
Extra Slides
Class Creation Criterion
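Given the class distribution P(Cj | Xi), j = 1 … k, for a data point Xi:
- MinMax ratio: $\max_j P(C_j \mid X_i) \,/\, \min_j P(C_j \mid X_i)$
- Jensen-Shannon divergence: $\text{JS-Div}\big(P(C \mid X_i),\ \text{Uniform}(1/k, \dots, 1/k)\big)$

A ratio close to 1, or a divergence close to 0, indicates a nearly uniform distribution.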
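Both nearly-uniform tests are easy to compute. A sketch, assuming natural logarithms and a comparison against the uniform distribution over the k current classes; the slide gives no thresholds, so none are fixed here, and the helper names are hypothetical.

```python
import math


def minmax_ratio(p):
    """max_j P(Cj | Xi) / min_j P(Cj | Xi); 1.0 means exactly uniform."""
    lo = min(p)
    return math.inf if lo == 0 else max(p) / lo


def kl(p, q):
    """KL(p || q) in nats; terms with p_j = 0 contribute nothing."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)


def js_div_from_uniform(p):
    """Jensen-Shannon divergence between p and the uniform distribution (0 means uniform)."""
    k = len(p)
    u = [1.0 / k] * k
    m = [(pj + uj) / 2 for pj, uj in zip(p, u)]
    return 0.5 * kl(p, m) + 0.5 * kl(u, m)


for p in ([0.5, 0.5], [0.55, 0.45], [0.9, 0.1]):
    print(p, round(minmax_ratio(p), 3), round(js_div_from_uniform(p), 4))
```

With natural logarithms the JS divergence is bounded by ln 2, which makes a fixed threshold comparable across data points.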
Model Selection
Corrected Akaike Information Criterion (AICc):

$$\text{AICc}(g) = -2\,L(g) + 2v + \frac{2v(v+1)}{n - v - 1}$$

where g is the model being evaluated, L(g) is the log-likelihood of the data given g, v is the number of free parameters of the model, and n is the number of data points.
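A sketch of AICc together with the revert rule from the Exploratory EM loop (lower AICc is better). The helper names and the example log-likelihoods and parameter counts are hypothetical.

```python
def aicc(log_likelihood: float, v: int, n: int) -> float:
    """AICc(g) = -2*L(g) + 2*v + 2*v*(v+1)/(n - v - 1)."""
    if n - v - 1 <= 0:
        raise ValueError("AICc is undefined unless n > v + 1")
    return -2.0 * log_likelihood + 2 * v + 2 * v * (v + 1) / (n - v - 1)


def keep_or_revert(prev: dict, curr: dict, n: int) -> dict:
    """Keep the iteration-t model only if it does not worsen AICc; otherwise revert to t-1."""
    prev_score = aicc(prev["loglik"], prev["v"], n)
    curr_score = aicc(curr["loglik"], curr["v"], n)
    return curr if curr_score <= prev_score else prev


old = {"name": "t-1", "loglik": -120.0, "v": 4}
new = {"name": "t", "loglik": -118.0, "v": 8}  # more classes: slightly better fit, more parameters
print(keep_or_revert(old, new, n=100)["name"])  # t-1
```

Here the small likelihood gain of the larger model does not pay for its extra parameters, so the loop reverts.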