


Multi-view Exploratory Learning for AKBC Problems

Bhavana Dalvi and William W. Cohen
School of Computer Science, Carnegie Mellon University

Motivation


The traditional EM method for SSL jointly learns the missing labels of unlabeled data points as well as the model parameters. We consider two extensions of traditional EM for SSL:

1. Modeling a new latent variable, unobserved classes, by dynamically introducing new classes when appropriate.

2. Assigning multiple labels from multiple levels of a class hierarchy while satisfying ontological constraints, and considering multiple data views.

Our proposed framework combines structural search for the best class hierarchy with SSL, reducing the semantic drift associated with erroneously grouping unanticipated classes with expected classes.


Multi-view Exploratory EM

Inputs: N data points, a few of them labeled as seeds for each of k classes and the rest unlabeled; multiple data views; an initial set of class constraints.
Outputs: model parameters {θ1, ..., θk+m} for the k seed and m newly added classes, where each class has a representation in each data view; the set of class constraints among the k+m classes; labels for the unlabeled data points.

Initialize the model with a few seeds per class.
Iterate till convergence (in data likelihood and number of classes):
  E step (iteration t): predict labels for the unlabeled data points.
    For i = 1 : N
      Compute per-class scores for point x_i via CombineMultiViewScore(x_i).
      If NewClassCreationCriterion(x_i) fires, create a new class, assign x_i to it, and call UpdateConstraints() to extend the constraint set.
      Else run OptimalLabelAssignment(x_i) to pick the best constraint-consistent label vector.
  M step: re-compute the model parameters using the seeds and the predicted labels of the unlabeled data points. The number of classes might increase in each iteration.
  Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1.

A minimal runnable sketch of this loop follows.
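To make the loop concrete, here is a minimal, non-hierarchical sketch in Python. It assumes a seeded-centroid model with cosine similarity and the MinMax new-class criterion described under "Modeling Unobserved Classes" below; the function names, the softmax posterior, and the threshold tau are illustrative choices, not the authors' implementation.

```python
import numpy as np

def l2norm_rows(M):
    return M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-12)

def exploratory_em(X, seeds, k, tau=2.0, max_iter=20):
    """X: (N, d) features; seeds: dict {row index: class id in 0..k-1}."""
    X = l2norm_rows(np.asarray(X, dtype=float))
    # One centroid per seed class, initialized from its labeled points.
    centroids = np.stack([X[[i for i, c in seeds.items() if c == j]].mean(axis=0)
                          for j in range(k)])
    labels = np.full(len(X), -1, dtype=int)
    for i, c in seeds.items():
        labels[i] = c
    for _ in range(max_iter):
        # E step: posterior over current classes from cosine similarity.
        post = np.exp(X @ l2norm_rows(centroids).T)
        post /= post.sum(axis=1, keepdims=True)
        for i in range(len(X)):
            if i in seeds:
                continue                      # seed labels stay fixed
            if post[i].max() / post[i].min() < tau:
                # Near-uniform posterior: the point fits no existing class,
                # so start a new class seeded by this point (MinMax criterion).
                centroids = np.vstack([centroids, X[i]])
                labels[i] = len(centroids) - 1
                post = np.exp(X @ l2norm_rows(centroids).T)
                post /= post.sum(axis=1, keepdims=True)
            else:
                labels[i] = int(post[i].argmax())
        # M step: recompute each centroid from its current members.
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        # (A model selection check, e.g. AICc, would gate the new classes here.)
    return labels, centroids
```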

Modeling Unobserved Classes

Dynamically introducing new classes
Hypothesis: dynamically inducing clusters of data points that do not belong to any of the seeded classes will reduce semantic drift. For each data point x_i, we compute the posterior distribution P(C_j | x_i) of x_i belonging to each of the existing classes [Dalvi et al., ECML'13], and create a new class/cluster when that posterior is close to uniform (a sketch of both criteria follows this panel):
Criterion 1 (MinMax): create a new class/cluster if the ratio max_j P(C_j | x_i) / min_j P(C_j | x_i) falls below a threshold.
Criterion 2 (JS): create a new class/cluster if the Jensen–Shannon divergence between the posterior and U, the uniform distribution over the k classes, falls below a threshold.
For hierarchical classification we also need to decide where to place the newly created class:
Divide and conquer (DAC): a method for extending a tree-structured ontology [Dalvi et al., AKBC 2013].
OptDAC: an extension of DAC that extends a generic ontology with subset and mutual exclusion constraints [Dalvi and Cohen, under review].
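The sketch below implements both criteria over a posterior vector; the threshold defaults tau and eps are illustrative, the papers tune their own values.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2((a + 1e-12) / (b + 1e-12)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def minmax_criterion(posterior, tau=2.0):
    # Near-uniform posterior: no existing class clearly wins.
    return posterior.max() / posterior.min() < tau

def js_criterion(posterior, eps=0.05):
    uniform = np.full_like(posterior, 1.0 / len(posterior))
    return js_divergence(posterior, uniform) < eps
```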

Model Selection

This step makes sure that we do not create too many new classes. We tried the BIC, AIC, and AICc criteria, and the corrected AIC (AICc) worked best for our tasks:

AICc(g) = AIC(g) + 2v(v+1) / (n - v - 1)

where g is the model being evaluated, AIC(g) = 2v - 2 ln L(g), L(g) is the likelihood of the data given g, v is the number of free parameters of the model, and n is the number of data points.
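A direct transcription of the formula, assuming log_likelihood is ln L(g):

```python
def aicc(log_likelihood, n_free_params, n_points):
    """Corrected AIC; lower is better."""
    aic = 2 * n_free_params - 2 * log_likelihood
    return aic + (2 * n_free_params * (n_free_params + 1)
                  / (n_points - n_free_params - 1))
```

In the exploratory EM loop, the model from iteration t is kept only if the criterion does not worsen relative to iteration t-1; otherwise the algorithm reverts, as described above.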

Multiple Data Views

Each data point, and each class centroid or classifier, has a representation in each of the views. E.g., in the noun phrase classification task, we consider co-occurrences of NPs in text sentences (View 1) and in HTML tables (View 2).

Combining scores from multiple views (sketched below):
Sum-Score: addition of the per-view scores.
Prod-Score: product of the per-view scores.
Max-Agree: maximize agreement between the per-view label assignments [Dalvi and Cohen, in submission].
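Illustrative combiners over a matrix of per-view, per-class scores. Sum-Score and Prod-Score follow the definitions above; for Max-Agree only a simplified per-point stand-in is shown, since the actual method solves a joint optimization over label assignments.

```python
import numpy as np

def sum_score(view_scores):          # view_scores: (n_views, n_classes)
    return view_scores.sum(axis=0)

def prod_score(view_scores):
    return view_scores.prod(axis=0)

def max_agree_label(view_scores):
    # Simplified per-point stand-in: if the per-view argmax labels agree,
    # return that label; otherwise fall back to the highest-scoring view.
    winners = view_scores.argmax(axis=1)
    if len(set(winners.tolist())) == 1:
        return int(winners[0])
    best_view = view_scores.max(axis=1).argmax()
    return int(winners[best_view])
```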

Incorporating Multiple Views and Ontological Constraints

Each data point is assigned a bit vector of labels; subset and mutual exclusion constraints decide the consistency of potential bit vectors.

GLOFIN: a mixed integer program is solved for each data point to get the optimal label vector [Dalvi et al., WSDM 2015]. (A brute-force sketch of this assignment follows.)

Optimized Divide and Conquer (OptDAC): here we combine (1) a divide-and-conquer, top-down strategy to detect and place new categories in the ontology with (2) the mixed integer programming technique from GLOFIN to select the optimal set of labels for a data point, consistent with the ontological constraints.
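A tiny brute-force stand-in for the per-data-point optimization, assuming signed per-class scores (positive means evidence for the label) plus subset and mutual-exclusion constraint lists. GLOFIN solves this as a mixed integer program; exhaustive search is only viable for small ontologies, but it shows exactly what is being optimized.

```python
from itertools import product

def best_label_vector(scores, subset, mutex):
    """scores: {class: signed score}; subset: [(child, parent)]; mutex: [(a, b)]."""
    classes = list(scores)
    best, best_val = None, float("-inf")
    for bits in product([0, 1], repeat=len(classes)):
        y = dict(zip(classes, bits))
        if any(y[c] > y[p] for c, p in subset):     # child label implies parent
            continue
        if any(y[a] + y[b] > 1 for a, b in mutex):  # mutually exclusive pair
            continue
        val = sum(scores[c] * y[c] for c in classes)
        if val > best_val:
            best, best_val = y, val
    return best

# Example: Vegetable implies Food; Food and Location are mutually exclusive.
print(best_label_vector({"Food": 0.9, "Location": -0.4, "Vegetable": 0.55},
                        subset=[("Vegetable", "Food")],
                        mutex=[("Food", "Location")]))
# -> {'Food': 1, 'Location': 0, 'Vegetable': 1}
```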

AKBC tasks

Macro-reading (Explore-EM): semi-supervised classification of noun phrases into categories, using distributional features. Exploratory learning can reduce the semantic drift of seed classes [Dalvi et al., ECML 2013].

Micro-reading: the task is to classify an entity mention using context-specific features; we cluster NIL entities for the KBP entity discovery and linking (EDL) task [Mazaitis et al., KBP 2014].

Multi-view Hierarchical SSL (MaxAgree)

The MaxAgree method exploits clues from different data views. We define multi-view clustering as an optimization problem and compare various methods for combining scores across views. MaxAgree is more robust than Prod-Score as the difference in performance between views is varied. Our proposed Hier-MaxAgree method can incorporate both the clues from multiple views and the ontological constraints [Dalvi and Cohen, in submission]. On entity classification for the NELL KB, Hier-MaxAgree gave state-of-the-art performance.

Different Document Representations

Naïve Bayes: assumes a multinomial distribution over feature occurrences and explicitly models the class prior.
Seeded K-Means: similarity based on cosine distance between centroids and data points.
Seeded von Mises-Fisher: SSL method for data distributed on the unit hypersphere.
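Minimal per-class scoring functions for the three representations; a hedged sketch with illustrative names, not the papers' code. Here x is a term-count feature vector.

```python
import numpy as np

def naive_bayes_score(x, log_prior, log_theta):
    # Multinomial NB: log P(c) + sum_w count(w) * log P(w | c)
    return log_prior + x @ log_theta

def cosine_centroid_score(x, centroid):
    # Seeded k-means: cosine similarity to the class centroid.
    return (x @ centroid) / (np.linalg.norm(x) * np.linalg.norm(centroid) + 1e-12)

def vmf_log_score(x, mu, kappa):
    # von Mises-Fisher log density (up to its normalizer): kappa * <mu, x/|x|>
    return kappa * (mu @ (x / (np.linalg.norm(x) + 1e-12)))
```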


Automatic gloss finding for KBs (GLOFIN)

We developed the GLOFIN method, which takes a gloss-free KB and a large collection of glosses and automatically matches the glosses to entities in the KB [Dalvi et al., WSDM 2015]. Glosses with only one candidate KB entity (unambiguous glosses) are used as training data to train a hierarchical classification model for the categories in the KB; ambiguous glosses are then disambiguated based on the KB category they are assigned to. Our method outperformed SVM and label propagation baselines, especially when the amount of training data is small.

In future work: apply GLOFIN to word sense disambiguation w.r.t. the WordNet synset hierarchy.

Hierarchical Exploratory Learning (OptDAC)

We proposed OptDAC, which can do hierarchical SSL in the presence of incomplete class ontologies. It employs a mixed integer programming formulation to find optimal label assignments for a data point, while traversing the class ontology top-down to detect whether a new class needs to be added and where to place it [Dalvi and Cohen, under review]. (A sketch of the traversal appears below.)
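A schematic, self-contained sketch of the top-down traversal, assuming tree-structured classes with mutually exclusive siblings; in that special case the mixed integer program at each level reduces to an argmax. The names and the new-class placement rule are illustrative, not the authors' implementation.

```python
import numpy as np

def near_uniform(scores, tau=2.0):
    """MinMax test on a softmax posterior over sibling scores."""
    p = np.exp(scores - scores.max()); p /= p.sum()
    return p.max() / p.min() < tau

def optdac_traverse(node, x, children, score_fn, labels=None):
    """Top-down label assignment over a tree of classes.
    children: dict parent -> list of child class names (mutated when a
    new class is added); score_fn(x, c): score of x for class c."""
    labels = [] if labels is None else labels
    kids = children.get(node, [])
    if not kids:
        return labels
    scores = np.array([score_fn(x, c) for c in kids])
    if near_uniform(scores):
        # No existing child fits: create and place a new class under `node`.
        new_class = f"new-under-{node}"
        children[node].append(new_class)
        labels.append(new_class)
        return labels
    # With mutually exclusive siblings, the optimal consistent choice at
    # this level is the single best-scoring child.
    best = kids[int(scores.argmax())]
    labels.append(best)
    return optdac_traverse(best, x, children, score_fn, labels)
```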

[Figure: precision, recall, and F1 of SVM, Label Propagation, and GLOFIN-Naïve-Bayes on the gloss-finding task.]

[Figure: macro-averaged F1 score vs. training percentage (5-30%) for Concatenation, Co-training, Sum-Score, Prod-Score, and Hier-MaxAgree.]

Correlation of the performance improvement over the best single view with the difference in performance between views:

Method       Coefficient   P-value
Prod-Score   -0.59         0.01
MaxAgree     -0.05         0.82

[Figure panels for four dataset configurations: Text-patterns + Ontology-1, Text-patterns + Ontology-2, HTML-tables + Ontology-1, HTML-tables + Ontology-2.]

[Figure: an example ontology extended by OptDAC. Root has children Food and Location; Location has children Country and State; Food has children Vegetable and Condiment. The entity Coke scores 0.9 for Food vs. 0.1 for Location, but near-uniformly (0.55 vs. 0.45) over Vegetable and Condiment, so a new class C8 is created under Food and Coke is assigned to it.]

[Figure: example use-case of Exploratory EM on the 20 Newsgroups dataset (#seed classes = 6).]

Conclusions

Exploratory learning helps reduce the semantic drift of seeded classes. It becomes more powerful in conjunction with multiple data views and a class hierarchy, when these are imposed as soft constraints on the label vectors. It can be applied to multiple AKBC tasks such as macro-reading, gloss finding, and ontology extension.

Datasets and code can be downloaded from: www.cs.cmu.edu/~bbd/exploratory_learning

Acknowledgements: This work is supported by a Google PhD Fellowship in Information Extraction and a Google Research Grant.