Combining Distant and Partial Supervision for Relation Extraction

Gabor Angeli, Julie Tibshirani, Jean Y. Wu, Christopher D. Manning

Stanford University

October 28, 2014

Angeli, Tibshirani, Wu, Manning (Stanford) Combining Distant and Partial Supervision . . . October 28, 2014 1 / 19

Motivation: Knowledge Base Completion

Unstructured Text

Structured Knowledge Base

Motivation: Question Answering

Relation Extraction

Input: sentences containing an (entity, slot value) pair.
Output: the relation between the entity and the slot value.

Consider two approaches:

Supervised: trivial as a supervised classifier.
Training data: {(sentence, relation)}.
But... this training data is expensive to produce.

Distantly supervised: artificially produce "supervised" data.
Training data: {(entity, relation, slot value)}.
But... this training data is much noisier.

Contribution: Combine Benefits of Both

Adding carefully selected supervision improves distantly supervised relation extraction.

What is "carefully selected"? We propose a new active learning criterion.

We evaluate a number of questions:

Is the proposed criterion better than other methods?

Where is the supervision helping?

How far can we get with a supervised classifier?

Distant Supervision

(Barack Obama, EmployedBy, United States)
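The distant supervision heuristic sketched here: given a KB triple, label every sentence that mentions both the entity and the slot value with the KB relation. A minimal illustration under our own assumptions (the `kb` and `sentences` inputs are hypothetical), not the paper's actual pipeline:

```python
def distantly_label(kb, sentences):
    """Label every sentence mentioning both entity and slot value with
    the KB relation. Noisy: the sentence may not express that relation."""
    data = []
    for sent in sentences:
        for (entity, value), relation in kb.items():
            if entity in sent and value in sent:
                data.append((sent, relation))
    return data

kb = {("Barack Obama", "United States"): "EmployedBy"}
sentences = [
    "Barack Obama is the 44th and current president of the United States",
    "Barack Obama visited Japan",
]
```

Only the first sentence contains both arguments, so only it gets the (possibly wrong) EmployedBy label.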

Multiple-Instance Multiple-Label (MIML) Learning

(Barack Obama, EmployedBy, United States)

Distant Supervision

[Graphical model: each sentence x is paired directly with a relation label y, e.g. x = "Barack Obama is the 44th and current president of the United States" with y = EmployedBy.]

Multiple-Instance

[Graphical model: one entity-pair label y over mentions x1, x2, x3, each mention with a latent per-mention relation z1, z2, z3.]

Multiple-Instance Multiple-Label (MIML-RE)

[Graphical model: multiple labels y1, y2, ..., yn over mentions x1, x2, x3, each mention with a latent per-mention relation z1, z2, z3.]
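One way to picture the MIML-RE data layout: each entity pair groups many mention instances x, each with a latent per-mention relation z, under a set of group-level labels y from the KB. A sketch under our own naming (the class and field names are ours, not the paper's):

```python
from dataclasses import dataclass, field

@dataclass
class MIMLGroup:
    """All mentions of one (entity, slot value) pair."""
    entity: str
    slot_value: str
    mentions: list                            # x_1..x_m: one entry per sentence
    z: list = field(default_factory=list)     # latent per-mention relations
    labels: set = field(default_factory=set)  # y: relations from the KB

group = MIMLGroup(
    entity="Barack Obama",
    slot_value="United States",
    mentions=["Obama ... president of the United States",
              "Obama returned to the United States"],
    labels={"EmployedBy", "LivedIn"},
)
```

The multiple-instance part is the `mentions` list; the multiple-label part is the `labels` set.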

Active Learning

Old problem: supervision is expensive, but very useful.

Old solution: active learning!

Select a subset of latent z to annotate.

Fix these labels during training.
Bonus: this creates a supervised training set.
We initialize from a supervised classifier trained on this set.

Some statistics:

1,208,524 latent z which we could annotate.

$0.13 per annotation.

~$160,000 to annotate everything.

New spin: we have to get it right the first time.
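The quoted total follows directly from the per-annotation cost:

```python
n_latent = 1_208_524   # latent z that could be annotated
cost_per = 0.13        # dollars per annotation
total = n_latent * cost_per
print(f"${total:,.0f}")   # ~$157,108, i.e. roughly $160,000
```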

Example Selection Criteria

1. Train k MIML-RE models on k subsets of the data.

[Figure: k copies of the MIML-RE graphical model, one per data subset.]

2. For each latent z, each trained model c predicts a multinomial p_c(z).

3. Calculate the Jensen-Shannon divergence: (1/k) Σ_{c=1}^{k} KL(p_c(z) || p_mean(z)).

4. We now have a measure of disagreement for each z.

Three selection criteria:

Sample uniformly (uniform).

Take the z with highest disagreement (highJS).

Sample z weighted by disagreement (sampleJS).
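The committee disagreement and the three criteria can be sketched as below. This is a minimal plain-Python illustration; the function names, and the weighted sampling-without-replacement loop for sampleJS, are our assumptions rather than the paper's code:

```python
import math
import random

def js_disagreement(dists):
    """(1/k) * sum_c KL(p_c || p_mean) over the committee's predicted
    multinomials for one latent z."""
    k = len(dists)
    n = len(dists[0])
    p_mean = [sum(d[i] for d in dists) / k for i in range(n)]
    kl_sum = 0.0
    for p in dists:
        kl_sum += sum(p[i] * math.log(p[i] / p_mean[i])
                      for i in range(n) if p[i] > 0)
    return kl_sum / k

def select(examples, score, budget, criterion, rng=random):
    """examples: ids; score: JS disagreement per id (assumed positive)."""
    if criterion == "uniform":
        return rng.sample(examples, budget)
    if criterion == "highJS":
        return sorted(examples, key=lambda e: -score[e])[:budget]
    if criterion == "sampleJS":
        pool, chosen = list(examples), []
        while len(chosen) < budget:
            pick = rng.choices(pool, weights=[score[e] for e in pool])[0]
            pool.remove(pick)
            chosen.append(pick)
        return chosen
    raise ValueError(criterion)
```

Two identical committee distributions give zero disagreement; two fully opposed ones give the maximum, log 2.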

Example Selection Criteria

[Histogram of JS disagreement (x axis, 0 to 1) vs. number of examples (y axis, up to ~10^4): the low-disagreement mass is mostly easy examples; the high-disagreement tail is potentially non-representative examples.]

Example Selection Criteria

Committee Member Judgments

[Figure: three committee MIML-RE models, each producing its own per-mention predictions.]

Sentence                      | Member A    | Member B | Member C
Obama was born in Hawaii      | born        | born     | no relation
Obama grew up in Hawaii       | born        | lived in | born
Obama Bear visits Hawaii      | no relation | born     | employee of
President Obama ...           | title       | title    | title
Obama employed president ...  | employee of | title    | employee of

Uniform: often annotates easy sentences.
High JS (disagreement): more likely to annotate "rare" sentences.
Sample JS (disagreement): a mix of hard and representative sentences.

Experiments

Recall our questions:

Is the proposed criterion better than other methods?

Where is the supervision helping?

How far can we get with a supervised classifier?

Two experimental setups:

Slot filling evaluation of Surdeanu et al. (2012).

Stanford's 2013 TAC-KBP slot filling system.

Bonus: a 4.4 F1 improvement in the 2014 TAC-KBP competition.

Old News: MIML-RE Works Well

Slot filling evaluation of Surdeanu et al. (2012).

[Precision-recall curves (recall 0 to 0.3, precision 0.2 to 0.7) comparing Distant Supervision, Surdeanu et al. (2012), and MIML-RE (our baseline).]

Active learning is important; SampleJS performs well.

Slot filling evaluation of Surdeanu et al. (2012).

0.2

0.3

0.4

0.5

0.6

0.7

0 0.05 0.1 0.15 0.2 0.25 0.3

Pre

cisi

on

Recall

Sample JSHigh JS

UniformMIML-RE (our baseline)

Angeli, Tibshirani, Wu, Manning (Stanford) Combining Distant and Partial Supervision . . . October 28, 2014 15 / 19

Page 53: Combining Distant and Partial Supervision for Relation ...angeli/talks/2014-emnlp-activelearning.pdf · c=1 KL(pc (z)jjpmean z)). 4 We have measure of disagreement for each z. Three

Active learning is important; SampleJS performs well.

Slot filling evaluation of Surdeanu et al. (2012).

0.2

0.3

0.4

0.5

0.6

0.7

0 0.05 0.1 0.15 0.2 0.25 0.3

Pre

cisi

on

Recall

Sample JSMIML-RE (our baseline)

Angeli, Tibshirani, Wu, Manning (Stanford) Combining Distant and Partial Supervision . . . October 28, 2014 15 / 19

Page 54: Combining Distant and Partial Supervision for Relation ...angeli/talks/2014-emnlp-activelearning.pdf · c=1 KL(pc (z)jjpmean z)). 4 We have measure of disagreement for each z. Three

Active learning is important; SampleJS performs well.

Slot filling evaluation of Surdeanu et al. (2012).

0.2

0.3

0.4

0.5

0.6

0.7

0 0.05 0.1 0.15 0.2 0.25 0.3

Pre

cisi

on

Recall

Sample JSMIML-RE (our baseline)

Distant Supervision

Angeli, Tibshirani, Wu, Manning (Stanford) Combining Distant and Partial Supervision . . . October 28, 2014 15 / 19
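The Sample JS criterion above can be sketched in a few lines. This is our reconstruction from the formula fragment on the slides ((1/C) Σ_c KL(p_c(z) ‖ p_mean(z))), not the paper's code; the function names and the proportional-sampling loop are illustrative.

```python
import math
import random

def js_disagreement(committee_probs):
    """Generalized Jensen-Shannon disagreement of a committee of C
    classifiers over a latent label z: (1/C) * sum_c KL(p_c || p_mean)."""
    C = len(committee_probs)
    K = len(committee_probs[0])
    p_mean = [sum(p[k] for p in committee_probs) / C for k in range(K)]
    def kl(p, q):
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return sum(kl(p, p_mean) for p in committee_probs) / C

def sample_js(examples, scores, n, rng=random):
    """Pick n examples to annotate, without replacement, with
    probability proportional to their JS disagreement score
    (as opposed to always taking the highest-JS examples)."""
    pool = list(zip(examples, scores))
    chosen = []
    while pool and len(chosen) < n:
        total = sum(s for _, s in pool)
        r, acc = rng.random() * total, 0.0
        for idx, (ex, s) in enumerate(pool):
            acc += s
            if acc >= r:
                chosen.append(ex)
                pool.pop(idx)
                break
    return chosen
```

Agreeing committee members give a score of 0; disagreement grows the score, and sampling (rather than taking the top-k) trades some disagreement for diversity, matching the Sample JS vs. High JS contrast in the plot.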

Page 55–61: SampleJS performs best on TAC-KBP challenge.

TAC-KBP 2013 Slot Filling Challenge:

End-to-end task – includes IR + consistency.

Precision: facts LDC evaluators judged as correct.
Recall: facts other teams (including LDC annotators) also found.

System                     P     R     F1
No active learning         38.0  30.5  33.8
Sample uniformly           34.4  35.0  34.7
Highest JS disagreement    46.2  30.8  37.0
Sample JS disagreement     39.4  36.2  37.7

2014 TAC-KBP Slot Filling Challenge: 27.6 → 32.0 F1.
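As a quick sanity check on the table, assuming F1 here is the standard harmonic mean 2PR/(P+R):

```python
# Recompute F1 from each row's P and R and compare to the reported value.
rows = {
    "No active learning":      (38.0, 30.5, 33.8),
    "Sample uniformly":        (34.4, 35.0, 34.7),
    "Highest JS disagreement": (46.2, 30.8, 37.0),
    "Sample JS disagreement":  (39.4, 36.2, 37.7),
}
for name, (p, r, f1) in rows.items():
    computed = 2 * p * r / (p + r)
    assert abs(computed - f1) < 0.05, (name, computed)
```

All four rows are consistent with the harmonic-mean definition to one decimal place.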

Page 62–66: Good initialization is more important than constraining EM.

Is initialization, or fixing the latent z’s during EM, doing the helping?

What if we initialize with distant supervision?

[Figure: precision–recall curves (recall 0–0.3, precision 0.2–0.7), built up across five slides: MIML-RE (our baseline), Uniform, High JS, and Sample JS.]

Hypothesis: Supervision not only smooths the objective but provides better initialization for the non-convex objective.
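The distinction the slide probes, supervision as initialization versus supervision as a hard constraint during EM, can be sketched as follows. This is a hypothetical illustration of the two roles annotated sentences can play, not the MIML-RE implementation; all names here are made up.

```python
def init_z(n_sentences, supervised, ds_guess):
    """Start each latent label z from the human annotation where we
    have one, else from the distant-supervision guess."""
    return [supervised.get(i, ds_guess) for i in range(n_sentences)]

def e_step(z, supervised, posterior, clamp=False):
    """One E-step: set each z to its most likely relation.
    clamp=True keeps annotated sentences fixed forever (constrained EM);
    clamp=False means supervision only chose the starting point and the
    model is free to relabel as it learns."""
    out = []
    for i, _ in enumerate(z):
        if clamp and i in supervised:
            out.append(supervised[i])
        else:
            probs = posterior(i)
            out.append(max(probs, key=probs.get))
    return out
```

The slide's finding is that most of the gain survives with clamp=False: a good starting point for the non-convex objective matters more than holding the labels fixed.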

Page 67–70: A supervised classifier performs surprisingly well.

TAC-KBP 2013 Slot Filling Challenge:

End-to-end task – includes IR + consistency.

Precision: facts LDC evaluators judged as correct.
Recall: facts other teams (including LDC annotators) also found.

System                    P     R     F1
MIML-RE (baseline)        38.0  30.5  33.8
Supervised from SampleJS  33.5  35.0  34.2
MIML-RE Supervised init.  35.1  35.6  35.5
SampleJS                  39.4  36.2  37.7

A bit circular: we need MIML-RE to get the supervised examples.

Page 71–74: A Case for Supervised Classifiers

Stanford’s KBP system (artist rendition) vs. a supervised classifier (150 lines + featurizer).

Annotating examples: $1330
Flight to Qatar: $1027
Apple 27” Screen: $999
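A hypothetical miniature of the "150 lines + featurizer" recipe. The featurizer and perceptron below are stand-ins we invented for illustration, far simpler than the real system's features and classifier, but they show the shape: featurize an (entity pair, sentence) mention, train a linear multiclass model on annotated examples.

```python
from collections import defaultdict

def featurize(sentence, subj, obj):
    """Toy featurizer: words between the two entities, plus their order.
    (A real featurizer would add dependency paths, NER types, etc.)"""
    toks = sentence.split()
    i, j = toks.index(subj), toks.index(obj)
    lo, hi = sorted((i, j))
    feats = {"between:" + w for w in toks[lo + 1:hi]}
    feats.add("order:subj_first" if i < j else "order:obj_first")
    return feats

class Perceptron:
    """Tiny multiclass perceptron standing in for the linear classifier."""
    def __init__(self, relations):
        self.w = {r: defaultdict(float) for r in relations}

    def predict(self, feats):
        return max(self.w, key=lambda r: sum(self.w[r][f] for f in feats))

    def update(self, feats, gold):
        guess = self.predict(feats)
        if guess != gold:
            for f in feats:
                self.w[gold][f] += 1.0
                self.w[guess][f] -= 1.0
```

A few epochs over annotated mentions is enough to separate a toy relation from no_relation, which is the point of the slide: once you have labeled examples, the model itself can be very simple.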

Page 75–77: Conclusions

Things you can use:

New active learning criterion: sampling disagreement between a committee of classifiers.

Corpus of supervised examples for TAC-KBP relations.

4.4 F1 improvement on 2014 KBP Slot Filling.

Things we’ve learned:

Example selection is very important for performance.

MIML-RE is sensitive to initialization.

Supervised classifiers can perform similarly to distantly supervised methods.

Thank You!

Page 78: Comparison to Pershina et al. (ACL 2014)

Slot filling evaluation of Surdeanu et al. (2012).

[Figure: precision–recall curves (recall 0–0.3, precision 0.2–0.7) comparing Sample JS, Pershina et al. (2014), and MIML-RE.]