
Improved Relation Extraction with Feature-Rich Compositional Embedding Models

EMNLP · September 21, 2015

Mo Yu*, Matt Gormley*, Mark Dredze

*Co-first authors

FCM or: How I Learned to Stop Worrying (about Deep Learning) and Love Features


Handcrafted Features

[Figure: the sentence "Egypt-born Proyas directed …", annotated with POS tags (NNP : VBN NNP VBD), entity types (LOC for Egypt, PER for Proyas), a constituency parse (S → NP VP, with nested ADJP, NP, VP), and lemmas (egypt, born, proyas, direct). Hand-crafted features f(x) are read off these annotations.]

p(y|x) ∝ exp(Θ_y · f(x)), where y is the relation label — here, born-in.

Where do features come from?

Feature Engineering ←——→ Feature Learning

hand-crafted features (Zhou et al., 2005; Sun et al., 2011) — for example:

First word before M1

Second word before M1

Bag-of-words in M1

Head word of M1

Other word in between

First word after M2

Second word after M2

Bag-of-words in M2

Head word of M2

Bigrams in between

Words on dependency path

Country name list

Personal relative triggers

Personal title list

WordNet Tags

Heads of chunks in between

Path of phrase labels

Combination of entity types

Where do features come from?

Feature Engineering ←——→ Feature Learning
Added to the spectrum: word embeddings (Mikolov et al., 2013)

CBOW model in Mikolov et al. (2013)

[Figure: the input context words are mapped through a look-up table to embeddings, and a classifier predicts the missing word.]

dog: 0.13 .26 … -.52
cat: 0.11 .23 … -.45

Similar words get similar embeddings; the embeddings are learned without supervision.
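For context, a minimal sketch of training CBOW embeddings of this kind with gensim (assuming gensim ≥ 4; the toy corpus and dimensions are illustrative, not the settings used in the talk):

```python
# Minimal CBOW sketch (toy corpus and dimensions; not the talk's settings).
from gensim.models import Word2Vec

# Each sentence is a list of tokens.
corpus = [
    ["the", "dog", "chased", "the", "cat"],
    ["the", "cat", "watched", "the", "dog"],
]

# sg=0 selects the CBOW architecture: context words predict the missing word.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

print(model.wv["dog"][:5])                 # a learned embedding vector
print(model.wv.similarity("dog", "cat"))   # similar words get similar embeddings
```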

Where do features come from?

Feature Engineering ←——→ Feature Learning
Added to the spectrum: string embeddings (Collobert & Weston, 2008; Socher, 2011)

[Figure: two ways to embed the string "The [movie] showed [wars]" — a Convolutional Neural Network with pooling (Collobert and Weston, 2008) and a Recursive Auto Encoder (Socher, 2011).]

Where do features come from?

Feature Engineering ←——→ Feature Learning
Added to the spectrum: tree embeddings (Socher et al., 2013; Hermann & Blunsom, 2013)

[Figure: a tree embedding composes "The [movie] showed [wars]" bottom-up along its constituency parse (S → NP VP), using composition matrices such as W_{NP,VP}, W_{DT,NN}, and W_{V,NN}.]

Where do features come from?

Feature Engineering ←——→ Feature Learning
Added to the spectrum: word embedding features (Koo et al., 2008; Turian et al., 2010) — hand-crafted features built on top of embeddings, sitting between the two extremes.

The spectrum so far:
– hand-crafted features (Zhou et al., 2005; Sun et al., 2011)
– word embedding features (Koo et al., 2008; Turian et al., 2010)
– word embeddings (Mikolov et al., 2013)
– string embeddings (Collobert & Weston, 2008; Socher, 2011)
– tree embeddings (Socher et al., 2013; Hermann & Blunsom, 2013; Hermann et al., 2014)

Where do features come from?

Feature Engineering ←——→ Feature Learning
Added to the spectrum: Our model (FCM)

FCM sits in the middle of this spectrum: it combines hand-crafted features (feature engineering) with learned word embeddings (feature learning).

Feature-rich Compositional Embedding Model (FCM)

Goals for our Model:

1. Incorporate semantic/syntactic structural information

2. Incorporate word meaning

3. Bridge the gap between feature engineering and feature learning – but remain as simple as possible


Feature-rich Compositional Embedding Model (FCM)

The [movie]M1 I watched depicted [hope]M2

Per-word Features: on-path(wi), is-between(wi), head-of-M1(wi), head-of-M2(wi), before-M1(wi), before-M2(wi)

[Figure: each word wi of the sentence — annotated with a word class (nil, noun-other, noun-person, verb-percep., verb-comm.) — gets its own binary feature vector; the columns f1 … f6 show these vectors for the six words.]

Feature-rich Compositional Embedding Model (FCM)

The [movie]M1 I watched depicted [hope]M2

Per-word Features: zooming in on f5, the feature vector for "depicted":
on-path = 1, is-between = 1, head-of-M1 = 0, head-of-M2 = 0, before-M1 = 0, before-M2 = 1

Feature-rich Compositional Embedding Model (FCM)

The [movie]M1 I watched depicted [hope]M2

Per-word Features (with conjunction): each binary feature is conjoined with the word identity, e.g. for f5:
on-path(wi) & wi = "depicted" = 1
is-between(wi) & wi = "depicted" = 1
head-of-M1(wi) & wi = "depicted" = 0
head-of-M2(wi) & wi = "depicted" = 0
before-M1(wi) & wi = "depicted" = 0
before-M2(wi) & wi = "depicted" = 1

Feature-rich Compositional Embedding Model (FCM)

The [movie]M1 I watched depicted [hope]M2

Per-word Features (with soft conjunction): instead of conjoining each feature with the word identity, take the outer product of the binary feature vector f5 = [1, 1, 0, 0, 0, 1] (on-path, is-between, head-of-M1, head-of-M2, before-M1, before-M2) with the word's embedding e_depicted = [-.3, .9, .1, -1].

Feature-rich Compositional Embedding Model (FCM)

The [movie]M1 I watched depicted [hope]M2

The outer product f5 ⊗ e_depicted is a matrix with one row per feature: rows for the active features (on-path, is-between, before-M2) are copies of the embedding [-.3, .9, .1, -1], and rows for the inactive features (head-of-M1, head-of-M2, before-M1) are all zeros.
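To make the soft conjunction concrete, here is this outer product in NumPy (the feature order and the 4-dimensional embedding are taken from the example on the slide; real embeddings are much higher-dimensional):

```python
import numpy as np

# Binary per-word features for w_i = "depicted", in the order
# [on-path, is-between, head-of-M1, head-of-M2, before-M1, before-M2].
f_i = np.array([1, 1, 0, 0, 0, 1])

# Word embedding for "depicted" (toy 4-dimensional values from the slide).
e_wi = np.array([-0.3, 0.9, 0.1, -1.0])

# Soft conjunction: every active feature "copies" the embedding,
# every inactive feature contributes a row of zeros.
h_i = np.outer(f_i, e_wi)
print(h_i)
```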

Feature-rich Compositional Embedding Model (FCM)

p(y|x) ∝ exp( T_y · Σ_{i=1}^{n} (f_i ⊗ e_{w_i}) )

Our full model sums the outer products f_i ⊗ e_{w_i} over each word in the sentence, then takes the dot-product with a per-label parameter tensor T_y, and finally exponentiates and renormalizes.
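A minimal NumPy sketch of this scoring rule (the dimensions, random tensor, and the helper name fcm_probs are placeholders; the actual model also learns the embeddings and parameters by backpropagation):

```python
import numpy as np

def fcm_probs(F, E, T):
    """F: n x d_f binary features per word; E: n x d_e embeddings per word;
    T: n_labels x d_f x d_e parameter tensor. Returns p(y|x) for each label y."""
    # Sum of per-word outer products: a d_f x d_e matrix.
    S = sum(np.outer(f_i, e_i) for f_i, e_i in zip(F, E))
    # Dot product with each label's parameter matrix T_y, then softmax.
    scores = np.array([np.sum(T_y * S) for T_y in T])
    scores -= scores.max()              # numerical stability
    p = np.exp(scores)
    return p / p.sum()

# Toy example: 6 words, 6 binary features, 4-dim embeddings, 3 relation labels.
rng = np.random.default_rng(0)
F = rng.integers(0, 2, size=(6, 6))
E = rng.normal(size=(6, 4))
T = rng.normal(size=(3, 6, 4))
print(fcm_probs(F, E, T))
```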

Features for FCM

• Let M1 and M2 denote the left and right entity mentions

• Our per-word binary features (see the sketch below):
– head of M1
– head of M2
– in-between M1 and M2
– at position -2, -1, +1, or +2 relative to M1
– at position -2, -1, +1, or +2 relative to M2
– on the dependency path between M1 and M2

• Optionally: add the entity type of M1, M2, or both
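A rough sketch of extracting these per-word binary features (the mention tuples, index conventions, and the helper word_features are hypothetical stand-ins; the real system reads them off the parse and the dependency path):

```python
def word_features(i, m1, m2, dep_path):
    """Per-word binary features for token index i.
    m1, m2: (start, end, head) index tuples for the two mentions (end exclusive);
    dep_path: set of token indices on the dependency path between them."""
    feats = {
        "head-of-M1": i == m1[2],
        "head-of-M2": i == m2[2],
        "in-between": m1[1] <= i < m2[0],
        "window-M1": i in (m1[0] - 2, m1[0] - 1, m1[1], m1[1] + 1),
        "window-M2": i in (m2[0] - 2, m2[0] - 1, m2[1], m2[1] + 1),
        "on-dep-path": i in dep_path,
    }
    return {name for name, active in feats.items() if active}

# Hypothetical example: "The movie I watched depicted hope", tokens 0..5;
# M1 = "movie" (index 1), M2 = "hope" (index 5), features for "depicted" (index 4).
print(word_features(4, (1, 2, 1), (5, 6, 5), {1, 4, 5}))
```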


FCM as a Neural Network

[Figure: FCM drawn as a neural network — per-word binary feature vectors f1 … fn and word embeddings e_{w1} … e_{wn} are combined by outer products into h1 … hn, summed, and contracted with the parameter tensor T to produce p(y|x).]

• Embeddings are (optionally) treated as model parameters
• A log-bilinear model
• We initialize, then fine-tune the embeddings

Baseline Model

[Figure: the sentence "Egypt-born Proyas directed …", annotated with POS tags, entity types (LOC, PER), a constituency parse, and lemmas; the hand-crafted feature vector f(x) for the mention pair Y_{i,j} is read off these annotations, and the predicted relation is born-in.]

p(y|x) ∝ exp(Θ_y · f(x))

• Multinomial logistic regression (standard approach)

• Bring in all the usual binary NLP features (Sun et al., 2011):
– type of the left entity mention
– dependency path between mentions
– bag of words in the right mention
– …
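A hedged sketch of such a baseline with scikit-learn (the feature dictionaries below are made-up toy instances; the actual feature set follows Sun et al., 2011):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each instance is a bag of hand-crafted features (toy, invented values).
X = [
    {"left-type": "GPE", "right-type": "PER", "dep-path": "amod", "bow-right": "proyas"},
    {"left-type": "PER", "right-type": "ORG", "dep-path": "nsubj->dobj", "bow-right": "studio"},
]
y = ["born-in", "employed-by"]

# Multinomial logistic regression over one-hot encoded binary features.
clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict_proba(X[:1]))
```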

Hybrid Model: Baseline + FCM

[Figure: the FCM network and the log-linear baseline side by side, each producing its own p(y|x) for the mention pair Y_{i,j}; the two predictions are multiplied.]

Product of Experts:
p(y|x) = (1/Z(x)) · p_Baseline(y|x) · p_FCM(y|x)
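The combination itself is just an element-wise product of the two models' label distributions followed by renormalization; a minimal sketch, assuming both models expose per-label probability vectors:

```python
import numpy as np

def product_of_experts(p_baseline, p_fcm):
    """Combine two per-label probability vectors by a product of experts:
    p(y|x) = (1/Z(x)) * p_baseline(y|x) * p_fcm(y|x)."""
    p = np.asarray(p_baseline) * np.asarray(p_fcm)
    return p / p.sum()   # Z(x) is the sum of the unnormalized products

print(product_of_experts([0.7, 0.2, 0.1], [0.5, 0.4, 0.1]))
```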

Experimental Setup

ACE 2005

• Data: 6 domains
– Newswire (nw)
– Broadcast Conversation (bc)
– Broadcast News (bn)
– Telephone Speech (cts)
– Usenet Newsgroups (un)
– Weblogs (wl)

• Train: bn + nw (~3600 relations)
• Dev: ½ of bc
• Test: ½ of bc, cts, wl

• Metric: Micro F1 (given entity mentions)

SemEval-2010 Task 8

• Data: Web text

• Train / Dev / Test: standard split from the shared task

• Metric: Macro F1 (given entity boundaries)
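For reference, a rough illustration of the difference between the two averaging schemes with scikit-learn (the official ACE and SemEval scorers additionally handle the nil/Other class and relation directionality, which this sketch ignores):

```python
from sklearn.metrics import f1_score

gold = ["born-in", "none", "employed-by", "born-in"]
pred = ["born-in", "born-in", "employed-by", "none"]

print(f1_score(gold, pred, average="micro"))  # micro F1: pooled over all decisions
print(f1_score(gold, pred, average="macro"))  # macro F1: averaged over relation types
```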

ACE 2005 Results

[Bar chart: Micro F1 (y-axis, 45%–65%) on the three test sets — Broadcast Conversation, Conversational Telephone Speech, and Weblogs — comparing Baseline, FCM, and Baseline+FCM.]


SemEval-2010 Results

Source                       Classifier                                    F1
Socher et al. (2012)         RNN                                           74.8
Socher et al. (2012)         MVRNN                                         79.1
Hashimoto et al. (2015)      RelEmb                                        81.8
Rink and Harabagiu (2010)    SVM (best in the SemEval-2010 shared task)    82.2
Xu et al. (2015)             SDP-LSTM                                      82.4
Zeng et al. (2014)           CNN                                           82.7
Santos et al. (2015)         CR-CNN (log-loss)                             82.7
Liu et al. (2015)            DepNN                                         82.8
Hashimoto et al. (2015)      RelEmb (task-spec-emb)                        82.8
Xu et al. (2015)             SDP-LSTM (full)                               83.7
Santos et al. (2015)         CR-CNN (ranking-loss)                         84.1
This work                    FCM (log-linear)                              81.4
This work                    FCM (log-bilinear)                            83.0
This work                    FCM (log-bilinear, task-spec-emb)             83.7

Takeaways

FCM bridges the gap between feature engineering and feature learning

If you are allergic to deep learning:
– Try the FCM for your task: it is simple, easy to implement, and was shown to be effective on two relation extraction benchmarks.

If you are a deep learning expert:
– Inject the FCM (i.e., the outer product of features and embeddings) into your fancy deep network.

Questions?

Two open source implementations:

– Java (within the Pacaya framework): https://github.com/mgormley/pacaya
– C++ (from our NAACL 2015 paper on LRFCM): https://github.com/Gorov/ERE_RE
