probabilistic reasoning via deep learning: neural ...alanwags/dlai2016/(liu+) ijcai-16...


Probabilistic Reasoning via Deep Learning: Neural Association Models

Quan Liu†

†University of Science and Technology of China

Joint work with Hui Jiang‡, Zhen-Hua Ling†, Si Wei§ and Yu Hu§

‡York University, Canada

§iFLYTEK Research, Hefei, China

July 10, 2016

Quan Liu† (Univ. Sci.&Tech. China) Neural Association Model July 10, 2016 1 / 28


Outline

1 Neural Association Model (NAM)
    Motivation
    Model
    Experiments

2 NAM for Winograd Schemas
    Winograd Schemas
    Data Collection
    NAM for Winograd Schemas


Neural Association Model


1. Motivation

Neural Association Model⇑

Main work

Motivation: Neural Model to Associate between Events

Events emerge everywhere (→ massive) in our daily life.

Events are discrete (→ sparse).

Commonsense reasoning relies on the Association between Events.

Association relationships

Causality, Temporal, Taxonomy, Entailment, etc.


Examples

What are the possible events Associated with event “Play basketball”?

play basketball

win

injured

make money

be coached

drink water

stock trading

Association ≠ Classification!


Motivation: Main Method

Neural Association Model: a neural model for probabilistic reasoning

Associating two events via deep learning techniques:

Predicting the conditional association probability Pr(E2|E1) of two different events, E1 and E2.

Application                      E1          E2
Causal-Effect reasoning          cause       effect
Recognize lexical entailment     W1          W2
Recognize textual entailment     D1          D2
Language modeling                h           w
Knowledge link prediction        (ei, rk)    ej

E.g. Causal-Effect reasoning

E1 = cause event

E2 = effect event

How likely is it that E2 is caused by E1?


Advantages vs. Disadvantages

Advantages of NNs for reasoning

Neural networks are universal approximators (Hornik et al., 1990).

Linear models can hardly do this. Nickel, Murphy et al. (2015)

Associating in continuous spaces improves scalability.

Graphical models suffer from scalability issues. Jensen (1996); Richardson and Domingos (2006)

Disadvantages

Deep learning needs big data, i.e., KBs.

Automated Knowledge Acquisition

Transfer Learning


2. Neural Association Model

A neural model for modeling the association probability of two events.

[Figure: Event E1 and Event E2 are each mapped into a vector space; deep neural networks perform the association and output Pr(E2|E1).]

Key modules

Representation: Represent discrete events as continuous vectors

Association: Predict the association probability via deep learning


Association via DNNs

Distributed representations

All discrete events are represented in continuous vector spaces.

Two model structures for Association

1 Deep Neural Networks (DNN)

2 Relation-modulated Neural Networks (RMNN)


2.1 Deep Neural Networks

Deep Neural Networks (DNN)

Associating two events through deep neural networks

For a multi-relation data sample $x_n = (e_i, r_k, e_j)$:

Entity vector: $e_i \to v_i^{(1)}$, $e_j \to v_j^{(2)}$ (different embedding matrices)

Relation code: $r_k \to c_k$

[Figure: DNN architecture. The head entity vector and the relation vector form the input; hidden layers W(1), W(2), ..., W(L) have inputs a(ℓ) and outputs z(ℓ); the association score f is computed between the top-layer output z(L) and the tail entity vector.]

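$z^{(0)} = [v_i^{(1)}, c_k]$

$a^{(\ell)} = W^{(\ell)} z^{(\ell-1)} + b^{(\ell)}, \quad \ell = 1 \ldots L$

ReLU hidden layer activation: $z^{(\ell)} = \max(0, a^{(\ell)}), \quad \ell = 1 \ldots L$

The associative probability:

$f(x_n; \Theta) = \sigma(z^{(L)} \cdot v_j^{(2)})$, where $\sigma(x) = 1/(1 + e^{-x})$.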
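The DNN forward pass on this slide can be sketched in a few lines of NumPy. This is only an illustration of the equations, not the authors' implementation: the function name `dnn_score`, the layer sizes, and the random initialization are all assumptions, and the last hidden layer is assumed to have the same width as the tail entity vector so that the dot product is defined.

```python
import numpy as np

def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def dnn_score(v_head, c_rel, v_tail, weights, biases):
    """Association probability for one triple (hypothetical helper).

    v_head : head entity vector v_i^(1)
    c_rel  : relation code c_k
    v_tail : tail entity vector v_j^(2)
    weights, biases : lists of W^(l) and b^(l), l = 1..L
    """
    z = np.concatenate([v_head, c_rel])      # z^(0) = [v_i^(1), c_k]
    for W, b in zip(weights, biases):
        a = W @ z + b                        # a^(l) = W^(l) z^(l-1) + b^(l)
        z = np.maximum(0.0, a)               # z^(l) = max(0, a^(l))
    return sigmoid(z @ v_tail)               # f = sigma(z^(L) . v_j^(2))

# Illustrative sizes only (assumptions, not taken from the slides).
rng = np.random.default_rng(0)
d_ent, d_rel, d_hid = 4, 3, 5
weights = [rng.normal(size=(d_hid, d_ent + d_rel)),
           rng.normal(size=(d_ent, d_hid))]  # last layer width equals d_ent
biases = [np.zeros(d_hid), np.zeros(d_ent)]
p = dnn_score(rng.normal(size=d_ent), rng.normal(size=d_rel),
              rng.normal(size=d_ent), weights, biases)
print(p)
```

The head and tail vectors would come from two different embedding matrices, as noted on the slide; here they are simply drawn at random.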


2.2 Relation-modulated Neural Networks

Relation-modulated Neural Networks (RMNN)

Improved over DNN

Define and connect relation codes to all the layers of DNN

[Figure: RMNN architecture. The relation code is connected to every layer through B(1), B(2), ..., B(L), B(L+1); hidden layers W(1), W(2), ..., W(L) have inputs a(ℓ) and outputs z(ℓ), starting from the head entity vector; the score function f combines the top-layer output z(L) with the tail entity vector.]

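$z^{(0)} = [v_i^{(1)}, c_k]$

$a^{(\ell)} = W^{(\ell)} z^{(\ell-1)} + B^{(\ell)} c_k, \quad \ell = 1 \ldots L$

ReLU hidden layer activation: $z^{(\ell)} = \max(0, a^{(\ell)}), \quad \ell = 1 \ldots L$

The associative probability:

$f(x_n; \Theta) = \sigma(z^{(L)} \cdot v_j^{(2)} + B^{(L+1)} \cdot c_k)$.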
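A minimal NumPy sketch of the RMNN equations follows, to make the difference from the plain DNN concrete: the relation code enters every layer through its own matrix B(ℓ) and also contributes a term to the final score. The function name `rmnn_score`, the argument layout, and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def rmnn_score(v_head, c_rel, v_tail, weights, rel_mats, b_out):
    """Association probability under the RMNN equations (hypothetical helper).

    weights  : list of W^(l), l = 1..L
    rel_mats : list of B^(l), l = 1..L, each mapping the relation code
               into the corresponding layer
    b_out    : B^(L+1), a vector with the same length as c_rel
    """
    z = np.concatenate([v_head, c_rel])      # z^(0) = [v_i^(1), c_k]
    for W, B in zip(weights, rel_mats):
        a = W @ z + B @ c_rel                # a^(l) = W^(l) z^(l-1) + B^(l) c_k
        z = np.maximum(0.0, a)               # z^(l) = max(0, a^(l))
    # f = sigma(z^(L) . v_j^(2) + B^(L+1) . c_k)
    return sigmoid(z @ v_tail + b_out @ c_rel)

# Illustrative sizes only (assumptions, not taken from the slides).
rng = np.random.default_rng(0)
d_ent, d_rel, d_hid = 4, 3, 5
weights = [rng.normal(size=(d_hid, d_ent + d_rel)),
           rng.normal(size=(d_ent, d_hid))]
rel_mats = [rng.normal(size=(d_hid, d_rel)),
            rng.normal(size=(d_ent, d_rel))]
b_out = rng.normal(size=d_rel)
p = rmnn_score(rng.normal(size=d_ent), rng.normal(size=d_rel),
               rng.normal(size=d_ent), weights, rel_mats, b_out)
print(p)
```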


NAM: Final Training Objectives

Training sample: event pair x = (E1, E2); score: f(x;Θ) = Pr(E2|E1)

Training objective

For each positive sample $x_n^+$ and negative sample $x_n^-$, minimize:

$Q(\Theta) = -\sum_{x_n^+ \in D^+} \ln f(x_n^+; \Theta) - \sum_{x_n^- \in D^-} \ln(1 - f(x_n^-; \Theta)) \qquad (1)$
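Objective (1) is a log-likelihood over positive and negative samples. The sketch below evaluates it for given model scores; the function name `nam_objective` and the small clipping constant `eps` (added only to avoid taking log of zero) are assumptions and do not appear in the slides.

```python
import numpy as np

def nam_objective(pos_scores, neg_scores, eps=1e-12):
    """Q(Theta) = -sum ln f(x+) - sum ln(1 - f(x-)).

    pos_scores : model outputs f(x+; Theta) for samples in D+
    neg_scores : model outputs f(x-; Theta) for samples in D-
    eps        : numerical guard against log(0) (an assumption)
    """
    pos = np.clip(np.asarray(pos_scores, dtype=float), eps, 1.0)
    neg = np.clip(np.asarray(neg_scores, dtype=float), 0.0, 1.0 - eps)
    return -np.sum(np.log(pos)) - np.sum(np.log(1.0 - neg))

# A confident, correct model gives a lower objective than an uncertain one.
print(nam_objective([0.9, 0.8], [0.1, 0.2]))
print(nam_objective([0.5, 0.5], [0.5, 0.5]))
```

Minimizing this quantity pushes scores of positive pairs toward 1 and scores of negative pairs toward 0.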


3. Experiments

Experiments

Recognizing textual entailment

Commonsense reasoning


3.1 Recognizing Textual Entailment (RTE)

Recognizing Textual Entailment

Recognizing the entailment relationship between two sentences

Premise: “The man was assassinated.”

Hypothesis: “The man is dead.”

Datasets

The Stanford Natural Language Inference (SNLI) Corpus

Experiments: 2-class recognition

Model                       Accuracy (%)
Edit Distance Based         71.9
Classifier Based            72.2
With Lexical Resources      75.0
Neural Association Model    84.7

NAM performs better than many traditional methods.


3.2 Commonsense Reasoning

Commonsense reasoning

Task investigated in this work

Answering simple commonsense questions

Judging the truth of commonsense triples

“Is a camel capable of journey across desert?”

Triple: (camel, capable of, journey across desert).

Datasets

From ConceptNet 5, a commonsense KB (Speer and Havasi 2012).

http://conceptnet5.media.mit.edu/

We extract 14 popular commonsense relations (CN14).

Dataset    #Rel    #Entities    #Train     #Dev     #Test
CN14       14      159,135      200,198    5,000    10,000


Results

Overall results on CN14

Model    Accuracy (%)
DNN      85.7
RMNN     86.1

Results on different relations

[Figure: per-relation results for the 14 ConceptNet relations (UsedFor, CapableOf, HasSubevent, HasPrerequisite, HasProperty, Causes, MotivatedByGoal, ReceivesAction, CausesDesire, Desires, HasLastSubevent, CreatedBy, DesireOf, SymbolOf). One axis gives the number of triples per relation (from 46,416 down to 166); the other runs from 60 to 100 and compares Socher et al., 2013 with RMNNs.]

NAM shows some potential for commonsense reasoning.


Application: NAM for Winograd Schemas


Winograd schemas

Typical Winograd schemas example

Co-reference cannot be resolved without commonsense knowledge.

Statement: Mary made sure to thank Susan for all the help she had received.

Q: Who had received the help?

Answer: Mary

Commonsense knowledge: receive help → thank

⇓

Association between Events:

Pr(thank|receive help) > Pr(thank|give help)
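The comparison above amounts to a simple decision rule: pick the reading whose implied cause event has the higher association probability with the observed effect. The sketch below shows that rule with a toy lookup table in place of a trained NAM; the function name `resolve_pronoun`, the candidate mapping, and the probability values are made up for illustration only.

```python
def resolve_pronoun(assoc_prob, effect, candidates):
    """Return the candidate whose implied cause best explains the effect.

    assoc_prob : callable (cause, effect) -> Pr(effect | cause)
    effect     : the observed effect event, e.g. "thank"
    candidates : dict mapping each candidate answer to the cause event
                 implied for the thanker under that reading
    """
    return max(candidates, key=lambda name: assoc_prob(candidates[name], effect))

# Toy probabilities (illustrative values, not model outputs).
toy_table = {
    ("receive help", "thank"): 0.8,
    ("give help", "thank"): 0.2,
}

def toy_prob(cause, effect):
    return toy_table[(cause, effect)]

answer = resolve_pronoun(
    toy_prob,
    "thank",
    {"Mary": "receive help", "Susan": "give help"},
)
print(answer)  # Mary
```

In a real system `toy_prob` would be replaced by the score of a trained association model.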


NAM for Winograd Schemas

Modules for Solving Winograd Schemas

Neural Association Model

Data Collection: how to collect training data for NAM?

System framework for data collection

[Figure: Vocab and Text corpus → Sentences → Results. Each result word (e.g. “rob”, “arrest”) has four variations (Active/Positive, Active/Negative, Passive/Positive, Passive/Negative); Association Links connect the variations of the two words.]


1. Data Collection

Query Search in Text Corpus

Search query: keyword pairs formed from a common vocabulary.

Vocabulary: 7500 common verbs and adjectives.

E.g. (arrest ... because ... rob); (decide ... because ... explain)

Each word/phrase has 4 variations (active/passive × positive/negative) → 16 patterns for each query.
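The count of 16 patterns per query follows from pairing the 4 variations of one word with the 4 variations of the other. The short sketch below enumerates them, taking the four variations to be the voice and polarity combinations labelled in the framework figure (Active/Passive and Positive/Negative); the function names and the tuple layout are assumptions for illustration.

```python
from itertools import product

VOICES = ("active", "passive")
POLARITIES = ("positive", "negative")

def word_variations(word):
    """The 4 variations of a word: every voice/polarity combination."""
    return [(word, voice, polarity) for voice, polarity in product(VOICES, POLARITIES)]

def query_patterns(effect_word, cause_word):
    """All 4 x 4 = 16 variation pairs for an (effect ... because ... cause) query."""
    return list(product(word_variations(effect_word), word_variations(cause_word)))

patterns = query_patterns("arrest", "rob")
print(len(patterns))  # 16
```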


We want to count the number of active association links.

1. Data Collection

Association knowledge from dependency parsing

Subject/Object Matching ⇒ Assigning Association links

Collect the number of active links

“He was arrested because he robbed the man.”

(he, nsubjpass, arrest), (he, nsubj, rob)

“rob” and “arrest” share the same subject “he”

“nsubjpass” ⇒ passive

“rob” ⇒ “be arrested”
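The subject-matching step can be sketched as follows, working directly on dependency triples such as those shown for the example sentence. This is a simplified illustration under several assumptions: triples are given as (dependent, relation, verb) tuples already produced by some parser, arguments are matched by surface string, and only subject relations are handled (object matching is left out). The function name `association_links` is hypothetical.

```python
# Map subject dependency labels to the voice of the verb they attach to.
SUBJECT_VOICE = {"nsubj": "active", "nsubjpass": "passive"}

def association_links(triples, cause_verb, effect_verb):
    """Find (cause, voice) => (effect, voice) links via a shared subject.

    triples : iterable of (dependent, relation, verb) dependency triples
    """
    subjects = {}
    for dependent, relation, verb in triples:
        if relation in SUBJECT_VOICE:
            subjects.setdefault(verb, []).append((dependent, SUBJECT_VOICE[relation]))

    links = []
    for cause_arg, cause_voice in subjects.get(cause_verb, []):
        for effect_arg, effect_voice in subjects.get(effect_verb, []):
            if cause_arg == effect_arg:  # same subject string
                links.append(((cause_verb, cause_voice), (effect_verb, effect_voice)))
    return links

# "He was arrested because he robbed the man."
triples = [("he", "nsubjpass", "arrest"), ("he", "nsubj", "rob")]
links = association_links(triples, cause_verb="rob", effect_verb="arrest")
print(links)  # [(('rob', 'active'), ('arrest', 'passive'))]
```

The single link found corresponds to “rob” ⇒ “be arrested”; counting such links over a corpus gives the link frequencies the slides refer to.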



Data collection results

Corpora for data collection

BookCorpus (Zhu et al., 2015)

CBTest corpus (Hill et al., 2016)

Gigaword 5 (Parker et al., 2011)

Results: highly associated pairs

We extracted about 100,000 highly associated pairs.

(know ⇒ clear), (believe ⇒ not disagree), (be released ⇒ not hold).

Typical PMI distributions

[Figure: histogram of PMI values for Dimen-4 (num: 31541, mean: 2.6677, var: 5.8094); x-axis from -6 to 14, y-axis from 0 to 600.]
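The slides do not state how PMI is estimated from the collected link counts. As a reference point only, the standard pointwise mutual information from co-occurrence counts can be computed as below; the function name `pmi`, the use of the natural logarithm, and the example counts are assumptions.

```python
import math

def pmi(pair_count, cause_count, effect_count, total):
    """Standard PMI: log( p(cause, effect) / (p(cause) * p(effect)) ).

    pair_count   : number of links observed between the two events
    cause_count  : number of links involving the cause event
    effect_count : number of links involving the effect event
    total        : total number of links
    """
    p_joint = pair_count / total
    p_cause = cause_count / total
    p_effect = effect_count / total
    return math.log(p_joint / (p_cause * p_effect))

# Made-up counts: a pair seen five times more often than chance predicts.
print(pmi(pair_count=50, cause_count=100, effect_count=100, total=1000))
```

A positive value indicates that the two events co-occur more often than independence would predict, which matches the intuition behind "highly associated pairs".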


2. NAM for Winograd Schemas

NAM RelationCode: Treat the 16 dimensions as distinct relations

[Figure: the cause and effect events are fed to the Neural Association Model together with a relation.]

NAM TransMatrix: Apply a linear transformation to each word/phrase

[Figure: the cause and effect events each pass through a Transform before entering the Neural Association Model.]


2. NAM for Winograd Schemas

Datasets

From http://www.cs.nyu.edu/faculty/davise/papers/WS.html

We labelled 70 schemas related to cause-effect reasoning.

Available at http://home.ustc.edu.cn/~quanliu/

Results

We now achieve 61.4% accuracy on the Winograd CE datasets.

Model               Accuracy (%)
NAM TransMatrix     59.6
NAM RelationCode    61.4

Table: Performance of NAM.


Answering examples

“tasty” → “be eaten”


Answering examples

“hungry” → “eat”


Future work

Data level

Collect more useful data for commonsense reasoning

Automatic construction from text/KBs

Human labelling

Model level

Toward more complex probabilistic reasoning problems

Neural association model for transfer learning


Thank You! (Q&A)
