Efficient Probabilistic Logic Reasoning with Graph Neural Networks
Le Song
Associate Professor, CSE, College of Computing; Associate Director, Center for Machine Learning
Georgia Institute of Technology
Learning with small data
• Combine learning and logic reasoning
  • Learning: generalize with noisy input
  • Logic/symbolic reasoning: predict without data
[Figure: data requirement vs. noise tolerance; deep learning and symbolic reasoning occupy opposite corners, while the desired method combines low data requirement with high noise tolerance.]
Example of logic inference: HorseShape(c) ∧ StripePattern(c) ⇒ Zebra(c)
[Figure: horse shape + stripe pattern ⇒ zebra]
Graph neural networks (GNN / structure2vec)
• Obtain embeddings via iterative updates:
  1. Initialize μ_i(0) = h( W_1 X_i ), ∀ i
  2. Iterate T times: μ_i(t) ← h( W_1 μ_i(t−1) + W_2 Σ_{j∈𝒩(i)} μ_j(t−1) )
[Figure: a six-node graph X_1, …, X_6; each node's embedding μ_i(0), μ_i(1), …, μ_i(T) is computed by the neural-network update, and the final embeddings (e.g. Σ_{k∈S} μ_k(T)) feed into supervised learning, generative models, or reinforcement learning.]
[Dai et al., ICML'16]
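To make the update concrete, here is a minimal NumPy sketch of a structure2vec-style iteration. The ReLU choice for h, the toy graph, and setting the feature dimension equal to the embedding dimension (so the same W_1 can serve both the initialization and the update, as written above) are illustration assumptions, not details from the talk.

```python
import numpy as np

def h(z):
    # The slide leaves the nonlinearity h unspecified; ReLU is an assumption.
    return np.maximum(z, 0.0)

def structure2vec_embed(X, neighbors, W1, W2, T):
    """mu_i(0) = h(W1 X_i);  mu_i(t) = h(W1 mu_i(t-1) + W2 sum_{j in N(i)} mu_j(t-1))."""
    mu = h(X @ W1.T)  # initial embeddings, one row per node
    for _ in range(T):
        agg = np.stack([mu[js].sum(axis=0) if js else np.zeros(mu.shape[1])
                        for js in neighbors])  # neighborhood sums
        mu = h(mu @ W1.T + agg @ W2.T)
    return mu

# Hypothetical 6-node graph standing in for the slide's X1..X6 picture.
rng = np.random.default_rng(0)
n, d = 6, 8  # feature dim chosen equal to embedding dim so W1 serves both steps
X = rng.normal(size=(n, d))
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3], [3]]
W1, W2 = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))
mu = structure2vec_embed(X, neighbors, W1, W2, T=2)
print(mu.shape)  # (6, 8): one embedding per node
```

The per-node embeddings would then be pooled, e.g. summed over a node set S as in the figure, before the downstream task.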
Logical expressiveness of GNNs
• Logical classifiers
  • classify each node according to whether it satisfies a formula
  • the formula is expressed in first-order predicate logic (FO)
[Diagram: graded modal logic ⊂ FOC2 ⊂ FO; WL-test = GNN (∞ depth); GNN (finite depth) ≤ R-GNN with Readout (global information)]
[Barceló et al., ICLR'20]
FOC2
• 2 variables
• with counting quantifiers, ∃≥N
• Example: f(x) := Red(x) ∧ ∃≥3 y [¬Edge(x, y) ∧ Blue(y)]
graded modal logic ⊂ FOC2
• counting quantifiers must be guarded by Edge(x, y):
  • allowed: ∃≥N y [Edge(x, y) ∧ Predicate(y)]
  • not allowed: ∃≥N y Predicate(y)
• Example (see the sketch below): f(x) := Red(x) ∧ ∃≥3 y [Edge(x, y) ∧ Blue(y)]
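A small self-contained sketch (hypothetical graph and colors) of why the guard matters: the classifier counts only over x's neighbors, so it is computable from local information, which is exactly what a finite-depth aggregate-and-combine GNN collects.

```python
# f(x) := Red(x) AND (there exist >= 3 neighbors y with Blue(y)).
# The counting quantifier is guarded by Edge(x, y), so f(x) depends only on
# x's immediate neighborhood -- the information a finite-depth GNN can gather.
def f(x, red, blue, neighbors):
    return red[x] and sum(blue[y] for y in neighbors[x]) >= 3

# Hypothetical toy graph: node 0 is red and has three blue neighbors.
red  = [True, False, False, False, False]
blue = [False, True, True, True, False]
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0], 4: []}
print([f(x, red, blue, neighbors) for x in range(5)])
# -> [True, False, False, False, False]
```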
More compact representation and lower error
• Harvard Clean Energy dataset: 2.3 million organic molecules; predict power conversion efficiency (0–12%)
[Figure: MAE vs. model dimension (0.1M to 1000M) for Embedded MF, Embedded BP, Weisfeiler-Lehman level 6, and hashed WL level 6; structure2vec reduces model size by 10,000×.]
[Dai et al., ICML'16]
Fraudulent account detection
• Alipay: online payment platform
• New accounts in a month: millions of nodes and edges
• Fake accounts can increase system-level risk
• Account–device network
[Figure: account–device graph with good and bad accounts; precision-recall comparison of rule-based detection, structure2vec, and node2vec.]
[Liu et al., CCS'17, CIKM'17, AAAI'19]
Knowledge base reasoning
• Many domains need knowledge bases…
  • they contain incorrect, incomplete, and duplicated records
  • they need link prediction, attribute classification, and record de-duplication
• Many existing rules/logic formulas are available
• Graph neural networks do not explicitly use this existing knowledge
UW-CSE dataset details
• 22 relations: teach, publish, …
• Task goal
  • predict who is whose advisor
  • zero observed facts for the query predicates
• 94 crowd-sourced FOL formulas, e.g.:
  advisedBy(s, p) ⇒ professor(p)
  advisedBy(s, p) ⇒ ¬yearsInProgram(s, Year_1)
  professor(x) ⇒ ¬student(x)
  publication(p, x) ∧ publication(p, y) ∧ student(x) ∧ ¬student(y) ⇒ professor(y)
  student(x) ∧ ¬advisedBy(x, y) ⇒ tempAdvisedBy(x, y)
  …
[Figure: toy KB with students S1, S3, professor P2, Paper1 (publish), Course1.]
Bipartite graph representation for knowledge bases
• Entities: 𝒞 = {A, B, C, D, …}
• Predicates (attribute | relation): r(⋅): 𝒞 × 𝒞 × ⋯ ↦ {0, 1}
  • e.g. Smoke(x), Friend(x, x′), Like(x, x′)
[Figure: bipartite graph 𝒢_K; entity nodes A, B, C, D connect to ground-predicate nodes. Observed (𝒪): F(A,B)=1, F(A,C)=1, F(A,D)=1, S(A)=1, S(B)=1. Latent (ℋ): F(B,C), F(B,D), F(C,D), S(C), S(D).]
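A minimal sketch of this bipartite encoding for the toy Friend/Smoke KB above; the tuple-based representation of fact nodes is a hypothetical choice for illustration, not the paper's data structure.

```python
# Bipartite KB graph G_K: entity nodes on one side, ground predicates on the
# other; each fact node connects to the entities it mentions.
entities = ["A", "B", "C", "D"]
observed = {("F", ("A", "B")): 1, ("F", ("A", "C")): 1, ("F", ("A", "D")): 1,
            ("S", ("A",)): 1, ("S", ("B",)): 1}           # O: observed facts
latent = [("F", ("B", "C")), ("F", ("B", "D")), ("F", ("C", "D")),
          ("S", ("C",)), ("S", ("D",))]                    # H: latent variables

# Edges of G_K: fact node <-> each of its argument entities.
edges = [(fact, e) for fact in list(observed) + latent for e in fact[1]]
print(len(edges))  # 3*2 + 2*1 + 3*2 + 2*1 = 16 edges
```

Observed fact nodes carry their 0/1 value; latent fact nodes are the variables inference has to fill in.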
Markov logic networks
• Use logic formulas f(⋅): 𝒞 × 𝒞 × ⋯ × 𝒞 ↦ {0, 1} as potential functions
  • e.g. formula f(x, x′): Friend(x, x′) ∧ Smoke(x) ⇒ Smoke(x′)
[Figure: grounded MLN on the toy KB; formula nodes f(A,B), f(A,C), f(A,D), f(B,C), f(B,D), f(C,D) attach to the ground predicates.]
P(𝒪, ℋ) = (1/Z) exp( Σ_f w_f Σ_{a_f} φ_f(a_f) )

w_f: formula weight; φ_f is the 0/1 satisfaction of the clause, e.g. φ_f(x, x′): ¬Friend(x, x′) ∨ ¬Smoke(x) ∨ Smoke(x′)
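A hedged sketch of how the exponent Σ_f w_f Σ_{a_f} φ_f(a_f) is computed for the single smoking formula; the weight w_f = 1.5 and the world below are made-up values.

```python
from itertools import permutations

def phi(world, x, xp):
    # phi_f(x, x') = 1 iff  not Friend(x,x') or not Smoke(x) or Smoke(x') holds
    return int(not world[("F", x, xp)] or not world[("S", x)] or world[("S", xp)])

def exponent(world, entities, w_f=1.5):
    # sum over all groundings a_f = (x, x') of the single formula f
    return w_f * sum(phi(world, x, xp) for x, xp in permutations(entities, 2))

entities = ["A", "B", "C"]
world = {("F", x, y): 0 for x, y in permutations(entities, 2)}
world.update({("F", "A", "B"): 1, ("S", "A"): 1, ("S", "B"): 0, ("S", "C"): 0})
print(exponent(world, entities))  # 7.5: 5 of 6 groundings satisfied, times w_f
```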
Challenges in inference
• Large grounded network: O(n²) in the number n of entities!
• Enumerating configurations of the O(n²) binary variables gives O(2^(n²)) possibilities.
[Figure: the grounded MLN again, with inference queries P(F(B,C) | 𝒪) and P(S(C) | 𝒪).]
Z = Σ_{all possible combinations of variable values} exp( Σ_f w_f Σ_{a_f} φ_f(a_f) )
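To see the blow-up concretely, here is a brute-force partition function for the toy smoking MLN (hypothetical weight w_f = 1.5): with n entities there are n(n−1) Friend atoms plus n Smoke atoms, i.e. n² ground atoms and 2^(n²) worlds to sum over.

```python
import math
from itertools import permutations, product

def clause(world, x, y):
    # not Friend(x,y) or not Smoke(x) or Smoke(y)
    return int(not world[("F", x, y)] or not world[("S", x)] or world[("S", y)])

def partition_function(entities, w_f=1.5):
    # n entities -> n(n-1) Friend atoms + n Smoke atoms = n^2 ground atoms,
    # hence 2^(n^2) worlds to enumerate.
    atoms = [("F", x, y) for x, y in permutations(entities, 2)]
    atoms += [("S", x) for x in entities]
    Z = 0.0
    for bits in product([0, 1], repeat=len(atoms)):
        world = dict(zip(atoms, bits))
        score = w_f * sum(clause(world, x, y) for x, y in permutations(entities, 2))
        Z += math.exp(score)
    return Z

print(partition_function(["A", "B"]))  # n=2: 2^4 = 16 worlds
# n=3 already costs 2^9 = 512 worlds; the growth is O(2^(n^2)).
```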
Variational EM
[Diagram: variational EM alternates between the MLN side (likelihood, formula potentials φ_f(⋅)) and the GNN side (predicate posterior), both built on the knowledge graph.]
Variational EM
P(𝒪, ℋ) = (1/Z) exp( Σ_f w_f Σ_{a_f} φ_f(a_f) ),  with variational posterior Q(ℋ | 𝒪)
[Zhang et al., ICLR'20]
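For intuition, a runnable micro-example of the E-step on the same toy MLN, with a single hidden atom S(C) and hypothetical facts and weight; with one hidden variable the ELBO-maximizing Bernoulli Q coincides with the exact conditional. The M-step would then re-fit the formula weights under Q.

```python
import math
from itertools import permutations

def clause(world, x, y):
    # not Friend(x,y) or not Smoke(x) or Smoke(y)
    return int(not world[("F", x, y)] or not world[("S", x)] or world[("S", y)])

def score(world, entities, w_f=1.5):
    return w_f * sum(clause(world, x, y) for x, y in permutations(entities, 2))

entities = ["A", "B", "C"]
world = {("F", x, y): 0 for x, y in permutations(entities, 2)}
world.update({("F", "A", "B"): 1, ("F", "A", "C"): 1,
              ("S", "A"): 1, ("S", "B"): 1})

# E-step for the one hidden atom: optimal Q(S(C)=1 | O) = sigmoid(s1 - s0).
s1 = score({**world, ("S", "C"): 1}, entities)
s0 = score({**world, ("S", "C"): 0}, entities)
q = 1.0 / (1.0 + math.exp(s0 - s1))
print(q)  # ~0.82: Friend(A,C) and Smoke(A) pull S(C) toward "smokes"
```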
ExpressGNN: use a GNN in variational inference
• Run a GNN on the original KB (𝒢_K) to obtain embeddings for the entities:
  (μ_A, μ_B, μ_C, μ_D) = GNN(𝒢_K; θ)
[Figure: message passing μ_i(t) ← h( W_1 μ_i(t−1) + W_2 Σ_{j∈𝒩(i)} μ_j(t−1) ) over 𝒢_K turns the observed facts F(A,B)=1, F(A,C)=1, F(A,D)=1, S(A)=1, S(B)=1 into entity embeddings μ_A, μ_B, μ_C, μ_D.]
[Zhang et al., ICLR'20]
ExpressGNN: use a GNN in variational inference (cont.)
• Parameterize the variational posterior with the embeddings:
  Q(F(B,C) = 1 | 𝒪) := 1 / (1 + exp( μ_B⊤ Θ_F μ_C + ω_B⊤ ω_F ))
  Q(S(C) = 1 | 𝒪) := 1 / (1 + exp( θ_S⊤ [μ_C, ω_C] ))
[Figure: augmented graph 𝒢_K′; in addition to the observed facts (𝒪), the latent ground predicates (ℋ) F(B,C), F(B,D), F(C,D), S(C), S(D) are attached to the entity nodes A, B, C, D.]
[Zhang et al., ICLR'20]
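A shape-checking NumPy sketch of these two posteriors, following the slide's formulas literally (including the sign inside exp); the dimensions, random parameters, and per-entity/per-predicate vectors ω are placeholders.

```python
import numpy as np

# Variational posteriors built from GNN embeddings mu_* and tunable
# vectors omega_*; all shapes and values below are hypothetical.
rng = np.random.default_rng(0)
d = 8
mu = {e: rng.normal(size=d) for e in "ABCD"}              # GNN embeddings
omega = {k: rng.normal(size=d) for k in ["A", "B", "C", "D", "F"]}
Theta_F = rng.normal(size=(d, d))                          # relation-specific
theta_S = rng.normal(size=2 * d)                           # attribute-specific

def q_friend(x, y):
    # Q(F(x,y)=1 | O) = 1 / (1 + exp(mu_x^T Theta_F mu_y + omega_x^T omega_F))
    z = mu[x] @ Theta_F @ mu[y] + omega[x] @ omega["F"]
    return 1.0 / (1.0 + np.exp(z))

def q_smoke(x):
    # Q(S(x)=1 | O) = 1 / (1 + exp(theta_S^T [mu_x, omega_x]))
    return 1.0 / (1.0 + np.exp(theta_S @ np.concatenate([mu[x], omega[x]])))

print(q_friend("B", "C"), q_smoke("C"))
```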
Is the GNN embedding expressive enough?
• On 𝒢_K′, the GNN posterior gives F(x, y) and F(x′, y′) the same value if and only if x, x′ and y, y′ occupy isomorphic positions in 𝒢_K.
• The same is not true for the MLN on 𝒢_K′: its posterior can still distinguish them (see the demo below).
[Figure: entities x, x′ with isomorphic neighborhoods in 𝒢_K; the GNN ties F(x, y) to F(x′, y′), the MLN does not.]
[Zhang et al., ICLR'20]
Q(F(B,C) = 1 | 𝒪) := 1 / (1 + exp( μ_B⊤ Θ_F μ_C + ω_B⊤ ω_F ))
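A tiny demo of the limitation (hypothetical features and weights): C and D occupy isomorphic positions in the toy 𝒢_K, so any message-passing GNN assigns them identical embeddings, and an embedding-based Q therefore cannot give S(C) and S(D), or F(x,C) and F(x,D), different posteriors, even when the MLN posterior would differ.

```python
import numpy as np

def h(z):
    return np.tanh(z)  # nonlinearity choice is an assumption

# Toy G_K: A is friends with B, C, D; Smoke observed only for A and B,
# so C and D have identical features AND identical neighborhoods.
neighbors = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
feat = {"A": np.array([1.0, 1.0]),   # [has Friend fact, has Smoke fact]
        "B": np.array([1.0, 1.0]),
        "C": np.array([1.0, 0.0]),
        "D": np.array([1.0, 0.0])}

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
mu = {v: h(W1 @ x) for v, x in feat.items()}
for _ in range(3):  # any number of message-passing rounds keeps the tie
    mu = {v: h(W1 @ mu[v] + W2 @ sum(mu[u] for u in neighbors[v]))
          for v in neighbors}
print(np.allclose(mu["C"], mu["D"]))  # True: embeddings cannot separate C and D
```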
Variational EM (revisited)
P(𝒪, ℋ) = (1/Z) exp( Σ_f w_f Σ_{a_f} φ_f(a_f) ),  Q(ℋ | 𝒪)
[Zhang et al., ICLR'20]
Cora dataset details (CS paper citations)
• 10 relation types: Author, Title, Venue, HasWordTitle, …
• Task goal (entity resolution)
  • de-duplicate citations, authors, and venues
  • zero observed facts for the query predicates
• 46 crowd-sourced FOL formulas, e.g.:
  Author(bc1, a1) ∧ Author(bc2, a2) ∧ SameAuthor(a1, a2) ⇒ SameBib(bc1, bc2)
  HasWordAuthor(a1, w) ∧ HasWordAuthor(a2, w) ⇒ SameAuthor(a1, a2)
  Title(bc1, t1) ∧ Title(bc2, t2) ∧ SameBib(bc1, bc2) ⇒ SameTitle(t1, t2)
  SameVenue(v1, v2) ∧ SameVenue(v2, v3) ⇒ SameVenue(v1, v3)
  Title(bc1, t1) ∧ Title(bc2, t2) ∧ HasWordTitle(t1, +w) ∧ HasWordTitle(t2, +w) ⇒ SameBib(bc1, bc2)
  …
[Figure: authors A1, A2 linked to Paper1 and word W1 via HasWordTitle; query SameAuthor?]
Inference accuracy and time
• Area under the precision-recall curve (AUC-PR)
• Inference wall-clock time
[Figure: AUC-PR and inference wall-clock time comparison across methods; ExpressGNN highlighted.]
Large FB15K-237 dataset
• # entities: 15K; # ground predicates: 50M; # ground formulas: 679B
• Increase training data for the target predicates
Learning with small data (recap)
• Combining learning and logic reasoning
  • Learning (e.g. convolutional NNs for images, recurrent NNs for text): generalize with noisy input
  • Logic/symbolic reasoning, e.g. HorseShape(c) ∧ StripePattern(c) ⇒ Zebra(c): predict without data
[Figure: data requirement vs. noise tolerance for deep learning, symbolic reasoning, and the desired combination.]