Efficient Probabilistic Logic Reasoning with Graph Neural Networks
Le Song
Associate Professor, CSE, College of Computing; Associate Director, Center for Machine Learning
Georgia Institute of Technology
Learning with small data
• Combine learning and logic reasoning
  • Learning: generalize with noisy input
  • Logic/symbolic reasoning: predict without data
[Figure: data requirement vs. noise tolerance; deep learning and symbolic reasoning occupy opposite corners, while the desired method combines low data requirement with high noise tolerance.]
Example of logic inference: HorseShape(c) ∧ StripePattern(c) ⇒ Zebra(c)
[Figure: horse shape + stripe pattern ⇒ zebra]
Graph neural networks (GNN / structure2vec)
• Obtain embeddings via iterative updates:
  1. Initialize μ_i(0) = h( W_1 X_i ), ∀ i
  2. Iterate T times: μ_i(t) ← h( W_1 μ_i(t−1) + W_2 Σ_{j∈𝒩(i)} μ_j(t−1) )
[Figure: a six-node graph X_1, …, X_6; each node's embedding μ_i(0), μ_i(1), …, μ_i(T) is computed by the neural-network update, and the final embeddings (e.g. Σ_{k∈S} μ_k(T)) feed into supervised learning, generative models, or reinforcement learning.]
[Dai et al., ICML'16]
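To make the update concrete, here is a minimal NumPy sketch of a structure2vec-style iteration. The ReLU choice for h, the toy graph, and setting the feature dimension equal to the embedding dimension (so the same W_1 can serve both the initialization and the update, as written above) are illustration assumptions, not details from the talk.

```python
import numpy as np

def h(z):
    # The slide leaves the nonlinearity h unspecified; ReLU is an assumption.
    return np.maximum(z, 0.0)

def structure2vec_embed(X, neighbors, W1, W2, T):
    """mu_i(0) = h(W1 X_i);  mu_i(t) = h(W1 mu_i(t-1) + W2 sum_{j in N(i)} mu_j(t-1))."""
    mu = h(X @ W1.T)  # initial embeddings, one row per node
    for _ in range(T):
        agg = np.stack([mu[js].sum(axis=0) if js else np.zeros(mu.shape[1])
                        for js in neighbors])  # neighborhood sums
        mu = h(mu @ W1.T + agg @ W2.T)
    return mu

# Hypothetical 6-node graph standing in for the slide's X1..X6 picture.
rng = np.random.default_rng(0)
n, d = 6, 8  # feature dim chosen equal to embedding dim so W1 serves both steps
X = rng.normal(size=(n, d))
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3], [3]]
W1, W2 = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))
mu = structure2vec_embed(X, neighbors, W1, W2, T=2)
print(mu.shape)  # (6, 8): one embedding per node
```

The per-node embeddings would then be pooled, e.g. summed over a node set S as in the figure, before the downstream task.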
Logical expressiveness of GNNs
• Logical classifiers
  • classify each node according to whether it satisfies a formula
  • the formula is expressed in first-order predicate logic (FO)
[Diagram: graded modal logic ⊂ FOC2 ⊂ FO; WL-test = GNN (∞ depth); GNN (finite depth) ≤ R-GNN with Readout (global information)]
[Barceló et al., ICLR'20]
FOC2
• 2 variables
• with counting quantifiers, ∃≥N
• Example: f(x) := Red(x) ∧ ∃≥3 y [¬Edge(x, y) ∧ Blue(y)]
graded modal logic ⊂ FOC2
• counting quantifiers must be guarded by Edge(x, y):
  • allowed: ∃≥N y [Edge(x, y) ∧ Predicate(y)]
  • not allowed: ∃≥N y Predicate(y)
• Example (see the sketch below): f(x) := Red(x) ∧ ∃≥3 y [Edge(x, y) ∧ Blue(y)]
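A small self-contained sketch (hypothetical graph and colors) of why the guard matters: the classifier counts only over x's neighbors, so it is computable from local information, which is exactly what a finite-depth aggregate-and-combine GNN collects.

```python
# f(x) := Red(x) AND (there exist >= 3 neighbors y with Blue(y)).
# The counting quantifier is guarded by Edge(x, y), so f(x) depends only on
# x's immediate neighborhood -- the information a finite-depth GNN can gather.
def f(x, red, blue, neighbors):
    return red[x] and sum(blue[y] for y in neighbors[x]) >= 3

# Hypothetical toy graph: node 0 is red and has three blue neighbors.
red  = [True, False, False, False, False]
blue = [False, True, True, True, False]
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0], 4: []}
print([f(x, red, blue, neighbors) for x in range(5)])
# -> [True, False, False, False, False]
```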
More compact representation and lower error
• Harvard Clean Energy dataset: 2.3 million organic molecules; predict power conversion efficiency (0–12%)
[Figure: MAE vs. model dimension (0.1M to 1000M) for Embedded MF, Embedded BP, Weisfeiler-Lehman level 6, and hashed WL level 6; structure2vec reduces model size by 10,000×.]
[Dai et al., ICML'16]
Fraudulent account detection
• Alipay: online payment platform
• New accounts in a month: millions of nodes and edges
• Fake accounts can increase system-level risk
• Account–device network
[Figure: account–device graph with good and bad accounts; precision-recall comparison of rule-based detection, structure2vec, and node2vec.]
[Liu et al., CCS'17, CIKM'17, AAAI'19]
Knowledge base reasoning
• Many domains need knowledge bases…
  • they contain incorrect, incomplete, and duplicated records
  • they need link prediction, attribute classification, and record de-duplication
• Many existing rules/logic formulas are available
• Graph neural networks do not explicitly use this existing knowledge
UW-CSE dataset details
• 22 relations: teach, publish, …
• Task goal
  • predict who is whose advisor
  • zero observed facts for the query predicates
• 94 crowd-sourced FOL formulas, e.g.:
  advisedBy(s, p) ⇒ professor(p)
  advisedBy(s, p) ⇒ ¬yearsInProgram(s, Year_1)
  professor(x) ⇒ ¬student(x)
  publication(p, x) ∧ publication(p, y) ∧ student(x) ∧ ¬student(y) ⇒ professor(y)
  student(x) ∧ ¬advisedBy(x, y) ⇒ tempAdvisedBy(x, y)
  …
[Figure: toy KB with students S1, S3, professor P2, Paper1 (publish), Course1.]
Bipartite graph representation for knowledge bases
• Entities: 𝒞 = {A, B, C, D, …}
• Predicates (attribute | relation): r(⋅): 𝒞 × 𝒞 × ⋯ ↦ {0, 1}
  • e.g. Smoke(x), Friend(x, x′), Like(x, x′)
[Figure: bipartite graph 𝒢_K; entity nodes A, B, C, D connect to ground-predicate nodes. Observed (𝒪): F(A,B)=1, F(A,C)=1, F(A,D)=1, S(A)=1, S(B)=1. Latent (ℋ): F(B,C), F(B,D), F(C,D), S(C), S(D).]
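A minimal sketch of this bipartite encoding for the toy Friend/Smoke KB above; the tuple-based representation of fact nodes is a hypothetical choice for illustration, not the paper's data structure.

```python
# Bipartite KB graph G_K: entity nodes on one side, ground predicates on the
# other; each fact node connects to the entities it mentions.
entities = ["A", "B", "C", "D"]
observed = {("F", ("A", "B")): 1, ("F", ("A", "C")): 1, ("F", ("A", "D")): 1,
            ("S", ("A",)): 1, ("S", ("B",)): 1}           # O: observed facts
latent = [("F", ("B", "C")), ("F", ("B", "D")), ("F", ("C", "D")),
          ("S", ("C",)), ("S", ("D",))]                    # H: latent variables

# Edges of G_K: fact node <-> each of its argument entities.
edges = [(fact, e) for fact in list(observed) + latent for e in fact[1]]
print(len(edges))  # 3*2 + 2*1 + 3*2 + 2*1 = 16 edges
```

Observed fact nodes carry their 0/1 value; latent fact nodes are the variables inference has to fill in.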
Markov logic networks
• Use logic formulas f(⋅): 𝒞 × 𝒞 × ⋯ × 𝒞 ↦ {0, 1} as potential functions
  • e.g. formula f(x, x′): Friend(x, x′) ∧ Smoke(x) ⇒ Smoke(x′)
[Figure: grounded MLN on the toy KB; formula nodes f(A,B), f(A,C), f(A,D), f(B,C), f(B,D), f(C,D) attach to the ground predicates.]
P(𝒪, ℋ) = (1/Z) exp( Σ_f w_f Σ_{a_f} φ_f(a_f) )

w_f: formula weight; φ_f is the 0/1 satisfaction of the clause, e.g. φ_f(x, x′): ¬Friend(x, x′) ∨ ¬Smoke(x) ∨ Smoke(x′)
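A hedged sketch of how the exponent Σ_f w_f Σ_{a_f} φ_f(a_f) is computed for the single smoking formula; the weight w_f = 1.5 and the world below are made-up values.

```python
from itertools import permutations

def phi(world, x, xp):
    # phi_f(x, x') = 1 iff  not Friend(x,x') or not Smoke(x) or Smoke(x') holds
    return int(not world[("F", x, xp)] or not world[("S", x)] or world[("S", xp)])

def exponent(world, entities, w_f=1.5):
    # sum over all groundings a_f = (x, x') of the single formula f
    return w_f * sum(phi(world, x, xp) for x, xp in permutations(entities, 2))

entities = ["A", "B", "C"]
world = {("F", x, y): 0 for x, y in permutations(entities, 2)}
world.update({("F", "A", "B"): 1, ("S", "A"): 1, ("S", "B"): 0, ("S", "C"): 0})
print(exponent(world, entities))  # 7.5: 5 of 6 groundings satisfied, times w_f
```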
Challenges in inference
• Large grounded network: O(n²) in the number n of entities!
• Enumerating configurations of the O(n²) binary variables gives O(2^(n²)) possibilities.
[Figure: the grounded MLN again, with inference queries P(F(B,C) | 𝒪) and P(S(C) | 𝒪).]
Z = Σ_{all possible combinations of variable values} exp( Σ_f w_f Σ_{a_f} φ_f(a_f) )
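To see the blow-up concretely, here is a brute-force partition function for the toy smoking MLN (hypothetical weight w_f = 1.5): with n entities there are n(n−1) Friend atoms plus n Smoke atoms, i.e. n² ground atoms and 2^(n²) worlds to sum over.

```python
import math
from itertools import permutations, product

def clause(world, x, y):
    # not Friend(x,y) or not Smoke(x) or Smoke(y)
    return int(not world[("F", x, y)] or not world[("S", x)] or world[("S", y)])

def partition_function(entities, w_f=1.5):
    # n entities -> n(n-1) Friend atoms + n Smoke atoms = n^2 ground atoms,
    # hence 2^(n^2) worlds to enumerate.
    atoms = [("F", x, y) for x, y in permutations(entities, 2)]
    atoms += [("S", x) for x in entities]
    Z = 0.0
    for bits in product([0, 1], repeat=len(atoms)):
        world = dict(zip(atoms, bits))
        score = w_f * sum(clause(world, x, y) for x, y in permutations(entities, 2))
        Z += math.exp(score)
    return Z

print(partition_function(["A", "B"]))  # n=2: 2^4 = 16 worlds
# n=3 already costs 2^9 = 512 worlds; the growth is O(2^(n^2)).
```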
Variational EM
[Diagram: variational EM alternates between the MLN side (likelihood, formula potentials φ_f(⋅)) and the GNN side (predicate posterior), both built on the knowledge graph.]
Variational EM
P(𝒪, ℋ) = (1/Z) exp( Σ_f w_f Σ_{a_f} φ_f(a_f) ),  with variational posterior Q(ℋ | 𝒪)
[Zhang et al., ICLR'20]
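For intuition, a runnable micro-example of the E-step on the same toy MLN, with a single hidden atom S(C) and hypothetical facts and weight; with one hidden variable the ELBO-maximizing Bernoulli Q coincides with the exact conditional. The M-step would then re-fit the formula weights under Q.

```python
import math
from itertools import permutations

def clause(world, x, y):
    # not Friend(x,y) or not Smoke(x) or Smoke(y)
    return int(not world[("F", x, y)] or not world[("S", x)] or world[("S", y)])

def score(world, entities, w_f=1.5):
    return w_f * sum(clause(world, x, y) for x, y in permutations(entities, 2))

entities = ["A", "B", "C"]
world = {("F", x, y): 0 for x, y in permutations(entities, 2)}
world.update({("F", "A", "B"): 1, ("F", "A", "C"): 1,
              ("S", "A"): 1, ("S", "B"): 1})

# E-step for the one hidden atom: optimal Q(S(C)=1 | O) = sigmoid(s1 - s0).
s1 = score({**world, ("S", "C"): 1}, entities)
s0 = score({**world, ("S", "C"): 0}, entities)
q = 1.0 / (1.0 + math.exp(s0 - s1))
print(q)  # ~0.82: Friend(A,C) and Smoke(A) pull S(C) toward "smokes"
```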
ExpressGNN: use a GNN in variational inference
• Run a GNN on the original KB (𝒢_K) to obtain embeddings for the entities:
  (μ_A, μ_B, μ_C, μ_D) = GNN(𝒢_K; θ)
[Figure: message passing μ_i(t) ← h( W_1 μ_i(t−1) + W_2 Σ_{j∈𝒩(i)} μ_j(t−1) ) over 𝒢_K turns the observed facts F(A,B)=1, F(A,C)=1, F(A,D)=1, S(A)=1, S(B)=1 into entity embeddings μ_A, μ_B, μ_C, μ_D.]
[Zhang et al., ICLR'20]
ExpressGNN: use a GNN in variational inference (cont.)
• Parameterize the variational posterior with the embeddings:
  Q(F(B,C) = 1 | 𝒪) := 1 / (1 + exp( μ_B⊤ Θ_F μ_C + ω_B⊤ ω_F ))
  Q(S(C) = 1 | 𝒪) := 1 / (1 + exp( θ_S⊤ [μ_C, ω_C] ))
[Figure: augmented graph 𝒢_K′; in addition to the observed facts (𝒪), the latent ground predicates (ℋ) F(B,C), F(B,D), F(C,D), S(C), S(D) are attached to the entity nodes A, B, C, D.]
[Zhang et al., ICLR'20]
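A shape-checking NumPy sketch of these two posteriors, following the slide's formulas literally (including the sign inside exp); the dimensions, random parameters, and per-entity/per-predicate vectors ω are placeholders.

```python
import numpy as np

# Variational posteriors built from GNN embeddings mu_* and tunable
# vectors omega_*; all shapes and values below are hypothetical.
rng = np.random.default_rng(0)
d = 8
mu = {e: rng.normal(size=d) for e in "ABCD"}              # GNN embeddings
omega = {k: rng.normal(size=d) for k in ["A", "B", "C", "D", "F"]}
Theta_F = rng.normal(size=(d, d))                          # relation-specific
theta_S = rng.normal(size=2 * d)                           # attribute-specific

def q_friend(x, y):
    # Q(F(x,y)=1 | O) = 1 / (1 + exp(mu_x^T Theta_F mu_y + omega_x^T omega_F))
    z = mu[x] @ Theta_F @ mu[y] + omega[x] @ omega["F"]
    return 1.0 / (1.0 + np.exp(z))

def q_smoke(x):
    # Q(S(x)=1 | O) = 1 / (1 + exp(theta_S^T [mu_x, omega_x]))
    return 1.0 / (1.0 + np.exp(theta_S @ np.concatenate([mu[x], omega[x]])))

print(q_friend("B", "C"), q_smoke("C"))
```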
Is the GNN embedding expressive enough?
• On 𝒢_K′, the GNN posterior gives F(x, y) and F(x′, y′) the same value if and only if x, x′ and y, y′ occupy isomorphic positions in 𝒢_K.
• The same is not true for the MLN on 𝒢_K′: its posterior can still distinguish them (see the demo below).
[Figure: entities x, x′ with isomorphic neighborhoods in 𝒢_K; the GNN ties F(x, y) to F(x′, y′), the MLN does not.]
[Zhang et al., ICLR'20]
Q(F(B,C) = 1 | 𝒪) := 1 / (1 + exp( μ_B⊤ Θ_F μ_C + ω_B⊤ ω_F ))
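A tiny demo of the limitation (hypothetical features and weights): C and D occupy isomorphic positions in the toy 𝒢_K, so any message-passing GNN assigns them identical embeddings, and an embedding-based Q therefore cannot give S(C) and S(D), or F(x,C) and F(x,D), different posteriors, even when the MLN posterior would differ.

```python
import numpy as np

def h(z):
    return np.tanh(z)  # nonlinearity choice is an assumption

# Toy G_K: A is friends with B, C, D; Smoke observed only for A and B,
# so C and D have identical features AND identical neighborhoods.
neighbors = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
feat = {"A": np.array([1.0, 1.0]),   # [has Friend fact, has Smoke fact]
        "B": np.array([1.0, 1.0]),
        "C": np.array([1.0, 0.0]),
        "D": np.array([1.0, 0.0])}

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
mu = {v: h(W1 @ x) for v, x in feat.items()}
for _ in range(3):  # any number of message-passing rounds keeps the tie
    mu = {v: h(W1 @ mu[v] + W2 @ sum(mu[u] for u in neighbors[v]))
          for v in neighbors}
print(np.allclose(mu["C"], mu["D"]))  # True: embeddings cannot separate C and D
```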
Variational EM (revisited)
P(𝒪, ℋ) = (1/Z) exp( Σ_f w_f Σ_{a_f} φ_f(a_f) ),  Q(ℋ | 𝒪)
[Zhang et al., ICLR'20]
Cora dataset details (CS paper citations)
• 10 relation types: Author, Title, Venue, HasWordTitle, …
• Task goal (entity resolution)
  • de-duplicate citations, authors, and venues
  • zero observed facts for the query predicates
• 46 crowd-sourced FOL formulas, e.g.:
  Author(bc1, a1) ∧ Author(bc2, a2) ∧ SameAuthor(a1, a2) ⇒ SameBib(bc1, bc2)
  HasWordAuthor(a1, w) ∧ HasWordAuthor(a2, w) ⇒ SameAuthor(a1, a2)
  Title(bc1, t1) ∧ Title(bc2, t2) ∧ SameBib(bc1, bc2) ⇒ SameTitle(t1, t2)
  SameVenue(v1, v2) ∧ SameVenue(v2, v3) ⇒ SameVenue(v1, v3)
  Title(bc1, t1) ∧ Title(bc2, t2) ∧ HasWordTitle(t1, +w) ∧ HasWordTitle(t2, +w) ⇒ SameBib(bc1, bc2)
  …
[Figure: authors A1, A2 linked to Paper1 and word W1 via HasWordTitle; query SameAuthor?]
Inference accuracy and time
• Area under the precision-recall curve (AUC-PR)
• Inference wall-clock time
[Figure: AUC-PR and inference wall-clock time comparison across methods; ExpressGNN highlighted.]
Large FB15K-237 dataset
• # entities: 15K; # ground predicates: 50M; # ground formulas: 679B
• Increase training data for the target predicates
Learning with small data (recap)
• Combining learning and logic reasoning
  • Learning (e.g. convolutional NNs for images, recurrent NNs for text): generalize with noisy input
  • Logic/symbolic reasoning, e.g. HorseShape(c) ∧ StripePattern(c) ⇒ Zebra(c): predict without data
[Figure: data requirement vs. noise tolerance for deep learning, symbolic reasoning, and the desired combination.]