
Two Papers on GNN Theory: How Powerful are GNNs and Deeper Insights into GCN for Semi-supervised Learning

Presenter: Ji Gao (https://qdata.github.io/deep2Read)

June 3, 2019


Outline

1 Review: Graph Neural Networks

2 How powerful are graph neural networks?
  Graph Neural Network
  Graph Isomorphism and the Weisfeiler-Lehman Test
  Graph Isomorphism Network (GIN)
  Experiment results

3 Deeper Insights into Graph Convolutional Networks for semi-supervised learning
  Issues of GCN
  Algorithm
  Experiment results

4 Reference


Graph Neural Network


How powerful are graph neural networks?

How Powerful are Graph Neural Networks? Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka, ICLR 2019 [XHLJ18]

GNNs are at most as powerful as the Weisfeiler-Lehman graph isomorphism test.

Common GNN variants cannot learn to distinguish certain graph structures.

The paper proposes an architecture (GIN) that is provably as powerful as the WL test.


GNN definition

GNN layer
The k-th layer of a GNN is:

$$a_v^{(k)} = \mathrm{AGGREGATE}^{(k)}\left(\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\}\right), \qquad h_v^{(k)} = \mathrm{COMBINE}^{(k)}\left( h_v^{(k-1)},\, a_v^{(k)} \right)$$
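As a concrete illustration, here is a minimal NumPy sketch of one such layer, using sum as the AGGREGATE and a linear map plus tanh as the COMBINE; these specific choices are illustrative, not mandated by the definition:

    import numpy as np

    def gnn_layer(h, neighbors, W_self, W_agg):
        # h: (n, d) array of node features h^(k-1)
        # neighbors: list of lists, neighbors[v] = N(v)
        # W_self, W_agg: (d, d) weights of the illustrative linear COMBINE
        h_new = np.empty_like(h)
        for v in range(len(h)):
            # AGGREGATE: sum over the neighbor multiset (one admissible choice)
            a_v = h[neighbors[v]].sum(axis=0) if neighbors[v] else np.zeros(h.shape[1])
            # COMBINE: merge the node's previous state with the aggregate
            h_new[v] = np.tanh(h[v] @ W_self + a_v @ W_agg)
        return h_new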

GraphSage - Max-pooling variant [HYL17]

$$a_v^{(k)} = \mathrm{MAX}\left(\left\{ \mathrm{ReLU}\left( W \cdot h_u^{(k-1)} \right) : u \in \mathcal{N}(v) \right\}\right), \qquad h_v^{(k)} = W \cdot \left[ h_v^{(k-1)},\, a_v^{(k)} \right]$$

GCN [KW16]

$$h_v^{(k)} = \mathrm{ReLU}\left( W \cdot \mathrm{MEAN}\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \cup \{v\} \right\} \right)$$
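A hedged NumPy sketch of these two variants (single weight matrix per step, biases omitted; assumes every node has at least one neighbor):

    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def graphsage_maxpool(h, neighbors, W_pool, W):
        # a_v = MAX({ReLU(W_pool . h_u) : u in N(v)}), elementwise max
        # h_v = W . [h_v, a_v]  (concatenate, then one linear map)
        out = []
        for v, nb in enumerate(neighbors):
            a_v = relu(h[nb] @ W_pool).max(axis=0)
            out.append(np.concatenate([h[v], a_v]) @ W)
        return np.stack(out)

    def gcn_mean(h, neighbors, W):
        # h_v = ReLU(W . MEAN{h_u : u in N(v) union {v}})
        out = []
        for v, nb in enumerate(neighbors):
            m = h[list(nb) + [v]].mean(axis=0)
            out.append(relu(m @ W))
        return np.stack(out)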


Graph classification via GNN

For graph classification, an additional step combines the embeddings of all the nodes into a single graph embedding:

$$h_G = \mathrm{READOUT}\left(\left\{ h_v^{(K)} \mid v \in G \right\}\right)$$

READOUT can be a simple permutation-invariant function such as summation, or a more sophisticated graph-level pooling function.
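For instance, a sum readout is permutation invariant because addition is commutative; a minimal sketch:

    import numpy as np

    def sum_readout(h_final):
        # h_final: (n, d) final-layer embeddings h_v^(K) of one graph's nodes
        return h_final.sum(axis=0)  # (d,) graph-level embedding h_G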


Graph Isomorphism

Graph isomorphism: graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ are isomorphic if some permutation $p$ of $V_1$ maps $G_1$ onto $G_2$: every $(u, v) \in E_1$ has $(p(u), p(v)) \in E_2$, and vice versa.

Verifying graph isomorphism is in NP, and the problem is believed to sit somewhere between P and NP-complete: no polynomial-time algorithm is known, but it is also not known to be NP-complete.
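A brute-force check that applies the definition directly; it tries all |V|! permutations, so it is only feasible for tiny graphs, which is exactly why efficient heuristics such as the WL test below matter:

    from itertools import permutations

    def brute_force_isomorphic(n, E1, E2):
        # E1, E2: edge sets over vertices 0..n-1, each edge a frozenset {u, v}
        if len(E1) != len(E2):
            return False
        for p in permutations(range(n)):
            # does p map every edge of E1 onto E2?
            image = {frozenset((p[u], p[v])) for u, v in (tuple(e) for e in E1)}
            if image == E2:
                return True
        return False

    # Example: a triangle plus an isolated vertex vs. a path on 4 nodes
    K3 = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2)]}
    P4 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}
    print(brute_force_isomorphic(4, K3, P4))  # False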


Graph Isomorphism Example

[Example figures of isomorphic and non-isomorphic graph pairs, not reproduced in this transcript]

Weisfeiler-Lehman Test

The Weisfeiler-Lehman test is a subtree-based heuristic for the graph isomorphism problem.

Weisfeiler-Lehman Test

Initialize all nodes with their node features, or with the same label 1.

Iteratively:
1 Aggregate the label of each node with the labels of its neighborhood.
2 Hash the aggregated labels into unique new labels.
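A minimal one-graph sketch of this refinement, using a dictionary of signatures in place of the hash function; running it on two graphs and comparing the sorted label multisets after each round yields the isomorphism test (assumes unlabeled nodes, so everything starts at label 1):

    def wl_refine(neighbors, rounds):
        # neighbors: list of lists, neighbors[v] = N(v); all labels start at 1
        labels = [1] * len(neighbors)
        for _ in range(rounds):
            table, new_labels = {}, []
            for v, nb in enumerate(neighbors):
                # step 1: aggregate own label with the sorted neighbor labels
                sig = (labels[v], tuple(sorted(labels[u] for u in nb)))
                # step 2: "hash" each distinct signature to a fresh label
                new_labels.append(table.setdefault(sig, len(table) + 2))
            labels = new_labels
        return labels  # graphs with different sorted(labels) are non-isomorphic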


Weisfeiler-Lehman Test

Example from https://www.slideshare.net/pratikshukla11/graph-kernelpdf: [illustration of WL label refinement, not reproduced in this transcript]


WL Test and GNN

In the GraphSage paper, the authors claim that the WL test is a special case of GraphSage, with the hashing function replaced by a trainable aggregation function.

However, as this paper shows, common aggregators such as mean-pooling and max-pooling are strictly weaker than the hash function.


WL Test is stronger than GNN

Lemma: let $G_1$ and $G_2$ be any two non-isomorphic graphs. If a graph neural network $\mathcal{A} : \mathcal{G} \to \mathbb{R}^d$ maps $G_1$ and $G_2$ to different embeddings, then the Weisfeiler-Lehman graph isomorphism test also decides that $G_1$ and $G_2$ are not isomorphic.

(This assumes the WL hashing function is injective, i.e., as powerful as possible.)


How to make GNN powerful?

Theorem: let $\mathcal{A} : \mathcal{G} \to \mathbb{R}^d$ be a GNN. With a sufficient number of GNN layers, $\mathcal{A}$ maps any graphs $G_1$ and $G_2$ that the Weisfeiler-Lehman test of isomorphism decides as non-isomorphic to different embeddings if the following conditions hold:

a) $\mathcal{A}$ aggregates and updates node features iteratively with

$$h_v^{(k)} = \phi\left( h_v^{(k-1)},\ f\left(\left\{ h_u^{(k-1)} : u \in \mathcal{N}(v) \right\}\right) \right),$$

where the function $f$, which operates on multisets, and $\phi$ are injective (one-to-one).

b) $\mathcal{A}$'s graph-level readout, which operates on the multiset of node features $\{ h_v^{(K)} \}$, is injective.

Conclusion: the bottleneck is the aggregation function (and the readout function).


How to make GNN powerful?

The idea is simple and straightforward; however, sum, max-pooling, and mean-pooling over raw features are not injective on multisets.
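A short illustration: mean and max cannot tell the multisets {1, 2} and {1, 1, 2, 2} apart, while their sums differ; sum over raw values still fails on other pairs such as {1, 3} vs. {2, 2}, which is why GIN first passes features through a learned map:

    a, b = [1.0, 2.0], [1.0, 1.0, 2.0, 2.0]
    print(sum(a) / len(a) == sum(b) / len(b))  # True:  mean collides
    print(max(a) == max(b))                    # True:  max collides
    print(sum(a) == sum(b))                    # False: sum separates this pair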

The authors model the aggregator with a neural network:

Graph Isomorphism Network (GIN)

A GIN layer is defined as:

$$h_v^{(k)} = \mathrm{MLP}^{(k)}\left( \left(1 + \epsilon^{(k)}\right) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)$$
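A hedged NumPy sketch of this update; the two-layer ReLU MLP is an illustrative choice (the theory only requires the MLP to be expressive enough), and biases are omitted:

    import numpy as np

    def gin_layer(h, neighbors, W1, W2, eps):
        # z_v = (1 + eps) * h_v + sum of neighbor features
        z = np.empty_like(h)
        for v, nb in enumerate(neighbors):
            s = h[nb].sum(axis=0) if nb else np.zeros(h.shape[1])
            z[v] = (1.0 + eps) * h[v] + s
        # apply the MLP row-wise: MLP(z) = ReLU(z W1) W2
        return np.maximum(z @ W1, 0.0) @ W2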


GIN Guarantees

Theorem: assume $\mathcal{X}$ is countable. There exists a function $f : \mathcal{X} \to \mathbb{R}^n$ so that $h(X) = \sum_{x \in X} f(x)$ is unique for each multiset $X \subset \mathcal{X}$ of bounded size. Moreover, any multiset function $g$ can be decomposed as $g(X) = \phi\left( \sum_{x \in X} f(x) \right)$ for some function $\phi$.

Lemma: assume $\mathcal{X}$ is countable. There exists a function $f : \mathcal{X} \to \mathbb{R}^n$ so that for infinitely many choices of $\epsilon$, including all irrational numbers, $h(c, X) = (1 + \epsilon) \cdot f(c) + \sum_{x \in X} f(x)$ is unique for each pair $(c, X)$, where $c \in \mathcal{X}$ and $X \subset \mathcal{X}$ is a multiset of bounded size. Moreover, any function $g$ over such pairs can be decomposed as $g(c, X) = \varphi\left( (1 + \epsilon) \cdot f(c) + \sum_{x \in X} f(x) \right)$ for some function $\varphi$.

By these two results, GIN is as powerful as the WL test, provided a suitable MLP $\phi$ is learned and a suitable $\epsilon$ is chosen.


Graph classification Readout of GIN

GIN uses a readout that draws on the node embeddings of every layer:

$$h_G = \mathrm{CONCAT}\left( \mathrm{READOUT}\left(\left\{ h_v^{(k)} \mid v \in G \right\}\right) \ \middle|\ k = 0, 1, \ldots, K \right).$$
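A sketch of this readout, using a per-layer sum readout as one admissible choice:

    import numpy as np

    def gin_graph_embedding(h_per_layer):
        # h_per_layer: list [h^(0), ..., h^(K)], each an (n, d_k) array
        # READOUT each layer by summing over nodes, then CONCAT across layers
        return np.concatenate([h.sum(axis=0) for h in h_per_layer])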


Study on the variants

1-layer perceptrons are not sufficient.

Mean- and max-pooling are not expressive enough (see the counterexamples above).

Mean-pooling is better for tasks with diverse features that rarely repeat, while max-pooling is better at capturing a general relationship.


Experiment

Uses 9 graph classification datasets.

Methods in comparison:
GIN-0: ε is fixed to 0
GIN-ε: ε is also updated by gradient descent during training
SVM
DCNN
DGCNN
AWL
GraphSage-Mean
GCN


Experiment result

[Results table from the paper, not reproduced in this transcript]

Deeper Insights into Graph Convolutional Networks for semi-supervised learning

Deeper Insights into Graph Convolutional Networks for Semi-supervised Learning. Qimai Li, Zhichao Han, Xiao-Ming Wu, AAAI 2018 [LHW18]

GCN performance does not increase with the number of layers, which contradicts its design logic.

To overcome this problem while staying with a shallow architecture, the paper uses co-training and self-training to improve performance.


GCN formula

GCN

A GCN layer is defined as:

$$H^{(k+1)} = \sigma\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(k)} W^{(k)} \right),$$

where $\tilde{A} = A + I$ and $\tilde{D}$ is its degree matrix.

GCN adds a self-loop to every node in the graph.

The hidden vector of a node is updated by a weighted average of itself and its neighbors.

This update is a form of Laplacian smoothing.
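A dense NumPy sketch of one such layer for clarity (real implementations use sparse matrices; ReLU is the usual choice of σ):

    import numpy as np

    def gcn_step(A, H, W):
        # H <- ReLU(D^{-1/2} (A + I) D^{-1/2} H W), with self-loops added
        A_hat = A + np.eye(len(A))
        d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
        A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
        return np.maximum(A_norm @ H @ W, 0.0)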


When GCN fails

Experiment: generate a 2-dimensional vector for every vertex of a small dataset and plot it.

Repeatedly applying Laplacian smoothing causes the features of different classes to mix together again.

Theorem: if the graph has no bipartite components, repeated Laplacian smoothing converges to a rank-one limit proportional to $\tilde{D}^{1/2} \mathbf{1}\, \theta$: every node's feature becomes a multiple, scaled by $\sqrt{\tilde{d}_v}$, of one shared vector $\theta$, so the features lose all discriminative information.
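A quick numerical check of this collapse on a 4-node graph (a triangle plus a pendant vertex): after many smoothing steps, every row of H points in the same direction, differing only by a degree-dependent scale:

    import numpy as np

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)  # triangle 0-1-2, pendant node 3
    A_hat = A + np.eye(4)
    d = A_hat.sum(axis=1)
    S = A_hat / np.sqrt(np.outer(d, d))        # symmetric smoothing operator

    H = np.random.rand(4, 2)
    for _ in range(200):
        H = S @ H                              # repeated Laplacian smoothing
    print(H / np.linalg.norm(H, axis=1, keepdims=True))
    # all four rows are (numerically) the same unit vector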


Issues of GCN

The GCN used in [KW16] has only 2 layers. However, such a shallow GCN is a localized filter, which means it does not propagate information across the whole graph.

Example: if a node has no labeled node among its near neighbors, GCN will not work for it.

The authors also claim that "GCN relies on an additional large validation set to select the model, as hyperparameters are crucial."


Co-train a GCN with Random Walk Model

Idea: random walks can explore the global graph structure, which is complementary to the localized GCN result.

In particular, use a random walk model to provide extra labels on which the GCN is then trained.

Calculate the random walk absorption probability matrix, and use it to find, for each class, the unlabeled vertices most confidently associated with that class; a sketch follows.

Algorithm: [pseudo-code figure from the paper, not reproduced in this transcript]
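A hedged sketch of the co-training step: the paper computes a partially absorbing random-walk matrix $P = (L + \alpha\Lambda)^{-1}$ and adds, for each class, the unlabeled vertices with the largest absorption mass into that class's labeled seeds; here $\Lambda = I$ and the parameter values are illustrative simplifications, not the paper's exact settings:

    import numpy as np

    def cotrain_extra_labels(A, labeled, y, alpha=1e-6, per_class=10):
        # A: (n, n) adjacency; labeled: list of seed indices; y: their labels
        L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
        P = np.linalg.inv(L + alpha * np.eye(len(A)))  # absorption probabilities
        extra = {}
        for k in set(y):
            seeds = [i for i, c in zip(labeled, y) if c == k]
            conf = P[:, seeds].sum(axis=1)             # mass absorbed by class k
            conf[labeled] = -np.inf                    # unlabeled nodes only
            extra[k] = np.argsort(conf)[-per_class:]   # most confident vertices
        return extra  # pseudo-labels to add before training the GCN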


GCN self-training

Self-training: train a GCN, select its most confident predictions on unlabeled nodes for each class, add them to the label set, and train it again.
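A hedged sketch of one self-training round; train_gcn is a hypothetical trainer supplied by the caller (not an API from the paper's code) that returns per-node class probabilities:

    import numpy as np

    def self_train_round(train_gcn, X, A, labeled, y, per_class=10):
        probs = train_gcn(X, A, labeled, y)          # (n, K) class probabilities
        labeled, y = list(labeled), list(y)
        for k in range(probs.shape[1]):
            conf = probs[:, k].copy()
            conf[labeled] = -np.inf                  # never relabel known nodes
            for v in np.argsort(conf)[-per_class:]:  # most confident for class k
                labeled.append(int(v))
                y.append(k)
        return train_gcn(X, A, labeled, y)           # train it again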


Experiment

Data: Cora, Citeseer and Pubmed

Methods in comparison:
Label propagation via random walk (LP)
ChebyNet
GCN: with or without a validation set
Co-training
Self-training
Union / intersection of the co-training and self-training label sets


Experiment Result

[Results table from the paper, not reproduced in this transcript]

Experiment Result II

[Results table from the paper, not reproduced in this transcript]

Reference I

[HYL17] Will Hamilton, Zhitao Ying, and Jure Leskovec, Inductive representation learning on large graphs, Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.

[KW16] Thomas N. Kipf and Max Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).

[LHW18] Qimai Li, Zhichao Han, and Xiao-Ming Wu, Deeper insights into graph convolutional networks for semi-supervised learning, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[XHLJ18] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka, How powerful are graph neural networks?, arXiv preprint arXiv:1810.00826 (2018).

