Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

M. Defferrard (1), X. Bresson (2), P. Vandergheynst (1)

(1) EPFL, Lausanne, Switzerland
(2) Nanyang Technological University, Singapore

Itay Boneh, Asher Kabakovitch
Tel-Aviv University, Deep Learning Seminar, 2017

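Spectral Filtering: Non-parametric

From the convolution theorem, with $\hat{g} = U^T g$:

$$x *_G g = U\left(U^T g \odot U^T x\right) = U\left(\hat{g} \odot U^T x\right)$$

Or algebraically:

$$x *_G g = U \begin{pmatrix} \hat{g}_1 & & 0 \\ & \ddots & \\ 0 & & \hat{g}_n \end{pmatrix} U^T x$$

- Not localized
- $O(n)$ parameters to train
- $O(n^2)$ multiplications (no FFT)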
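A minimal NumPy sketch of the non-parametric filter, assuming the combinatorial Laplacian $L = D - W$ (the slides do not say which Laplacian is used) and a toy 4-node cycle graph. The function names are illustrative and not taken from the authors' code.

```python
import numpy as np

def graph_fourier_basis(W):
    """Return (eigenvalues, U) of the combinatorial Laplacian L = D - W (assumed)."""
    L = np.diag(W.sum(axis=1)) - W
    lam, U = np.linalg.eigh(L)   # L is symmetric, so the columns of U are orthonormal
    return lam, U

def spectral_filter(U, g_hat, x):
    """Compute U (g_hat * U^T x): one free coefficient per graph frequency."""
    return U @ (g_hat * (U.T @ x))

# Toy 4-node cycle graph (illustrative data only).
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
lam, U = graph_fourier_basis(W)
x = np.array([1.0, 0.0, 0.0, 0.0])

g_hat = np.ones(4)                       # all-pass filter: n free parameters
y_identity = spectral_filter(U, g_hat, x)
y_diag = U @ np.diag(g_hat) @ U.T @ x    # the equivalent matrix form
```

Both products with $U$ and $U^T$ are dense, which is where the $O(n^2)$ cost comes from.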

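Spectral Filtering: Polynomial Parametrization

$g_\theta$ is a continuous 1-D function of the eigenvalue $\lambda$, parametrized by $\theta$:

$$x *_G g = U\left(g_\theta(\lambda) \odot U^T x\right) = U \begin{pmatrix} g_\theta(\lambda_1) & & 0 \\ & \ddots & \\ 0 & & g_\theta(\lambda_n) \end{pmatrix} U^T x = U g_\theta(\Lambda) U^T x = g_\theta(L)\, x$$

The last equality uses the eigendecomposition: $T\phi = \phi\Lambda \Rightarrow f(T) = \phi f(\Lambda) \phi^{-1}$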

Polynomial parametrization:

$$g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k$$

- K-localized: each application of $L$ reaches one hop further
- $K$ parameters to train, independent of the graph's size
- Still $O(n^2)$ multiplications (multiplications with the basis $U$)
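The localization property can be checked numerically. The sketch below assumes a 6-node path graph and the combinatorial Laplacian $L = D - W$ (both are illustrative choices, not from the slides); it builds $g_\theta(L)$ from powers of $L$ and compares it with the spectral form $U g_\theta(\Lambda) U^T$.

```python
import numpy as np

def polynomial_filter_matrix(L, theta):
    """Return g_theta(L) = sum_k theta[k] * L^k as a dense matrix."""
    n = L.shape[0]
    out = np.zeros((n, n))
    L_power = np.eye(n)            # L^0
    for theta_k in theta:
        out += theta_k * L_power
        L_power = L_power @ L      # next power of L
    return out

# Path graph 0-1-2-3-4-5 (illustrative data only).
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

theta = np.array([0.5, -0.3, 0.1])   # K = 3 coefficients, polynomial of degree 2
G = polynomial_filter_matrix(L, theta)

# The same operator assembled in the spectral domain: U g_theta(Lambda) U^T.
lam, U = np.linalg.eigh(L)
G_spectral = U @ np.diag(sum(t * lam**k for k, t in enumerate(theta))) @ U.T

impulse = np.zeros(n)
impulse[0] = 1.0
response = G @ impulse               # impulse response centred at node 0
```

With $K = 3$ the response is non-zero only on nodes 0, 1 and 2, i.e. within $K - 1 = 2$ hops of the impulse.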

[Figure: impulse response on a 2D Euclidean domain]

[Figure: impulse response on a graph]

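Spectral Filtering: Recursive Polynomial Parametrization

Chebyshev polynomials:

$$T_k(\lambda) = 2\lambda T_{k-1}(\lambda) - T_{k-2}(\lambda), \qquad T_0(\lambda) = 1, \qquad T_1(\lambda) = \lambda$$

Parametrization:

$$g_\theta(L) = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})$$

$$\tilde{L} = 2 \lambda_n^{-1} L - I$$

(the rescaling maps the spectrum of $L$ to $[-1, 1]$, where the $T_k$ form an orthogonal basis)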

Filtering:

$$y = g_\theta(L)\, x = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\, x = \sum_{k=0}^{K-1} \theta_k x_k$$

Recurrence:

$$x_k = T_k(\tilde{L})\, x = 2\tilde{L} x_{k-1} - x_{k-2}, \qquad x_0 = x, \qquad x_1 = \tilde{L} x$$

- K-localized: each application of $L$ reaches one hop further
- $K$ parameters to train, independent of the graph's size
- $O(Kn)$ multiplications (actually $O(K|\mathcal{E}|)$)
- No eigenvalue decomposition (EVD) of $L$
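The recurrence can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: `chebyshev_filter` is a hypothetical name, $L$ is a dense combinatorial Laplacian of a toy ring graph, and the eigendecomposition appears only to obtain $\lambda_n$ and a reference answer for comparison. In practice $\lambda_n$ would be estimated without a full EVD, and $L$ would be stored as a sparse matrix so that each product costs $O(|\mathcal{E}|)$.

```python
import numpy as np
from numpy.polynomial import chebyshev

def chebyshev_filter(L, x, theta, lmax):
    """Compute y = sum_k theta[k] * T_k(L_tilde) x using only products with L.

    L_tilde = 2 L / lmax - I rescales the spectrum of L to [-1, 1].
    """
    def apply_L_tilde(v):
        return (2.0 / lmax) * (L @ v) - v

    x_prev = x                               # x_0 = x
    y = theta[0] * x_prev
    if len(theta) > 1:
        x_curr = apply_L_tilde(x)            # x_1 = L_tilde x
        y = y + theta[1] * x_curr
    for k in range(2, len(theta)):
        # x_k = 2 L_tilde x_{k-1} - x_{k-2}
        x_next = 2.0 * apply_L_tilde(x_curr) - x_prev
        y = y + theta[k] * x_next
        x_prev, x_curr = x_curr, x_next
    return y

# Toy ring graph with 6 nodes (illustrative data only).
n = 6
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

lam, U = np.linalg.eigh(L)                   # only used to build a reference answer
lmax = lam[-1]
theta = np.array([0.4, -0.2, 0.1, 0.05])     # K = 4 coefficients
x = np.arange(n, dtype=float)

y = chebyshev_filter(L, x, theta, lmax)
# Reference: evaluate the Chebyshev series on the rescaled eigenvalues.
y_ref = U @ (chebyshev.chebval(2.0 * lam / lmax - 1.0, theta) * (U.T @ x))
```

The filter itself touches $L$ only through $K - 1$ matrix-vector products, which is the source of the $O(K|\mathcal{E}|)$ cost for sparse graphs.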

Graph CNN

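Graph CNN: Learning Filters

$$y_{s,j} = \sum_{i=1}^{F_{in}} g_{\theta_{i,j}}(L)\, x_{s,i}$$

$$\frac{\partial \mathcal{L}}{\partial \theta_{i,j}} = \sum_{s=1}^{S} \left[x_{s,i,0}, \ldots, x_{s,i,K-1}\right]^T \frac{\partial \mathcal{L}}{\partial y_{s,j}}$$

$$\frac{\partial \mathcal{L}}{\partial x_{s,i}} = \sum_{j=1}^{F_{out}} g_{\theta_{i,j}}(L)\, \frac{\partial \mathcal{L}}{\partial y_{s,j}}$$

- $s = 1, \ldots, S$: sample index
- $i = 1, \ldots, F_{in}$: input feature map index
- $j = 1, \ldots, F_{out}$: output feature map index
- $\theta_{i,j}$: the $F_{in} \times F_{out}$ vectors of Chebyshev coefficients, of order $K$
- $x_{s,i,k} = T_k(\tilde{L})\, x_{s,i}$: the $k$-th term of the Chebyshev recurrence applied to $x_{s,i}$
- $\mathcal{L}$: loss over a mini-batch of $S$ samples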

- Cost: $O(K |\mathcal{E}| F_{in} F_{out} S)$ operations
- Easily parallelized
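The forward pass and both gradients can be written compactly with `np.einsum`. The sketch below is an illustration under assumptions, not the authors' code: the function names, the array layout `(S, n, F)`, the dense rescaled Laplacian `L_tilde` and the toy quadratic loss are all hypothetical choices made for the example.

```python
import numpy as np

def chebyshev_basis(L_tilde, X, K):
    """Stack T_k(L_tilde) X for k = 0..K-1 on a new last axis.

    X has shape (S, n, F); the result has shape (S, n, F, K).
    """
    terms = [X]
    if K > 1:
        terms.append(np.einsum('mn,snf->smf', L_tilde, X))
    for _ in range(2, K):
        terms.append(2.0 * np.einsum('mn,snf->smf', L_tilde, terms[-1]) - terms[-2])
    return np.stack(terms, axis=-1)

def forward(L_tilde, X, theta):
    """y[s, :, j] = sum_i g_{theta[i, j]}(L) x[s, :, i]; theta is (F_in, F_out, K)."""
    X_bar = chebyshev_basis(L_tilde, X, theta.shape[-1])
    return np.einsum('snik,ijk->snj', X_bar, theta), X_bar

def backward(L_tilde, X_bar, theta, grad_Y):
    """Gradients of the loss w.r.t. theta and the input X, given dLoss/dY."""
    grad_theta = np.einsum('snik,snj->ijk', X_bar, grad_Y)
    # g_theta(L) is symmetric, so the input gradient reuses the same recurrence.
    G_bar = chebyshev_basis(L_tilde, grad_Y, theta.shape[-1])
    grad_X = np.einsum('snjk,ijk->sni', G_bar, theta)
    return grad_theta, grad_X

# Toy setup: ring graph with 5 nodes, random inputs (illustrative data only).
rng = np.random.default_rng(0)
S, n, F_in, F_out, K = 2, 5, 3, 4, 3
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
L_tilde = 2.0 * L / np.linalg.eigvalsh(L)[-1] - np.eye(n)

X = rng.standard_normal((S, n, F_in))
theta = rng.standard_normal((F_in, F_out, K))

Y, X_bar = forward(L_tilde, X, theta)
grad_Y = Y                       # gradient of the toy loss 0.5 * sum(Y ** 2)
grad_theta, grad_X = backward(L_tilde, X_bar, theta, grad_Y)
```

Every sample, input map and output map is processed by independent products with `L_tilde`, which is why the computation parallelizes well.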

Graph Pooling: Coarsening

- Multilevel clustering algorithm
- Reduce the size of the graph by a specified factor (2)
- Do all this efficiently

Graclus multilevel clustering algorithm:

- Maximizes the local normalized cut
- Greedily picks an unmarked vertex $i$ and matches it with an unmarked vertex $j$ that maximizes the local normalized cut $W_{i,j}(1/d_i + 1/d_j)$
- Extremely fast
- Divides the number of nodes by approximately 2
- May generate singletons (unmatched nodes); solved by adding fake nodes
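One level of the greedy matching can be sketched as follows. This is a simplified illustration, not the Graclus library: the function names are hypothetical, the visiting order is simply the vertex index, and the coarsened weights are assumed to be the sum of the weights between merged vertices with self-loops dropped.

```python
import numpy as np

def greedy_matching(W):
    """Return a cluster id per vertex after one level of greedy pairwise matching."""
    n = W.shape[0]
    d = W.sum(axis=1)                         # vertex degrees
    cluster = -np.ones(n, dtype=int)          # -1 means "unmarked"
    next_id = 0
    for i in range(n):
        if cluster[i] >= 0:
            continue                          # already matched
        best_j, best_score = None, 0.0
        for j in np.nonzero(W[i])[0]:
            if j != i and cluster[j] < 0:
                score = W[i, j] * (1.0 / d[i] + 1.0 / d[j])   # local normalized cut
                if score > best_score:
                    best_j, best_score = j, score
        cluster[i] = next_id
        if best_j is not None:
            cluster[best_j] = next_id         # i and best_j form one coarse vertex
        next_id += 1                          # otherwise i stays a singleton
    return cluster

def coarsen(W, cluster):
    """Coarse adjacency: summed weights between clusters (assumed), no self-loops."""
    n_coarse = cluster.max() + 1
    P = np.zeros((n_coarse, W.shape[0]))
    P[cluster, np.arange(W.shape[0])] = 1.0   # P[c, v] = 1 if vertex v is in cluster c
    W_coarse = P @ W @ P.T
    np.fill_diagonal(W_coarse, 0.0)
    return W_coarse

# Path graph 0-1-2-3-4 with unit weights (illustrative data only).
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

cluster = greedy_matching(W)       # pairs (0, 1) and (2, 3); vertex 4 is a singleton
W_coarse = coarsen(W, cluster)     # 3 coarse vertices forming a path
```

With an odd number of vertices at least one singleton is unavoidable, which is the case the fake nodes are meant to handle.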

Graph Pooling: Fast Pooling

Example: Pooling by 4

- Graclus generates singletons: $n_0 = 8 \to n_1 = 5 \to n_2 = 3$
- By adding fake nodes we get: $n_2 = 3 \to n_1 = 6 \to n_0 = 12$
- $z = [\max(x_0, x_1), \max(x_4, x_5, x_6), \max(x_8, x_9, x_{10})]$
- Balanced binary trees ⇒ efficient on GPUs
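Once the vertices are reordered along the balanced binary tree, graph pooling reduces to ordinary 1-D pooling. The sketch below reproduces the example with made-up signal values; filling the fake nodes (indices 2, 3, 7 and 11, as implied by the expression for $z$) with $-\infty$ is an assumed convention that makes them neutral under max pooling.

```python
import math

FAKE = -math.inf   # assumed neutral value for max pooling

def max_pool_1d(x, size):
    """Regular 1-D max pooling over consecutive, non-overlapping windows."""
    return [max(x[i:i + size]) for i in range(0, len(x), size)]

# 8 real values placed into 12 slots; the missing indices are fake nodes.
x_real = {0: 0.3, 1: 0.9, 4: 0.1, 5: 0.7, 6: 0.4, 8: 0.2, 9: 0.5, 10: 0.6}
x = [x_real.get(i, FAKE) for i in range(12)]

z = max_pool_1d(x, 4)   # [max(x0, x1), max(x4, x5, x6), max(x8, x9, x10)]
```

Because the fake nodes never win the maximum, the result depends only on the real vertices while the memory access pattern stays regular.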

Experiments: MNIST

$$W_{i,j} = e^{-\|x_i - x_j\|_2^2 / \sigma^2}$$

- 28×28 pixels + 192 fake nodes ⇒ $n = |\mathcal{V}| = 976$
- 8-NN graph ⇒ $|\mathcal{E}| = 3198$
- Based on LeNet-5 ⇒ $K = 5$

- Isotropic filters (no orientation)
- Optimization and initialization strategies not yet investigated
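A k-NN graph with Gaussian weights can be built on the pixel grid as sketched below. The function names are hypothetical, the node features are assumed to be the 2-D pixel coordinates, and the default choice of $\sigma^2$ (mean squared distance to the $k$-th neighbor) is an assumption of this example. Tie-breaking among equidistant pixels is arbitrary here, so the edge count is not guaranteed to match the figure quoted on the slide, and the fake nodes are not included.

```python
import numpy as np

def grid_coordinates(side):
    """Return the (side * side, 2) array of 2-D pixel coordinates."""
    rows, cols = np.meshgrid(np.arange(side), np.arange(side), indexing='ij')
    return np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)

def knn_gaussian_graph(Z, k, sigma2=None):
    """Symmetric k-NN adjacency with weights exp(-||z_i - z_j||^2 / sigma^2)."""
    n = Z.shape[0]
    sq_dist = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dist, np.inf)               # exclude self-matches
    neighbors = np.argsort(sq_dist, axis=1)[:, :k]  # k nearest vertices per row
    if sigma2 is None:
        # Assumed heuristic: mean squared distance to the k-th neighbor.
        sigma2 = np.mean(sq_dist[np.arange(n), neighbors[:, -1]])
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = neighbors.ravel()
    W[rows, cols] = np.exp(-sq_dist[rows, cols] / sigma2)
    return np.maximum(W, W.T)                       # keep an edge if either end chose it

Z = grid_coordinates(28)          # 784 pixels
W = knn_gaussian_graph(Z, k=8)
```

The same function applies to any feature matrix `Z`, for example word embeddings, since only pairwise squared distances are used.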

Experiments: 20NEWS

- 18,846 text documents associated with 20 classes
- 10K most common words from the 94K unique words ⇒ $n = |\mathcal{V}| = 10\text{K}$
- 16-NN graph, $W_{i,j} = e^{-\|z_i - z_j\|_2^2 / \sigma^2}$ ($z_i$: word2vec embedding of word $i$) ⇒ $|\mathcal{E}| = 132{,}834$
- $x$: bag-of-words model
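In this setting each document becomes a signal $x$ on the word graph, with one value per vocabulary word. A minimal standard-library sketch, assuming whitespace tokenization and normalization of the counts to sum to one (both are assumptions of this example, as are the function names and the toy documents):

```python
from collections import Counter

def build_vocabulary(documents, size):
    """Return the `size` most common words over all documents."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return [word for word, _ in counts.most_common(size)]

def bag_of_words(document, vocabulary):
    """Return the graph signal x: normalized count of each vocabulary word."""
    counts = Counter(document.lower().split())
    total = sum(counts[word] for word in vocabulary)
    return [counts[word] / total if total else 0.0 for word in vocabulary]

# Toy corpus (illustrative data only).
documents = ["graph signal graph", "signal filter"]
vocabulary = build_vocabulary(documents, size=2)
x = bag_of_words("graph graph filter", vocabulary)
```

Words outside the vocabulary are ignored, mirroring the restriction to the most common words.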

Comparison with other methods ($K = 5$):

- Slightly worse than the multinomial naive Bayes classifier
- Outperforms fully-connected networks with far fewer parameters

Total training time divided by the number of gradient steps:

- Scales as $O(n)$, as opposed to $O(n^2)$

Experiments: Graph Quality

⇒ The data structure is important

⇒ A well-constructed graph is important

⇒ Proper approximations (LSHForest) can be used for larger datasets

Conclusions and Future Work

- Introduced a model with linear complexity.
- The quality of the input graph is of paramount importance.
- Local stationarity and compositionality are verified for text documents as long as the graph is well constructed.