
Page 1:

Graph Learning
IFT6758 - Data Science

Sources: http://snap.stanford.edu/proj/embeddings-www/

https://jian-tang.com/files/AAAI19/aaai-grltutorial-part2-gnns.pdf

Page 2:

The Basics: Graph Neural Networks

Based on material from:
• Hamilton et al., 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data Engineering Bulletin (issue on graph systems).
• Scarselli et al., 2009. The Graph Neural Network Model. IEEE Transactions on Neural Networks.

Page 3:

Graph Neural Networks

Page 4:

Setup

• Assume we have a graph G:
  ▪ V is the vertex set.
  ▪ A is the adjacency matrix (assume binary).
  ▪ X is a matrix of node features, e.g.:
    – Categorical attributes, text, or image data (e.g., profile information in a social network).
    – Node degrees, clustering coefficients, etc.
    – Indicator vectors (i.e., a one-hot encoding of each node).
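A minimal sketch of this setup, assuming NumPy and a hypothetical 4-node toy graph (the edge list, names, and values are illustrative, not from the slides):

import numpy as np

# Hypothetical toy graph with |V| = 4 nodes and edges 0-1, 0-2, 1-2, 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# A: binary adjacency matrix (undirected, hence symmetric).
A = np.zeros((n, n), dtype=np.float32)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# X: node feature matrix. With no attributes available, a common fallback
# is indicator vectors, i.e., a one-hot row per node.
X = np.eye(n, dtype=np.float32)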

Page 5:

Neighborhood Aggregation

• Key idea: Generate node embeddings based on local neighborhoods.

Page 6:

Neighborhood Aggregation

• Intuition: Nodes aggregate information from their neighbors using neural networks.

Page 7:

Neighborhood Aggregation

• Intuition: The network neighborhood defines a computation graph.

Every node defines a unique computation graph!

Page 8:

Neighborhood Aggregation

• Nodes have embeddings at each layer.
• The model can be of arbitrary depth.
• The "layer-0" embedding of node u is its input feature vector, x_u.

[Figure: a node's computation graph, with Layer-0 inputs feeding into Layer-1 and then Layer-2 embeddings]

Page 9:

Neighborhood “Convolutions”

• Neighborhood aggregation can be viewed as a center-surround filter.

• Mathematically related to spectral graph convolutions (see Bronstein et al., 2017)

Page 10:

Convolution on Images

• Convolution is an aggregation operator. Broadly speaking, the goal of an aggregation operator is to summarize data into a reduced form.

Page 11:

Neighborhood Aggregation

• Key distinctions are in how different approaches aggregate information across the layers.

Page 12:

Convolution on Graphs

• The most popular choices of convolution on graphs are summation or averaging over all neighbors (i.e., sum or mean pooling), followed by a projection with a trainable weight matrix W.
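A minimal sketch of one such step, assuming the mean-pooling variant with a ReLU nonlinearity (the toy matrices here are illustrative):

import numpy as np

rng = np.random.default_rng(0)

# A: binary adjacency matrix; H: one embedding row per node (initially X).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=np.float32)
H = np.eye(4, dtype=np.float32)                  # one-hot features, d = 4
W = rng.normal(size=(4, 2)).astype(np.float32)   # trainable projection

# Mean pooling: average neighbor embeddings, project by W, apply ReLU.
deg = A.sum(axis=1, keepdims=True)               # node degrees (all >= 1 here)
H_next = np.maximum(((A @ H) / deg) @ W, 0.0)
print(H_next.shape)                              # (4, 2)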

Page 13:

Neighborhood Aggregation

• Basic approach: Average neighbor information and apply a neural network.

Page 14:

The Math

• Basic approach: Average neighbor messages and apply a neural network.
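The update rule itself appeared as an image on the original slides. In the notation of the Hamilton et al. tutorial this deck draws on, the mean-aggregation update is (reconstructed here, so treat it as a hedged restatement rather than the slide's exact formula):

\[
h_v^{(0)} = x_v, \qquad
h_v^{(k)} = \sigma\!\left( W_k \sum_{u \in N(v)} \frac{h_u^{(k-1)}}{|N(v)|} + B_k\, h_v^{(k-1)} \right), \quad k = 1, \dots, K, \qquad
z_v = h_v^{(K)},
\]

where N(v) is the neighborhood of node v, W_k and B_k are trainable weight matrices, and σ is a nonlinearity (e.g., ReLU).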

Page 15:

The Math (continued)

Page 16:

The Math (continued)

Page 17:

Training the Model

▪ After K layers of neighborhood aggregation, we get an output embedding for each node.

Page 18:

Training the Model

• How do we train the model to generate “high-quality” embeddings?

Page 19:

Training the Model

▪ After K layers of neighborhood aggregation, we get an output embedding for each node.
▪ We can feed these embeddings into any loss function and run stochastic gradient descent to train the aggregation parameters.

Page 20:

Training the Model

▪ Train in an unsupervised manner using only the graph structure.
▪ The unsupervised loss function can be anything, e.g., based on:
  ▪ Random walks (node2vec, DeepWalk)
  ▪ Graph factorization
▪ I.e., train the model so that "similar" nodes have similar embeddings (a sketch follows).
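A hedged sketch of such a similarity objective (the dot-product score, sigmoid cross-entropy, and hand-picked pairs are illustrative; DeepWalk and node2vec define "similar" via random-walk co-occurrence):

import numpy as np

def unsupervised_loss(Z, pos_pairs, neg_pairs):
    """Push embeddings of 'similar' pairs together and random pairs apart,
    via binary cross-entropy on dot-product scores."""
    def scores(pairs):
        u, v = np.array(pairs).T
        return np.einsum("ij,ij->i", Z[u], Z[v])   # dot(z_u, z_v) per pair

    sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))
    pos = -np.log(sigmoid(scores(pos_pairs)) + 1e-9).mean()
    neg = -np.log(1.0 - sigmoid(scores(neg_pairs)) + 1e-9).mean()
    return pos + neg

# Z: output embeddings from K rounds of aggregation, one row per node.
Z = np.random.default_rng(1).normal(size=(4, 2))
print(unsupervised_loss(Z, pos_pairs=[(0, 1), (1, 2)], neg_pairs=[(0, 3)]))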

Page 21:

Training the Model

▪ Alternative: Directly train the model for a supervised task (e.g., node classification):

Page 22:

Training the Model (continued)
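A hedged sketch of the supervised alternative, assuming a softmax classifier on the output embeddings (the two-class setup, labels, and shapes are illustrative):

import numpy as np

rng = np.random.default_rng(2)

Z = rng.normal(size=(4, 2))        # output node embeddings
y = np.array([0, 0, 1, 1])         # class labels of the labeled nodes
Theta = rng.normal(size=(2, 2))    # classification weights (trainable)

def softmax(s):
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Cross-entropy over labeled nodes; in a full model this gradient is
# back-propagated through the aggregation parameters as well.
probs = softmax(Z @ Theta)
loss = -np.log(probs[np.arange(len(y)), y] + 1e-9).mean()
print(loss)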

Page 23:

Overview of Model

Page 24:

Overview of Model

Page 25:

Overview of Model

Page 26:

Inductive Capability

▪ The same aggregation parameters are shared across all nodes.
▪ The number of model parameters is sublinear in |V|, so we can generalize to unseen nodes (see the sketch below).
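A hedged illustration, reusing the mean-pooling aggregator from earlier (the random graphs and "trained" weights are illustrative): because W has a fixed size, the same function embeds nodes of a graph the model never saw.

import numpy as np

def aggregate(A, X, W):
    """One round of mean-neighbor aggregation with shared weights W."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)  # avoid divide-by-zero
    return np.maximum(((A @ X) / deg) @ W, 0.0)

rng = np.random.default_rng(3)
W = rng.normal(size=(3, 2))   # "trained" parameters: size independent of |V|

# Training graph (3 nodes) and an unseen graph (5 nodes), same feature dim.
A_train = np.ones((3, 3)) - np.eye(3)
X_train = rng.random((3, 3))
A_new = (rng.random((5, 5)) < 0.4).astype(float)
A_new = np.triu(A_new, 1)
A_new = A_new + A_new.T       # undirected, no self-loops
X_new = rng.random((5, 3))

print(aggregate(A_train, X_train, W).shape)   # (3, 2)
print(aggregate(A_new, X_new, W).shape)       # (5, 2)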

Page 27:

Inductive Capability

• E.g., train on a protein interaction graph from model organism A and generate embeddings on newly collected data about organism B.

Page 28:

Inductive Capability

Many application settings constantly encounter previously unseen nodes, e.g., Reddit, YouTube, Google Scholar, …

We need to generate new embeddings "on the fly".

Page 29:

Recap

▪ Recap: Generate node embeddings by aggregating neighborhood information.
▪ Allows for parameter sharing in the encoder.
▪ Allows for inductive learning.

Page 30:

Page 31:

Graph Convolutional Networks

Based on material from:
• Kipf & Welling, 2017. Semi-Supervised Classification with Graph Convolutional Networks. ICLR.

Page 32:

Recap: Convolutional Neural Networks (CNNs)

Page 33:

Recap: Convolutional Neural Networks (CNNs)

Page 34:

Graph Neural Networks (GNNs)

Page 35:


Graph Neural Networks (GNNs)

Page 36:


Graph Neural Networks (GNNs)

Page 37:


Graph Neural Networks (GNNs)

Page 38:

Graph Convolutional Networks (GCNs)

▪ Kipf & Welling's Graph Convolutional Networks (GCNs) are a slight variation on the neighborhood aggregation idea:

Page 39:

Graph Convolutional Networks

Page 40:

Graph Convolutional Networks

▪ Empirically, they found this configuration to give the best results:
  ▪ More parameter sharing.
  ▪ Down-weights high-degree neighbors.
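The layer itself was an image on the original slides; the published Kipf & Welling propagation rule, sketched here in NumPy as a reconstruction (variable names are illustrative):

import numpy as np

def gcn_layer(A, H, W):
    """Kipf & Welling propagation: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
    Self-loops keep each node's own embedding in the sum; the symmetric
    normalization scales edge (i, j) by 1/sqrt(d_i * d_j), which is the
    down-weighting of high-degree neighbors mentioned above."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degrees are >= 1
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

Note that using the same normalized operator and the same W for a node and its neighbors (instead of separate W_k and B_k matrices) is what gives the "more parameter sharing" mentioned above.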

Page 41:

GNN/GCN Applications

Page 42:


Classification and Link Prediction with GNNs / GCNs

Page 43:

Classification and Link Prediction with GNNs / GCNs (continued)

Page 44:

Semi-supervised Classification on Graphs

Page 45:

Semi-supervised Classification on Graphs (continued)

Page 46:

Universality of Graph Representations

Page 47:

Conclusion

Page 48:

Conferences focusing on Graphs

• WWW: The Web Conference (https://www2020.thewebconf.org/)
• ASONAM: The IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (http://asonam.cpsc.ucalgary.ca/2019/)
• ICML: International Conference on Machine Learning (https://icml.cc/)
• KDD: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (https://www.kdd.org/kdd2020/)
• ICDM: International Conference on Data Mining (https://waset.org/data-mining-conference-in-july-2020-in-istanbul)
