Graph Learning Part 2 - Data Science Course IFT6758 (2020-08-10)
Graph Learning
IFT6758 - Data Science
Sources: http://snap.stanford.edu/proj/embeddings-www/
https://jian-tang.com/files/AAAI19/aaai-grltutorial-part2-gnns.pdf
The Basics: Graph Neural Networks
Based on material from:
• Hamilton et al., 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data Engineering Bulletin on Graph Systems.
• Scarselli et al., 2005. The Graph Neural Network Model. IEEE Transactions on Neural Networks.
Graph Neural Networks
Setup
• Assume we have a graph G:
▪ V is the vertex set.
▪ A is the adjacency matrix (assume binary).
▪ X is a matrix of node features, e.g.:
▪ Categorical attributes, text, image data
– E.g., profile information in a social network.
▪ Node degrees, clustering coefficients, etc.
▪ Indicator vectors (i.e., a one-hot encoding of each node)
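As a concrete illustration of this setup, a small graph can be encoded with NumPy. The graph, its edge list, and the choice of one-hot features below are invented for the sketch:

```python
import numpy as np

# Hypothetical 4-node graph with edges 0-1, 0-2, 1-2, 2-3.
V = [0, 1, 2, 3]
A = np.zeros((4, 4), dtype=int)          # binary adjacency matrix
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1                # undirected: keep A symmetric

# Node feature matrix X; here simply one-hot indicator vectors (one option
# from the list above). Real features could be profile attributes,
# node degrees, clustering coefficients, etc.
X = np.eye(len(V))

print(A.sum(axis=1))   # row sums of A give the node degrees
```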
Neighborhood Aggregation
• Key idea: Generate node embeddings based on local neighborhoods.
• Intuition: Nodes aggregate information from their neighbors using neural networks.
• Intuition: A node's network neighborhood defines its computation graph.
Every node defines a unique computation graph!
• Nodes have embeddings at each layer.
• Model can be arbitrary depth.
• The “layer-0” embedding of node u is its input feature vector, i.e., x_u.
Neighborhood “Convolutions”
• Neighborhood aggregation can be viewed as a center-surround filter.
• Mathematically related to spectral graph convolutions (see Bronstein et al., 2017)
Convolution on Images
Convolution is an “aggregator operator”. Broadly speaking, the goal of an aggregator operator is to summarize data into a reduced form.
Neighborhood Aggregation
• Key distinctions are in how different approaches aggregate information across the layers.
Convolution on Graphs
• The most popular choices of convolution on graphs are averaging and summation over all neighbors, i.e., mean or sum pooling, followed by projection with a trainable weight matrix W.
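With a binary adjacency matrix A and feature matrix X, both pooling choices reduce to simple matrix operations. A hypothetical toy example (graph, features, and W are invented for the sketch):

```python
import numpy as np

# Toy path graph 0-1-2 with binary adjacency and 2-d node features.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 0.0]])

sum_pool = A @ X                                   # sum of neighbor features
deg = np.maximum(A.sum(axis=1, keepdims=True), 1)  # guard against isolated nodes
mean_pool = sum_pool / deg                         # average of neighbor features

W = np.full((2, 2), 0.5)   # hypothetical trainable weight matrix
out = mean_pool @ W        # projection step
```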
Neighborhood Aggregation
• Basic approach: Average neighbor information and apply a neural network.
The Math
• Basic approach: Average neighbor messages and apply a neural network.
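The equations on these slides were lost in this transcript. Following the notation of Hamilton et al.'s tutorial, the averaging update can be written as:

```latex
h_v^{0} = x_v, \qquad
h_v^{k} = \sigma\!\left( W_k \sum_{u \in \mathcal{N}(v)} \frac{h_u^{k-1}}{|\mathcal{N}(v)|} \;+\; B_k\, h_v^{k-1} \right),
\quad k \in \{1, \dots, K\}, \qquad z_v = h_v^{K}
```

Here σ is a nonlinearity (e.g., ReLU), N(v) is the neighborhood of v, W_k and B_k are the trainable aggregation parameters of layer k, and z_v is the final output embedding of node v.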
Training the Model
• How do we train the model to generate “high-quality” embeddings?
▪ After K-layers of neighborhood aggregation, we get output embeddings for each node.
▪ We can feed these embeddings into any loss function and run stochastic gradient descent to train the aggregation parameters.
▪ Train in an unsupervised manner using only the graph structure.
▪ The unsupervised loss function can be anything, e.g., based on:
▪ Random walks (node2vec, DeepWalk)
▪ Graph factorization
▪ I.e., train the model so that “similar” nodes have similar embeddings.
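A minimal sketch of such a similarity-based loss, assuming the embeddings z have already been produced by the aggregation layers. The function name and the dot-product/sigmoid form are illustrative choices, not the exact node2vec/DeepWalk objective (which additionally uses random-walk sampling and negative sampling):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unsupervised_loss(z, pos_pairs, neg_pairs):
    """Push embeddings of "similar" node pairs (e.g., pairs co-occurring on
    random walks) toward high dot products, and negative pairs toward low ones."""
    pos = np.array([z[u] @ z[v] for u, v in pos_pairs])
    neg = np.array([z[u] @ z[v] for u, v in neg_pairs])
    eps = 1e-9  # numerical guard for log
    return (-np.log(sigmoid(pos) + eps).sum()
            - np.log(1.0 - sigmoid(neg) + eps).sum())
```

Since the embeddings z are differentiable functions of the aggregation parameters, this loss can be minimized with stochastic gradient descent as described above.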
▪ Alternative: Directly train the model for a supervised task (e.g., node classification):
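The formula on this slide was not captured in the transcript. For binary node classification, the supervised objective is typically the cross-entropy over the output embeddings z_v, with a trainable classification weight vector θ:

```latex
\mathcal{L} = -\sum_{v \in V} \Big[\, y_v \log\big(\sigma(\theta^{\top} z_v)\big)
            + (1 - y_v)\log\big(1 - \sigma(\theta^{\top} z_v)\big) \Big]
```

Here y_v is the binary label of node v and σ is the logistic sigmoid; minimizing L trains both θ and the aggregation parameters end to end.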
Overview of Model
Inductive Capability
▪ The same aggregation parameters are shared across all nodes.
▪ The number of model parameters is sublinear in |V|, and we can generalize to unseen nodes!
e.g., train on a protein interaction graph from model organism A and generate embeddings for newly collected data about organism B.
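The inductive property can be sketched directly: because the weights are tied across nodes, the same trained parameters run on a graph of any size. This is a hypothetical minimal forward pass (mean aggregation, invented weights and toy graphs), not the exact model from the slides:

```python
import numpy as np

def gnn_forward(A, X, weights):
    """Mean-aggregation GNN forward pass. The trainable `weights` are
    independent of graph size, so the same trained parameters can embed
    the nodes of a completely new, previously unseen graph."""
    H = X
    for W_self, W_neigh in weights:
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
        H = np.tanh(H @ W_self + (A @ H) / deg @ W_neigh)
    return H

rng = np.random.default_rng(0)
d = 3
# Two layers of (self, neighbor) weight matrices, shared across all nodes.
weights = [(rng.normal(size=(d, d)), rng.normal(size=(d, d))) for _ in range(2)]

# "Organism A": a 3-node triangle; "organism B": an unseen 4-node path.
A1 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
A2 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
Z1 = gnn_forward(A1, rng.normal(size=(3, d)), weights)
Z2 = gnn_forward(A2, rng.normal(size=(4, d)), weights)  # same weights, new graph
```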
Many application settings constantly encounter previously unseen nodes, e.g., Reddit, YouTube, Google Scholar, etc.
We need to generate new embeddings “on the fly”.
Recap
▪ Recap: Generate node embeddings by aggregating neighborhood information.
▪ Allows for parameter sharing in the encoder.
▪ Allows for inductive learning.
Graph Convolutional Networks
Based on material from:
• Kipf & Welling, 2017. Semi-Supervised Classification with Graph Convolutional Networks. ICLR.
Recap: Convolutional Neural Networks (CNNs)
Graph Neural Networks (GNNs)
Graph Convolutional Networks (GCNs)
▪ Kipf & Welling’s Graph Convolutional Networks (GCNs) are a slight variation on the neighborhood-aggregation idea:
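The update rule on this slide was lost in the transcript; in Kipf & Welling's formulation, the per-node update is:

```latex
h_v^{k} = \sigma\!\left( W_k \sum_{u \in \mathcal{N}(v) \cup \{v\}}
          \frac{h_u^{k-1}}{\sqrt{|\mathcal{N}(u)|\,|\mathcal{N}(v)|}} \right)
```

Compared to the basic update, the same matrix W_k transforms both the node's own embedding and its neighbors' (instead of separate W_k and B_k), and each neighbor is normalized by the geometric mean of the two degrees rather than by |N(v)| alone.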
▪ Empirically, they found this configuration to give the best results.
▪ More parameter sharing.
▪ Down-weights high-degree neighbors.
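In matrix form this layer is a short NumPy sketch (the function name, toy graph, and weight shapes below are invented for illustration):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step, after Kipf & Welling:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W),
    where the self-loop in A + I lets each node keep its previous
    embedding, and D is the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU nonlinearity

# Toy usage: 4 nodes, 3 input features, 2 output features.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
H_next = gcn_layer(A, H, W)
```

The symmetric normalization is what down-weights high-degree neighbors, and the single weight matrix W per layer is the extra parameter sharing noted above.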
GNN/GCN applications
Classification and Link Prediction with GNNs / GCNs
Classification and Link Prediction with GNNs / GCNs
Semi-supervised Classification on Graphs
Universality of Graph Representations
Conclusion
Conferences focusing on Graphs
• WWW: The Web Conference (https://www2020.thewebconf.org/)
• ASONAM: The IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (http://asonam.cpsc.ucalgary.ca/2019/)
• ICML: International Conference on Machine Learning (https://icml.cc/)
• KDD: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (https://www.kdd.org/kdd2020/)
• ICDM: International Conference on Data Mining (https://waset.org/data-mining-conference-in-july-2020-in-istanbul)