When Does Label Propagation Fail? A View from a Network Generative Model
Yuto Yamaguchi and Kohei Hayashi
17/08/22 IJCAI@Melbourne 1
Node Classification
Given: a partially labeled undirected graph
Find: the labels of all nodes
Example: User profile inference
[Figure: a user (???) whose friends have the hobbies Soccer, Soccer, Soccer, Tennis, and Baseball]
What’s his hobby? This is a node classification problem.
Label Propagation (1/2)
Propagate neighbors’ labels
[Figure: before propagation, the unlabeled user (???) has friends labeled Soccer, Soccer, Soccer, Tennis, and Baseball; after propagation, the user is labeled Soccer]
[Zhu+, 03], [Zhou+, 03], etc.
Label Propagation (2/2)
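$$Q(F; X, Y, \lambda) = \frac{1}{2} \sum_{i=1}^{N} \lVert f_i - y_i \rVert_2^2 + \frac{\lambda}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} x_{ij} \lVert f_i - f_j \rVert_2^2$$
Given: adjacency matrix $X$ and labels $Y$
Find: $F = \{ f_i \}$ that minimizes $Q$
$F \in \mathbb{R}^{N \times K}$, $Y \in \{0, 1\}^{N \times K}$, $X \in \{0, 1\}^{N \times N}$
$N$: # of nodes; $K$: # of labels; $\lambda \in \mathbb{R}_{+}$: user parameter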
[Zhu+, 03], [Zhou+, 03], etc.
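A minimal sketch of how Q can be minimized, assuming X is symmetric with no self-loops. Under that assumption, setting the gradient of Q to zero gives the linear system (I + 2λL)F = Y, where L = D − X is the graph Laplacian. The function name `label_propagation` and the toy graph (modeled on the hobby example) are illustrative and not taken from the talk.

```python
import numpy as np

def label_propagation(X, Y, lam=1.0):
    """Minimize Q(F; X, Y, lam) in closed form (assumes X is symmetric)."""
    N = X.shape[0]
    L = np.diag(X.sum(axis=1)) - X  # graph Laplacian D - X
    # Gradient of Q is (F - Y) + 2*lam*L @ F, so solve (I + 2*lam*L) F = Y.
    return np.linalg.solve(np.eye(N) + 2.0 * lam * L, Y)

# Toy version of the hobby example: node 0 is unlabeled, nodes 1..5 are his friends.
labels = ["Soccer", "Tennis", "Baseball"]
X = np.zeros((6, 6))
X[0, 1:] = 1
X[1:, 0] = 1
Y = np.zeros((6, 3))
Y[[1, 2, 3], 0] = 1  # three friends like soccer
Y[4, 1] = 1          # one likes tennis
Y[5, 2] = 1          # one likes baseball

F = label_propagation(X, Y, lam=1.0)
predicted = labels[int(np.argmax(F[0]))]
print(predicted)
```

The row of F for the unlabeled node is a smoothed mixture of its neighbors' labels, and taking the argmax turns it into a single predicted label.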
Cases when LP fails (practically known)
• Different labels are connected
• Label ratio is not uniform
• Edge probability is not uniform
Q. So, do we know why LP fails in these cases?
A. No. Since LP is not a probabilistic model, we don’t know the assumptions behind it.
What we do in this work
1. Prove a theoretical relationship between LP and the Stochastic Block Model (SBM), which is a well-studied probabilistic generative model
2. Find the assumptions behind LP through the assumptions behind SBM
3. Show when and why LP fails
NETWORK GENERATIVE MODELS
Stochastic Block Model
Generative process
① (Multinomial): Generate a cluster assignment for each node (the assignments can be thought of as labels)
② (Bernoulli): Generate the adjacency matrix
Parameters: $\gamma \in \mathbb{R}^{K}$, $\Pi \in \mathbb{R}^{K \times K}$
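The two-step generative process can be sketched as below. This assumes the standard SBM form, in which γ holds the cluster probabilities and Π[k, l] is the edge probability between clusters k and l; the function name `sample_sbm` and the parameter values are illustrative only.

```python
import numpy as np

def sample_sbm(gamma, Pi, N, rng):
    """Draw cluster assignments z and an undirected adjacency matrix X."""
    K = len(gamma)
    z = rng.choice(K, size=N, p=gamma)            # step 1: z_i ~ Multinomial(gamma)
    P = Pi[z][:, z]                               # P[i, j] = Pi[z_i, z_j]
    upper = np.triu(rng.random((N, N)) < P, k=1)  # step 2: Bernoulli draw for each pair i < j
    X = (upper | upper.T).astype(int)             # symmetrize; no self-loops
    return z, X

rng = np.random.default_rng(0)
gamma = np.array([0.5, 0.5])
Pi = np.array([[0.30, 0.02],
               [0.02, 0.30]])
z, X = sample_sbm(gamma, Pi, N=50, rng=rng)
```

Only the upper triangle is sampled and then mirrored, so each undirected edge is drawn exactly once.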
Proposed: Partially Labeled SBM (PLSBM)
Generative process: steps ①, ②, ③
②: Generate labels for “labeled nodes”. This step depends on parameter α (α large → y_i is more likely to be the same as z_i)
Parameters: $\gamma \in \mathbb{R}^{K}$, $\Pi \in \mathbb{R}^{K \times K}$, $\alpha \in [0, 1]$
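A rough sketch of the label-generation step follows. The slide only says that a larger α makes y_i more likely to equal z_i, so the noise model here is an assumption: y_i = z_i with probability α, otherwise one of the other K − 1 labels chosen uniformly. The function name `sample_labels`, the `labeled` mask, and the use of -1 for unlabeled nodes are all hypothetical.

```python
import numpy as np

def sample_labels(z, K, alpha, labeled, rng):
    """Draw observed labels y for labeled nodes; -1 marks unlabeled nodes.

    Assumed noise model (not stated on the slide): keep z_i with probability
    alpha, otherwise pick one of the other K - 1 labels uniformly at random.
    """
    y = np.full(len(z), -1)
    for i in np.flatnonzero(labeled):
        if rng.random() < alpha:
            y[i] = z[i]
        else:
            others = [k for k in range(K) if k != z[i]]
            y[i] = rng.choice(others)
    return y

rng = np.random.default_rng(0)
z = rng.integers(0, 3, size=1000)   # stand-in cluster assignments
labeled = rng.random(1000) < 0.2    # hypothetical: about 20% of nodes are labeled
y = sample_labels(z, K=3, alpha=0.9, labeled=labeled, rng=rng)
agreement = np.mean(y[labeled] == z[labeled])  # fraction of labels equal to z
```

With α = 1 every observed label equals the cluster assignment, which is the noise-free case.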
Relationships between models
[Diagram: PLSBM reduces to SBM when there are no labels; PLSBM and discretized LP are linked by the main result (next slide); LP is the continuous relaxation of discretized LP]
Main Result
The MAP estimator of Z in PLSBM is identical to the solution of (discretized) LP when the following conditions hold:
Condition 1, Condition 2, Condition 3, and Condition 4 (omitted)
Condition 1
Implication (implicit assumption of LP)
• Label ratio is uniform
Violates this assumption ☹
Condition 2
Implication (implicit assumptions of LP)
• Edge probs between the same labels are all the same (μ)
• Edge probs between different labels are all the same (ν)
Violates this assumption ☹
Condition 3
Implication (implicit assumption of LP)
• Assortative (same labels tend to be connected)
Violates this assumption ☹
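The three stated implications (uniform label ratio, one within-label probability μ and one between-label probability ν, and assortativity) can be checked for a given pair of SBM parameters. This is only a sketch of those implications, not of the formal conditions themselves. It assumes γ holds the label ratios, Π holds the edge probabilities, and assortativity means μ > ν; the function name `lp_assumptions` is hypothetical.

```python
import numpy as np

def lp_assumptions(gamma, Pi, tol=1e-9):
    """Check the three implied assumptions of LP for parameters (gamma, Pi)."""
    gamma = np.asarray(gamma, dtype=float)
    Pi = np.asarray(Pi, dtype=float)
    K = len(gamma)
    diag = np.diag(Pi)                # edge probs between the same labels
    off = Pi[~np.eye(K, dtype=bool)]  # edge probs between different labels
    return {
        # Implication of Condition 1: label ratio is uniform
        "uniform_label_ratio": bool(np.allclose(gamma, 1.0 / K, atol=tol)),
        # Implication of Condition 2: a single mu and a single nu
        "uniform_edge_probs": bool(np.allclose(diag, diag[0], atol=tol)
                                   and np.allclose(off, off[0], atol=tol)),
        # Implication of Condition 3: same labels tend to be connected
        "assortative": bool(diag.min() > off.max()),
    }

ok = lp_assumptions([0.5, 0.5], [[0.30, 0.02], [0.02, 0.30]])
bad = lp_assumptions([0.8, 0.2], [[0.02, 0.30], [0.30, 0.10]])
print(ok)
print(bad)
```

The second parameter set breaks all three assumptions at once: skewed label ratio, unequal within-label probabilities, and a disassortative structure.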
Experimental results
Setups:
1. Generate datasets by PLSBM
2. Infer labels (Z) by PLSBM, SBM, and LP
3. Report mean accuracy of 20 trials
[Figure: accuracy plots for two settings, Assortative and Disassortative, with an arrow marking the "Better" direction]
Agree with theoretical results
… Come see full results at the poster session ☺
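A simplified version of the setup can be sketched for LP alone. This is not the authors' experiment code: it assumes two labels with uniform ratio, noise-free labels on about 20% of the nodes, the closed-form LP solver for symmetric X, and hypothetical values for μ, ν, N, and λ. No results from the talk are reproduced here.

```python
import numpy as np

def sample_graph(mu, nu, N, rng):
    """Two-label SBM: edge prob mu within a label, nu between labels."""
    z = rng.integers(0, 2, size=N)
    P = np.where(z[:, None] == z[None, :], mu, nu)
    upper = np.triu(rng.random((N, N)) < P, k=1)
    return z, (upper | upper.T).astype(float)

def lp_predict(X, Y, lam=0.1):
    """Closed-form minimizer of the LP objective, then argmax over labels."""
    L = np.diag(X.sum(axis=1)) - X
    F = np.linalg.solve(np.eye(len(X)) + 2.0 * lam * L, Y)
    return F.argmax(axis=1)

def mean_accuracy(mu, nu, trials=20, N=100, frac_labeled=0.2, seed=0):
    """Mean accuracy of LP on unlabeled nodes over several generated graphs."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(trials):
        z, X = sample_graph(mu, nu, N, rng)
        labeled = rng.random(N) < frac_labeled
        idx = np.flatnonzero(labeled)
        Y = np.zeros((N, 2))
        Y[idx, z[idx]] = 1.0  # one-hot labels for labeled nodes only
        pred = lp_predict(X, Y)
        accs.append(np.mean(pred[~labeled] == z[~labeled]))
    return float(np.mean(accs))

acc_assortative = mean_accuracy(mu=0.30, nu=0.02)
acc_disassortative = mean_accuracy(mu=0.02, nu=0.30)
print(acc_assortative, acc_disassortative)
```

Swapping μ and ν is the only difference between the two runs, which isolates the effect of assortativity on LP.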
Summary
• Proposed Partially Labeled SBM (PLSBM)
• Proved the relationship between LP and SBM via PLSBM
• Showed cases when LP fails
• Experimental and theoretical results agree
Github: yamaguchiyuto/plsbm