anti-differentiating approximation algorithms: a case study with min-cuts, spectral, and flow
DESCRIPTION
This talk covers the idea of anti-differentiating approximation algorithms, an idea that explains the success of widely used heuristic procedures. Formally, this involves finding an optimization problem that is solved exactly by an approximation algorithm or heuristic.

TRANSCRIPT
Algorithmic Anti-Differentiation
A case study with min-cuts, spectral, and flow

David F. Gleich · Purdue University
Michael W. Mahoney · Berkeley ICSI

Code: www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
Algorithmic Anti-differentiation
Understanding how and why heuristic procedures
• Early stopping
• Truncating small entries
• etc.
are actually algorithms for implicit objectives.

ICML · David Gleich · Purdue
The ideal world
Given Problem P. Derive solution characterization C. Show algorithm A finds a solution where C holds. Profit?

Given "min-cut". Derive "max-flow is equivalent to min-cut". Show push-relabel solves max-flow. Profit!
(The ideal world)′
Given Problem P. Derive approximate solution characterization C′. Show algorithm A′ finds a solution where C′ holds. Profit?

Given "sparsest-cut". Derive the Rayleigh-quotient approximation. Show the power method finds a good Rayleigh quotient. Profit?
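The approximate pipeline above can be made concrete: a few steps of the power method already produce an iterate with a good Rayleigh quotient. A minimal plain-Python sketch (the function names are illustrative, not code from the talk):

```python
def power_method(A, x, steps):
    """Run `steps` power iterations on A (a dense matrix as a list of rows)."""
    n = len(A)
    for _ in range(steps):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]  # renormalize each step
    return x

def rayleigh_quotient(A, x):
    """x^T A x / x^T x -- the quality certificate for a spectral iterate."""
    n = len(A)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Ax[i] for i in range(n)) / sum(v * v for v in x)
```

For a symmetric matrix, the Rayleigh quotient of the iterate climbs toward the top eigenvalue, which is exactly the "good Rayleigh quotient" guarantee the slide refers to.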
(In academia!)
The real world
Given Task P. Hack around until you find something useful. Write paper presenting "novel heuristic" H for P. Profit!

Given "find-communities". Hack around (… hidden …). Write paper on "three steps of the power method finds communities". Profit!
(The ideal world)′′
Understand why H works. Guess and check until you find a problem P′ that H solves. Derive a characterization of heuristic H. Show heuristic H solves P′.

Given "find-communities". Hack around. Write paper on "three steps of the power method finds communities". Profit!
The real world: Algorithmic Anti-differentiation
Given heuristic H, is there a problem P′ such that H is an algorithm for P′?
If your algorithm is related to optimization, this asks: given a procedure X, what objective does it optimize?
In the smooth, unconstrained case, this is just "anti-differentiation!"
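In that spirit, one heuristic mentioned earlier, truncating small entries, already has a well-known anti-derivative: entrywise soft-thresholding is the closed-form minimizer of a 1-norm regularized least-squares term. This is the textbook proximal operator of the 1-norm, not code from the talk; a hedged sketch:

```python
def soft_threshold(y, kappa):
    """Truncate-and-shrink heuristic applied to one entry.
    Exactly minimizes 0.5*(x - y)**2 + kappa*abs(x) over x."""
    if y > kappa:
        return y - kappa
    if y < -kappa:
        return y + kappa
    return 0.0  # small entries are truncated to exactly zero
```

So the common trick of zeroing out small entries is not ad hoc: it is the exact solution of an implicit l1-regularized objective, which is the talk's thesis in miniature.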
Algorithmic Anti-differentiation in the literature
Mahoney & Orecchia (2011): three steps of the power method and p-norm regularization.
Dhillon et al. (2007): spectral clustering, trace minimization, and kernel k-means.
Saunders (1995): LSQR and CRAIG iterative methods for Ax = b.
… many more …
Outline
1. A new derivation of the PageRank vector for an undirected graph based on Laplacians, cuts, or flows.
2. An understanding of the implicit regularization in the PageRank "push" method.
3. The impact of this on a few applications.
The PageRank problem
The PageRank random surfer:
1. With probability β, follow a random-walk step.
2. With probability (1 − β), jump randomly according to the distribution v.
Goal: find the stationary distribution x.

With A the symmetric adjacency matrix, D the diagonal degree matrix, and v the jump vector, the solution x satisfies

(I − β A D^{-1}) x = (1 − β) v.

Equivalently, in terms of the combinatorial Laplacian L = D − A,

[αD + L] z = αv,  where β = 1/(1 + α) and x = Dz.
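The linear system can be solved by iterating its fixed-point form, x ← βAD^{-1}x + (1 − β)v, which is exactly the random-surfer dynamics. A minimal plain-Python sketch (dict-based adjacency; the function name is illustrative, not from the talk's code release):

```python
def pagerank(adj, v, beta=0.85, iters=200):
    """Solve (I - beta*A*D^{-1}) x = (1 - beta) v by the fixed-point
    iteration x <- beta*A*D^{-1} x + (1 - beta) v.
    adj: dict node -> list of neighbors (undirected); v: dict node -> prob."""
    deg = {u: len(adj[u]) for u in adj}
    x = {u: v.get(u, 0.0) for u in adj}
    for _ in range(iters):
        nxt = {u: (1 - beta) * v.get(u, 0.0) for u in adj}
        for u in adj:
            share = beta * x[u] / deg[u]  # column-stochastic step A D^{-1}
            for w in adj[u]:
                nxt[w] += share
        x = nxt
    return x
```

Since the iteration contracts at rate β, a few hundred iterations suffice for small examples; the iterate stays a probability distribution whenever v is one.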
The Push Algorithm for PageRank
Proposed (in closest form) in Andersen, Chung & Lang (also by McSherry, and by Jeh & Widom) for personalized PageRank.
Strongly related to Gauss-Seidel and coordinate descent. Derived to quickly approximate PageRank with sparsity.

The push method, with parameters τ and ρ:
1. x^(1) = 0, r^(1) = (1 − β) e_i, k = 1
2. while any r_j > τ d_j  (d_j is the degree of node j):
3.   x^(k+1) = x^(k) + (r_j − τ d_j ρ) e_j
4.   r_i^(k+1) = τ d_j ρ                           if i = j,
     r_i^(k+1) = r_i^(k) + β (r_j − τ d_j ρ)/d_j   if i ∼ j,
     r_i^(k+1) = r_i^(k)                           otherwise
5.   k ← k + 1
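A compact sketch of the push loop in plain Python, assuming an undirected graph without self-loops; the queue-based scheduling of which residual to push next is one common choice that the slide leaves unspecified:

```python
from collections import deque

def push(adj, seed, beta=0.85, tau=1e-4, rho=1.0):
    """ACL-style push for seeded PageRank, following the slide's update:
    pick j with r[j] > tau*d[j], move r[j] - tau*d[j]*rho onto x[j],
    set r[j] = tau*d[j]*rho, and spread beta*(moved)/d[j] to j's neighbors."""
    d = {u: len(adj[u]) for u in adj}
    x, r = {}, {seed: 1.0 - beta}
    queue = deque([seed])
    while queue:
        j = queue.popleft()
        if r.get(j, 0.0) <= tau * d[j]:
            continue  # already pushed back below threshold
        moved = r[j] - tau * d[j] * rho
        x[j] = x.get(j, 0.0) + moved
        r[j] = tau * d[j] * rho
        for i in adj[j]:
            old = r.get(i, 0.0)
            r[i] = old + beta * moved / d[j]
            if old <= tau * d[i] < r[i]:
                queue.append(i)  # i just crossed the threshold
    return x, r
```

Only vertices near the seed ever enter the queue, which is why the method stays local and the output x is sparse.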
The push method stays local
Why do we care about push?
1. It is used for empirical studies of "communities" and is an ingredient in an empirically successful community finder (Whang et al., CIKM 2013).
2. It is used for "fast PageRank" approximation.
3. It produces sparse approximations to PageRank!

Example: Newman's netscience graph (379 vertices, 1828 non-zeros). With v having a single one on one node, the result is "zero" on most of the nodes.
The s-t min-cut problem
With B the unweighted incidence matrix and C the diagonal cost matrix:

minimize ‖Bx‖_{C,1} = Σ_{ij∈E} C_{i,j} |x_i − x_j|
subject to x_s = 1, x_t = 0, x ≥ 0.
The localized cut graph
Related to a construction used in "FlowImprove", Andersen & Lang (2007); and Orecchia & Zhu (2014).

Given a set S, connect a source s to the vertices in S with weight α · degree, and a sink t to the vertices in S̄ with weight α · degree:

A_S = [ 0       α d_S^T     0
        α d_S   A           α d_{S̄}
        0       α d_{S̄}^T   0 ]
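The construction can be sketched directly. A plain-Python version (dict-of-dicts weighted adjacency; the terminal names 's' and 't' and the function name are chosen here for illustration):

```python
def localized_cut_graph(adj, S, alpha):
    """Build the localized cut graph: add a source 's' connected to each
    u in S, and a sink 't' connected to each u outside S, with weight
    alpha * degree(u). Original (unweighted) edges keep weight 1."""
    nodes = list(adj)
    g = {u: {w: 1.0 for w in adj[u]} for u in nodes}
    g['s'], g['t'] = {}, {}
    for u in nodes:
        w = alpha * len(adj[u])
        if u in S:
            g['s'][u] = w
            g[u]['s'] = w
        else:
            g['t'][u] = w
            g[u]['t'] = w
    return g
```

Running any s-t min-cut solver on this graph then localizes the cut around S, with α trading off how strongly the solution is tied to the seed set.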
The localized cut graph & PageRank
Solve the s-t min-cut:

minimize ‖B_S x‖_{C(α),1}
subject to x_s = 1, x_t = 0, x ≥ 0.
The localized cut graph & PageRank
Solve the "spectral" s-t min-cut:

minimize ‖B_S x‖_{C(α),2}
subject to x_s = 1, x_t = 0, x ≥ 0.

The PageRank vector z that solves (αD + L) z = αv with v = d_S / vol(S) is a renormalized solution of this electrical cut computation: specifically, if x is the solution, then

x = [ 1
      vol(S) z
      0 ].
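To see the equivalence in numbers: solving (αD + L) z = α d_S/vol(S) is just a diagonally dominant linear solve. A sketch using Jacobi iteration in plain Python (convergence follows because (1 + α)D dominates A; the function name is illustrative):

```python
def pagerank_z(adj, S, alpha, iters=500):
    """Solve (alpha*D + L) z = alpha * d_S / vol(S) by Jacobi iteration.
    Using L = D - A, the update is z_u <- (alpha*v_u + (A z)_u) / ((1+alpha) d_u)."""
    d = {u: len(adj[u]) for u in adj}
    volS = sum(d[u] for u in S)
    v = {u: (d[u] / volS if u in S else 0.0) for u in adj}
    z = {u: 0.0 for u in adj}
    for _ in range(iters):
        z = {u: (alpha * v[u] + sum(z[w] for w in adj[u])) / ((1 + alpha) * d[u])
             for u in adj}
    return z
```

On a small path graph seeded on one end, the solution decays away from the seed set, and thresholding z at about half its largest value recovers S in this example, consistent with the talk's observation that the formulations reveal similar structure.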
Back to the push method
Let x be the output from the push method with 0 < β < 1, v = d_S / vol(S), ρ = 1, and τ > 0. Set α = (1 − β)/β and κ = τ vol(S)/β, and let z_G solve:

minimize (1/2) ‖B_S z‖_{C(α),2}^2 + κ ‖Dz‖_1
subject to z_s = 1, z_t = 0, z ≥ 0,

where z = [ 1 ; z_G ; 0 ]. Then x = D z_G / vol(S).

Proof: write out the KKT conditions and show that the push method solves them; complementary slackness was the "tricky" part.

This is regularization for sparsity.
Need for normalization
A simple example
Anti-differentiating Approximation Algorithms
4. Illustrative empirical results
In this section, we present several empirical results that illustrate our theory from Section 3. We begin with a few pedagogical examples in order to state solutions of these problems that are correctly normalized. These precise values are sensitive to small differences and imprecisions in solving the equations. We state them here so that others can verify the correctness of their future implementations, although the codes underlying our computations are also available for download.4 We then illustrate how the 2-norm PageRank vector (Prob. (4)) and 1-norm regularized vector (Prob. (6)) approximate the 1-norm min-cut problem (Prob. (2)) for a small but widely-studied social graph.
4.1. Pedagogical examples
We start by solving numerically the various formulations and variants (min-cut, PageRank, and ACL) of our basic problem for the example graph illustrated in Figure 1. A summary of these results may be found in Table 1. To orient the reader about the normalizations (which can be tricky), we present the three PageRank vectors that satisfy the relationships x_pr = Dz and vol(S)z = x(α, S), as predicted by the theory from Section 2 and Theorem 1. Note that z, x(α, S), and z_G round to the discrete solution produced by the exact cut if we simply threshold at about half the largest value. Thus, and not surprisingly, all these formulations reveal, to a first approximation, similar properties of the graph. Importantly, though, note also that the vector z_G has the same sparsity as the true cut-vector. This is a direct consequence of the implicit l1 regularization that we characterize in Theorem 3. Understanding these "details" matters critically as we proceed to apply these methods to larger graphs, where there may be many small entries in the PageRank vector z.5
4.2. Newman’s netscience graph
Consider, for instance, Newman's netscience graph (Newman, 2006), which has 379 nodes and 914 undirected edges. We considered a set S of 16 vertices, illustrated in Figure 2. In this case, the solution of the 1-norm min-cut problem (Prob. (2)) has 15 non-zeros; the solution of the PageRank problem (which we have shown in Prob. (4) implicitly solves an l2 regression problem) has 284 non-zeros;6 and the solution from the ACL procedure (which we have shown in
4. http://www.cs.purdue.edu/homes/dgleich/codes/l1pagerank
5. The reason is both statistical, in that we encourage sparsity to find sparse partitions when the original set S is small, as well as algorithmic, in that the algorithm does not in general touch most of the nodes of a large graph in order to find small clusters (Mahoney, 2012). These properties were critically exploited in (Leskovec et al., 2009).
6. The PageRank solution actually has 379 non-zero entries. However, only 284 of them are non-zero above a 10^{-5} threshold.
Table 1. Numerical results of solving each problem (Prob. (2), Prob. (4), and Prob. (6)) for the graph G and set S from Figure 1. Here, we set α = 1/2, τ = 10^{-2}, ρ = 1, β = 2/3, and κ = 0.315. The vectors x_pr, z, and x(α, S) are the PageRank vectors from Theorem 1, where x(α, S) solves Prob. (4) and the others are from the problems at the end of Section 2. The vector x_cut solves the cut Prob. (2), and z_G solves Prob. (6).

Deg.   x_pr     z        x(α, S)   x_cut   z_G
2      0.0788   0.0394   0.8276    1       0.2758
4      0.1475   0.0369   0.7742    1       0.2437
7      0.2362   0.0337   0.7086    1       0.2138
4      0.1435   0.0359   0.7533    1       0.2325
4      0.1297   0.0324   0.6812    1       0.1977
7      0.1186   0.0169   0.3557    0       0
3      0.0385   0.0128   0.2693    0       0
2      0.0167   0.0083   0.1749    0       0
4      0.0487   0.0122   0.2554    0       0
3      0.0419   0.0140   0.2933    0       0
Prob. (6) solves an l1-regularized l2 regression problem) has 24 non-zeros. The true "min-cut" set is large in both the 2-norm PageRank problem and the regularized problem. Thus, we identify the underlying graph feature correctly; but the implicitly regularized ACL procedure does so with many fewer non-zeros than the vanilla PageRank procedure.
5. Discussion and Conclusion
We have shown that the PageRank linear system corresponds to a 2-norm variation on a 1-norm formulation of a min-cut problem, and we have also shown that the ACL procedure for computing an approximate personalized PageRank vector exactly solves a 1-norm regularized version of the PageRank 2-norm objective. Both of these results are examples of algorithmic anti-differentiation, which involves extracting a precise optimality characterization for the output of an approximate or heuristic algorithmic procedure.

While a great deal of work has focused on deriving new theoretical bounds on the objective function quality or runtime of an approximation algorithm, our focus is somewhat different: we have been interested in understanding what are the precise problems that approximation algorithms and heuristics solve exactly. Prior work has exploited these ideas less precisely in a much larger-scale application to draw very strong downstream conclusions (Leskovec et al., 2009), and our empirical results have illustrated this more precisely in much smaller-scale and more-controlled applications.

Our hope is that one can use these insights in order to combine simple heuristic/algorithmic primitives in ways that will give rise to principled scalable machine learning algorithms more generally. We have early results along these lines that involve semi-supervised learning.
16 nonzeros · 15 nonzeros · 284 nonzeros · 24 nonzeros

Figure 2. Examples of the different cut vectors on a portion of the netscience graph. In the left subfigure, we show the set S highlighted with its vertices enlarged. In the other subfigures, we show the solution vectors from the various cut problems (from left to right, Probs. (2), (4), and (6), solved with min-cut, PageRank, and ACL) for this set S. Each vector determines the color and size of a vertex, where high values are large and dark. White vertices with outlines are numerically non-zero (which is why most of the vertices in the fourth figure are outlined, in contrast to the third figure). The true min-cut set is large in all vectors, but the implicitly regularized problem achieves this with many fewer non-zeros than the vanilla PageRank problem.
References

Andersen, Reid and Lang, Kevin. An algorithm for improving graph partitions. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 651–660, 2008.

Andersen, Reid, Chung, Fan, and Lang, Kevin. Local graph partitioning using PageRank vectors. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 475–486, 2006.

Arasu, Arvind, Novak, Jasmine, Tomkins, Andrew, and Tomlin, John. PageRank computation and the structure of the web: Experiments and algorithms. In Proceedings of the 11th International Conference on the World Wide Web, 2002.

Chin, Hui Han, Madry, Aleksander, Miller, Gary L., and Peng, Richard. Runtime guarantees for regression problems. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science, pp. 269–282, 2013.

Dhillon, Inderjit S., Guan, Yuqiang, and Kulis, Brian. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell., 29(11):1944–1957, 2007.

Goldberg, Andrew V. and Tarjan, Robert E. A new approach to the maximum-flow problem. Journal of the ACM, 35(4):921–940, 1988.

Koutra, Danai, Ke, Tai-You, Kang, U., Chau, Duen Horng, Pao, Hsing-Kuo Kenneth, and Faloutsos, Christos. Unifying guilt-by-association approaches: Theorems and fast algorithms. In Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 245–260, 2011.

Kulis, Brian and Jordan, Michael I. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.

Langville, Amy N. and Meyer, Carl D. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton University Press, 2006.

Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.

Mahoney, Michael W. Approximate computation and implicit regularization for very large-scale data analysis. In Proceedings of the 31st Symposium on Principles of Database Systems, pp. 143–154, 2012.

Mahoney, Michael W., Orecchia, Lorenzo, and Vishnoi, Nisheeth K. A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. Journal of Machine Learning Research, 13:2339–2365, 2012.

Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E, 74:036104, 2006.

Orecchia, Lorenzo and Mahoney, Michael W. Implementing regularization implicitly via approximate eigenvector computation. In Proceedings of the 28th International Conference on Machine Learning, pp. 121–128, 2011.

Orecchia, Lorenzo and Zhu, Zeyuan Allen. Flow-based algorithms for local graph clustering. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms, pp. 1267–1286, 2014.

Page, Lawrence, Brin, Sergey, Motwani, Rajeev, and Winograd, Terry. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford University, November 1999.

Saunders, Michael A. Solution of sparse rectangular systems using LSQR and CRAIG. BIT Numerical Mathematics, 35(4):588–604, 1995.

Tibshirani, Robert. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1994.

Zhou, Dengyong, Bousquet, Olivier, Lal, Thomas Navin, Weston, Jason, and Schölkopf, Bernhard. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, pp. 321–328, 2004.
Push’s sparsity helps it identify the “right” graph feature with fewer non-zeros
The set S The mincut solution
The push solution The PageRank solution
It's easy to make this apply broadly
It is easy to cook up interesting diffusion-like problems and adapt them to this framework. In particular, Zhou et al. (2004) gave a semi-supervised learning diffusion we are currently studying:

[ 0       e_S^T     0
  e_S     θA        e_{S̄}
  0       e_{S̄}^T   0 ].
minimize (1/2) ‖B_S x‖_2^2 + κ ‖x‖_1
subject to x_s = 1, x_t = 0, x ≥ 0
minimize (1/2) x^T (I + θL) x − γ x^T e_S + κ ‖x‖_1
subject to x ≥ 0
16 nonzeros · 15 nonzeros · 284 nonzeros · 24 nonzeros

Figure 1. Examples of the different cut vectors on a portion of the net-science graph. At left we show the set S highlighted with its vertices enlarged. In the other figures, we show the solution vectors from the various cut problems (from left to right, Probs. (2), (4), and (6), or min-cut, PageRank, and ACL) for this set S. Each vector determines the color and size of a vertex, where high values are large and dark. White vertices with outlines are numerically non-zero (which is why most of the vertices in the fourth figure are outlined, in contrast to the third figure).
1.3. Application to semi-supervised learning
We recreate the USPS digit recognition problem from Zhou et al. (2003). Consider the digits 1, 2, 3, 4. For all nodes with these true labels, construct a weighted graph using a radial basis function:

A_{i,j} = exp( −‖d_i − d_j‖_2^2 / (2σ^2) ),

where σ ∈ {1.25, 2.5}, and where d_i is the vector representation of the ith image of a digit. This constructs the weighted graph that is the input to the semi-supervised learning method. We randomly pick nodes to be labeled in the input and ensure that at least one node from each class is chosen; and we predict labels using the following two kernels as in Zhou et al. (2003):

K1 = (I − βA)^{−1}, and
K2 = (D − βA)^{−1},

where we used β = 0.99, as in the prior work.
When we wanted to use our new regularized cut problems, we discovered that the weighted graphs A constructed through the radial basis function kernel method had many rows that were effectively zero. These are images that are not similar to anything else in the data. This fact posed a problem for our formulation because we need the seed vector s ≥ 0 and the complement seed vector d − s ≥ 0. Also, the cases where we would divide by the degree would be grossly inflated by these small degrees. Consequently, we investigated a third kernel:

K3 = (Diag(Ae) − βA)^{−1}.

The intuition for this kernel is that we first normalize the degrees via A, and then use a combinatorial Laplacian kernel to propagate the labels. Because the degrees are much more uniform, our new cut-based formulations are significantly easier to apply and had much better results in practice. Finally, we also consider using our 1-norm based regularization applied to K3 with κ = 10^{−5}, which, for the sake of consistency, we label rK3 although it is not a matrix-type kernel.
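A small sketch of this pipeline in plain Python: build the radial-basis-function graph, then apply one of the kernels by a linear solve. Here we use the K2-style kernel (D − βA)^{−1} applied to a single label indicator; the dense Gaussian-elimination solver and the function names are for illustration only, not the paper's code:

```python
import math

def rbf_graph(points, sigma):
    """Weighted adjacency A[i][j] = exp(-||d_i - d_j||^2 / (2*sigma^2)), zero diagonal."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                A[i][j] = math.exp(-dist2 / (2 * sigma ** 2))
    return A

def solve(M, b):
    """Dense Gaussian elimination with partial pivoting (toy-scale only)."""
    n = len(M)
    M = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def k2_scores(A, seed, beta=0.99):
    """Apply the K2-style kernel: scores = (D - beta*A)^{-1} e_seed."""
    n = len(A)
    deg = [sum(row) for row in A]
    M = [[(deg[i] if i == j else 0.0) - beta * A[i][j] for j in range(n)]
         for i in range(n)]
    return solve(M, [1.0 if i == seed else 0.0 for i in range(n)])
```

Since D − βA = (1 − β)D + β(D − A) is positive definite, the solve is well posed, and label mass stays essentially within the seed's cluster when clusters are well separated.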
Detail on the error rates (Figure 2). For the multi-class prediction task explored by Zhou et al., we show the mean error rates (percentage of mislabeled points) over 100 trials for four methods (K1, K2, K3, and rK3) as we vary the number of labels on four problems. The four problems use the digit sets 1, 2, 3, 4 and 5, 6, 7, 8 with σ = 1.25 and σ = 2.5 in the weight function. And as in Zhou et al. (2003), we use the maximum entry in each row of K_i S, where S represents the matrix of labeled nodes: S_{i,j} = 1 if node i is chosen as a labeled node for digit j. We note:

· First, setting σ = 1.25 results in very low error rates, and all of the kernels have approximately the same performance once the number of labels is above 20. For fewer labels, the regularized kernel has worse performance. With σ = 1.25, the graph has a good partition based on the four classes. The regularization does not help because it limits the propagation of labels to all of the nodes; however, once there are sufficient labels this effect weakens.

· Second, when σ = 2.5, the results are much worse. Here, K2 has the best performance and the regularized kernel rK3 has the worst performance.

Details on the ROC curves (Figure 3). We look at all ROC curves for single-class prediction tasks with 20 labeled nodes as we sweep along the solutions for a single label from each of the kernels. (Formally,
Recap & Conclusions
Open issues
Better treatment of directed graphs? An algorithm for ρ < 1? (ρ is set to ½ in most "uses"; this needs a new analysis, coming soon.) Improvements to semi-supervised learning on graphs.

Key point: we don't solve the 1-norm regularized problem with a 1-norm solver, but with the efficient push method. Run push, and you get a 1-norm regularization with early stopping.
David Gleich · Purdue Supported by NSF CAREER 1149756-CCF www.cs.purdue.edu/homes/dgleich
1. "Defined" algorithmic anti-differentiation to understand why heuristics work.
2. Found equivalences with PageRank and cut/flow problems.
3. Showed push performs 1-norm regularization.
PageRank → s-t min-cut
That equivalence works if s is degree-weighted. What if s is the uniform vector?
A(s) = [ 0           α s^T         0
         α s         A             α (d − s)
         0           α (d − s)^T   0 ].
MMDS 2014