
Betti Numbers of Graphs

David A. Johannsen David J. Marchette

Abstract

This paper describes an application of research that sits at the intersection of commutative algebra and combinatorics: Betti numbers of graphs. In particular, we describe a correspondence between simple undirected graphs and a class of ideals in a polynomial ring. We then briefly introduce some of the algebraic invariants that can be associated to the ideal and the relation of these invariants to the existence of induced subgraphs in the original graph. We discuss a novel application of the theory to a problem in anomaly detection: detection of a local non-homogeneity in a graph. We describe some variants of these ideas designed to make the computations more tractable.

1 Introduction

A graph is a pair G = (V, E), where V = V(G) is a finite set {v_1, ..., v_n}, the vertices, and E = E(G) is a set of pairs of vertices, the edges. We will usually write vw for the edge {v, w}. We assume all graphs are simple: edges are pairs of distinct vertices; there are no edges from a vertex to itself. The number of vertices of a graph is called the order of the graph. The number of edges is called the size.

An induced subgraph of a set of vertices W = {v_{i_1}, ..., v_{i_k}} is the graph (W, E′), where E′ is the subset of E consisting of those pairs containing only elements of W: E′ = {w_i w_j ∈ E | w_i, w_j ∈ W}. Thus, the induced subgraph contains all the edges between its vertices that exist in the original graph.
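The definition of an induced subgraph translates directly into code. The following is a minimal sketch, assuming a graph is stored as a vertex set plus a list of vertex pairs; the function name `induced_subgraph` and the example graph are illustrative and do not come from the paper.

```python
def induced_subgraph(vertices, edges, subset):
    """Return (W, E') for the subgraph of (vertices, edges) induced by `subset`."""
    w = set(subset) & set(vertices)
    # Keep exactly those edges whose two endpoints both lie in W.
    e_prime = {frozenset(e) for e in edges if set(e) <= w}
    return w, e_prime


# Example: the 4-cycle a-b-c-d together with the chord ac.
V = {"a", "b", "c", "d"}
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
W, E_prime = induced_subgraph(V, E, {"a", "b", "c"})
print(sorted("".join(sorted(e)) for e in E_prime))
```

Edges are stored as frozensets so that vw and wv are the same edge, matching the unordered-pair convention of the text.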

A cycle is a set of distinct vertices {v_{i_1}, ..., v_{i_k}} with k > 2, such that {v_{i_1}v_{i_2}, v_{i_2}v_{i_3}, ..., v_{i_k}v_{i_1}} ⊆ E. We will also refer to the corresponding graph as a cycle, and denote the cycle on n vertices by C_n. A cycle has a chord if the subgraph induced by the vertices of the cycle has an edge between two non-adjacent vertices (non-adjacent as members of the cycle). The complete graph K_n is the graph on n vertices such that uv ∈ E(K_n) for all u ≠ v ∈ V(K_n). The complete bipartite graph on n, m vertices is the graph on the disjoint union of two sets (V = X ∪ Y), where |X| = n and |Y| = m, with all possible edges between the sets (E = {xy | x ∈ X and y ∈ Y}). We denote the graph as K_{n,m}.

Definition 1.1. A graph is chordal if every cycle of length greater than 3 has at least one chord.

Definition 1.2. The open neighborhood of a vertex v, denoted N(v), is the set of vertices {w ∈ V | vw ∈ E}. That is, it consists of all neighbors of v. Note that v ∉ N(v). The closed neighborhood of v, denoted N[v], is the set N(v) ∪ {v}.

Definition 1.3. The trivial graph is the graph with no edges; E = ∅.

Definition 1.4. Given a graph G, define star(G) to be the graph defined by adding a single vertex v to G, with edges from v to each vertex of G. We will denote by star(n) the star on the trivial graph on n vertices. Thus star(n) = K_{1,n} has order n + 1 and contains one vertex of degree n and n vertices of degree 1.

Definition 1.5. A vertex v is simplicial if the subgraph induced by N(v) is a complete graph K_n for some n.
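The neighborhood and simplicial-vertex definitions can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as an adjacency dictionary mapping each vertex to the set of its neighbors; the function names are illustrative only.

```python
from itertools import combinations


def open_neighborhood(adj, v):
    """N(v): all neighbors of v (v itself is excluded)."""
    return set(adj[v])


def closed_neighborhood(adj, v):
    """N[v] = N(v) together with v."""
    return set(adj[v]) | {v}


def is_simplicial(adj, v):
    """True when every pair of neighbors of v is adjacent, i.e. N(v) is a clique."""
    return all(b in adj[a] for a, b in combinations(adj[v], 2))


# The 4-cycle 1-2-3-4 with the diagonal 13.
diamond = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print([v for v in diamond if is_simplicial(diamond, v)])
```

Under this check an isolated vertex counts as simplicial, since the condition on pairs of neighbors is vacuous.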

2 Commutative Algebra

Suppose we have a graph G on n vertices {v_1, ..., v_n}. Let k be a field (we may assume for the purposes of this paper that k = ℂ). Let S = k[x_1, ..., x_n], the ring of polynomials in n variables with coefficients in k. Our notation for edges, v_i v_j, is evocative of monomials, and we make use of this observation in the definition of the edge ideal of a graph.

Definition 2.1. The edge ideal of G is the ideal of S generated by the monomials {x_i x_j | v_i v_j ∈ E}. We write I(G) for the edge ideal of G.

Although there is a one-to-one correspondence between the vertices v_i and the variables x_i, we will keep to the usual naming convention so that it is clear when we are referring to elements of the graph and when we are referring to elements of the ring. For more information on edge ideals see [1], [2], [3], [4], [5].
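To make the correspondence concrete, the generators of an edge ideal can be listed directly from the edge set. This is a minimal sketch, assuming vertices are named by strings and an `index` dictionary (a hypothetical helper, not from the paper) records which variable x_i belongs to which vertex; the monomials are written as plain strings and no particular computer algebra syntax is implied.

```python
def edge_ideal_generators(edges, index):
    """Map each edge v_i v_j to the monomial x_i*x_j, written as a string."""
    gens = []
    for u, v in edges:
        i, j = sorted((index[u], index[v]))
        gens.append(f"x{i}*x{j}")
    return sorted(gens)


# The triangle on v1, v2, v3.
index = {"v1": 1, "v2": 2, "v3": 3}
triangle = [("v1", "v2"), ("v2", "v3"), ("v3", "v1")]
print(edge_ideal_generators(triangle, index))
```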

We can now use the tools of commutative algebra and algebraic geometry to learn about a graph by studying its edge ideal. One such tool, the one we will be focused on in this paper, is the minimal free resolution.

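Definition 2.2. An augmented free resolution of an S-module M is an exact sequence of the form

$$0 \longrightarrow F_m \longrightarrow F_{m-1} \longrightarrow \cdots \longrightarrow F_1 \longrightarrow F_0 \longrightarrow M \longrightarrow 0$$

where each F_i is a free S-module (a direct sum of β_i copies of S). The image of F_i in the sequence is called the ith syzygy module. Such a resolution is minimal if m is minimal over all such, and each β_i is minimal. We define the minimal free resolution of the edge ideal to be the minimal free resolution of I = I(G), in which case the free resolution becomes:

$$0 \longrightarrow S^{\beta_m} \xrightarrow{\ \varphi_m\ } S^{\beta_{m-1}} \xrightarrow{\ \varphi_{m-1}\ } \cdots \xrightarrow{\ \varphi_2\ } S^{\beta_1} \xrightarrow{\ \varphi_1\ } S^{\beta_0} \xrightarrow{\ \varphi_0\ } I \longrightarrow 0.$$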

The βi are called the Betti numbers. The length of the resolution is m.

It is well known that minimal free resolutions always exist, and are unique up to isomorphism. Thus, the length of the minimal free resolution of an edge ideal is well defined.

Definition 2.3. The projective dimension of an edge ideal is the length of the minimal resolution.

Note that β_0 = size(G). There is a natural ℕ^n grading on the ring k[x_1, ..., x_n], which gives a natural grading on the resolution:

$$0 \longrightarrow \bigoplus_{j} S(-j)^{\beta_{m,j}} \longrightarrow \cdots \longrightarrow \bigoplus_{j} S(-j)^{\beta_{1,j}} \longrightarrow \bigoplus_{j} S(-j)^{\beta_{0,j}} \longrightarrow I \longrightarrow 0,$$

where S(−j) is the shifted module obtained by shifting the degrees by j, so that the corresponding maps remain degree 0. We will be concerned with methods for computing the graded Betti numbers β_{i,j}.
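As a small worked instance of these definitions, consider the path v_1 v_2 v_3, an example chosen here for illustration and not taken from the paper. Its edge ideal has two generators of degree 2 and a single relation between them of degree 3, so the graded minimal free resolution can be written out in full:

```latex
% Path v_1 - v_2 - v_3, so I(G) = (x_1 x_2, x_2 x_3) in S = k[x_1, x_2, x_3].
\[
0 \longrightarrow S(-3)
  \xrightarrow{\begin{pmatrix} x_3 \\ -x_1 \end{pmatrix}}
  S(-2)^{2}
  \xrightarrow{\begin{pmatrix} x_1 x_2 & x_2 x_3 \end{pmatrix}}
  I(G) \longrightarrow 0
\]
% The only relation is x_3 (x_1 x_2) - x_1 (x_2 x_3) = 0, of degree 3.
\[
\beta_{0,2} = 2, \qquad \beta_{1,3} = 1, \qquad \text{all other } \beta_{i,j} = 0 .
\]
```

Here β_0 = 2 agrees with the size of the path, and the projective dimension is 1.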

3 Splitting

Definition 3.1. An edge uv is a splitting edge if N[u] ⊂ N[v] or N[v] ⊂ N[u]. If uv is a splitting edge, we will assume that the vertices are ordered so that N[u] ⊂ N[v].
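The splitting condition is a containment test on closed neighborhoods. The sketch below assumes an adjacency-dictionary representation with comparable vertex labels, and reads ⊂ as allowing equality (the text later uses N[u] = N[v] for a splitting edge); the function names are illustrative.

```python
def closed_nbhd(adj, v):
    return set(adj[v]) | {v}


def is_splitting_edge(adj, u, v):
    """True when uv is an edge and one closed neighborhood contains the other."""
    if v not in adj[u]:
        return False
    nu, nv = closed_nbhd(adj, u), closed_nbhd(adj, v)
    return nu <= nv or nv <= nu


def splitting_edges(adj):
    """All splitting edges, each reported once as (u, v) with u < v."""
    return sorted((u, v) for u in adj for v in adj[u]
                  if u < v and is_splitting_edge(adj, u, v))


diamond = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}  # C4 plus diagonal 13
c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}             # C4 alone
print(splitting_edges(diamond))
print(splitting_edges(c4))
```

On these two inputs every edge of the 4-cycle with a diagonal is splitting, while the bare 4-cycle has no splitting edge at all.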

Theorem 3.1 ([6]). If uv is a splitting edge of G, then for all i ≥ 1 and j ≥ 0

$$\beta_{i,j}(I(G)) = \beta_{i,j}(I(G \setminus \{uv\})) + \sum_{k=0}^{i} \binom{n}{k}\, \beta_{i-1-k,\,j-2-k}(I(H)), \tag{1}$$

where n = |N[v]| − 2, H = G \ N[v], β_{−1,0} = 1 and β_{−1,j} = 0 for j > 0. Recall that we are using the convention that uv is ordered so that N[u] ⊂ N[v].

Proof. See [6].

Lemma 3.1. A graph G has the property that all (non-trivial) induced subgraphs H contain a splitting edge if and only if G is chordal. Here “non-trivial” refers to the condition that H contain at least one edge.

Proof. (⇒) Assume G is not chordal. Then there is an induced cycle C_n with n > 3 with no chord. But it is easy to see that any such cycle does not have a splitting edge.

(⇐) If G is chordal, then it contains a simplicial vertex u. Since N(u) is a clique, uv is a splitting edge for any v ∈ N(u). Since any induced subgraph of a chordal graph is chordal, we have the result.

Lemma 3.2. For any chordal graph G there is a splitting edge e such that G \ {e} is chordal. In fact, any edge incident on a simplicial vertex may be chosen for e.

Proof. The only way removing an edge from G can make it non-chordal is if it opens up an induced C_4 with no chord. So in particular, we want to avoid removing chords. Let v be a simplicial vertex. Any edge e incident to v is a splitting edge. Further, it is not the chord of any cycle external to N[v]. Since N[v] is complete, removing e cannot result in an induced cycle without a chord. Thus, any edge incident to a simplicial vertex can be removed.

We call a function s that takes a graph containing splitting edges and returns one splitting edge a splitting edge selection strategy, or simply a strategy. We will say that Equation (1) is recursive for a class of graphs if there is a strategy for which it is recursive. We do not require that any arbitrary strategy will work, only that there is at least one. The following theorem has been stated elsewhere (see [7]), but is usually stated without explicit proof.

Theorem 3.2. Under the strategy which selects splitting edges incident on simplicial vertices, Equation (1) is recursive for a graph G if and only if G is chordal.

Proof. This is immediate from the two lemmas and the fact that induced subgraphs of chordal graphs are chordal.
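The recursion licensed by Theorem 3.2 can be sketched in a few lines. This is an illustrative implementation and not the authors' code: it assumes integer vertex labels, always removes an edge at the first non-isolated simplicial vertex it finds, and takes the sum in Equation (1) over k = 0, ..., i. That starting index is our reading of the theorem in [6] and should be checked against that reference. The i = 0 case is handled directly from β_0 = size(G).

```python
from functools import lru_cache
from itertools import combinations
from math import comb


def _adjacency(vertices, edges):
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def _simplicial_edge(adj):
    """Return (u, v) with u simplicial and non-isolated, v a neighbor; else None."""
    for u in sorted(adj):
        nbrs = adj[u]
        if nbrs and all(b in adj[a] for a, b in combinations(nbrs, 2)):
            return u, min(nbrs)
    return None


@lru_cache(maxsize=None)
def betti(vertices, edges, i, j):
    """beta_{i,j}(I(G)) for chordal G; vertices and edges are frozensets."""
    if i == -1:                       # convention stated with Theorem 3.1
        return 1 if j == 0 else 0
    if i < -1 or j < 0 or not edges:
        return 0
    if i == 0:                        # one generator per edge, all of degree 2
        return len(edges) if j == 2 else 0
    adj = _adjacency(vertices, edges)
    pick = _simplicial_edge(adj)
    if pick is None:
        raise ValueError("no edge at a simplicial vertex: graph is not chordal")
    u, v = pick                       # u simplicial, so N[u] is contained in N[v]
    closed_v = adj[v] | {v}
    n = len(closed_v) - 2
    rest = vertices - closed_v        # vertex set of H = G \ N[v]
    h_edges = frozenset(e for e in edges if e <= rest)
    total = betti(vertices, edges - {frozenset((u, v))}, i, j)
    for k in range(0, i + 1):
        total += comb(n, k) * betti(rest, h_edges, i - 1 - k, j - 2 - k)
    return total


def graph(vertex_list, edge_list):
    """Convenience constructor producing the hashable arguments betti expects."""
    return frozenset(vertex_list), frozenset(frozenset(e) for e in edge_list)


k4 = graph(range(4), combinations(range(4), 2))
print([betti(*k4, i, i + 2) for i in range(3)])
```

The recursion revisits many subgraphs, so the results are memoized with `lru_cache`; even so, this sketch is meant for small graphs only.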

This shows that the algorithm to apply Equation (1) recursively always works to compute the minimal resolution of G whenever G is chordal, provided the splitting edges are chosen appropriately. It is not the case, though, that it is recursive no matter what splitting edge is chosen at each step. The simplest example of this is shown below. It is clear that all edges in this graph are splitting and that this is a chordal graph. Further, if we remove any edge from the outside cycle, the resulting graph is chordal. However, if we remove the diagonal, the resulting graph is not chordal, and furthermore, does not contain any splitting edges. The two degree two vertices are simplicial, while the vertices on which the diagonal is incident are not.

[Figure: a 4-cycle with one diagonal.]

The graph above is a specific case of a general class of graphs, {G = star(star(n))}, for n > 1, shown in Figure 1. G is chordal: although it contains a number of copies of C_4, each of them has uv as a chord. Note that every edge in star(star(n)) is splitting. However, not all edges are equal. If we choose uv, then G \ uv contains no splitting edges, and Equation (1) fails to recurse. However, if we select any other edge (uv_i or vv_j) and continue to select such edges until only uv remains, Equation (1) recurses. At each step, H is the empty graph, since N[u] = N[v] = V(G).

The strategy that works is to use a perfect vertex elimination scheme ([8]), also called a perfect (vertex) elimination ordering (PEO) ([9]). This is an ordering of the vertices such that each vertex is simplicial in the graph induced by the remaining vertices. If one uses the strategy of selecting edges incident on vertices in a PEO, the algorithm recurses.
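One simple way to obtain such an ordering is to peel off simplicial vertices greedily. The sketch below is a naive illustration under that assumption, not an algorithm given in the paper or in [8], [9]; it uses an adjacency dictionary with comparable vertex labels. If the graph is not chordal, the loop stops early and returns the vertices it eliminated so far together with the vertices it could not eliminate.

```python
from itertools import combinations


def perfect_elimination_ordering(adj):
    """Greedily eliminate simplicial vertices.

    Returns (order, leftover). `leftover` is empty exactly when the graph
    is chordal, in which case `order` is a perfect elimination ordering.
    """
    remaining = {v: set(nb) for v, nb in adj.items()}
    order = []
    while remaining:
        simplicial = next(
            (v for v in sorted(remaining)
             if all(b in remaining[a]
                    for a, b in combinations(remaining[v], 2))),
            None,
        )
        if simplicial is None:
            break                      # stuck: no simplicial vertex is left
        order.append(simplicial)
        for w in remaining.pop(simplicial):
            remaining[w].discard(simplicial)
    return order, set(remaining)


diamond = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(perfect_elimination_ordering(diamond))
print(perfect_elimination_ordering(c4))
```

This version rescans all remaining vertices at every step, which is adequate for small graphs; faster standard methods for chordal graphs exist but are not sketched here.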

Now consider the graph in Figure 1 with the edge uv removed: K = G \ {uv} (we will call this a mace (or a mace head) because of its similarity to the head of a type of mace; see below).

[Figure: vertices u and v, joined by an edge, each adjacent to every one of v_1, ..., v_n.]

Figure 1: star(star(n)).

None of the edges is splitting, so we cannot even start the recursion. However, as we have noted, there is a strategy for using Equation (1) recursively on G. Thus, we can turn Equation (1) around and recursively compute the Betti numbers for K using those of G:

$$\beta_{i,j}(I(K)) = \beta_{i,j}(I(K \cup \{uv\})) - \sum_{k=0}^{i} \binom{n}{k}\, \beta_{i-1-k,\,j-2-k}(I(H)). \tag{2}$$

Thus, if an edge uv is missing from a graph, but would be splitting if added to the graph, we can use Equation (2). We will say that we “add a splitting edge” when we mean an edge which is splitting in the resultant graph. Note that the subgraph H is computed on the augmented graph K ∪ {uv} using N[v] ⊂ V(K ∪ {uv}). In computing the first summand of the right hand side, we need not select uv as the splitting edge (and in fact we must not, since this would bring us back to our starting point). Instead, for the equation to be recursive, the addition of uv must induce other edges to be splitting in the resultant graph. This is the case in our example of K = star(star(n)) \ {uv}. If the resultant graph is chordal, then our PEO strategy will select the appropriate edges, and we can recurse to the solution.
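As a small check of Equation (2), take the smallest mace: star(star(2)) with uv removed, which is the 4-cycle u v_1 v v_2. The worked values below are an illustration added here, assuming the Betti numbers β_{1,3} = 6 and β_{2,4} = 2 for the 4-cycle with a diagonal (these can be obtained from Equation (1)). Because N[u] = N[v] is the whole vertex set, H is empty and only the term with k = i and j = i + 2 survives in the sum.

```latex
% K = C_4 on u, v_1, v, v_2; K \cup \{uv\} is the 4-cycle with diagonal uv.
% N[u] = N[v] = V, so H is empty and the binomial parameter is |N[v]| - 2 = 2.
% With H empty, the only surviving term of the sum is k = i, j = i + 2,
% where \beta_{-1,0}(I(H)) = 1.
\begin{align*}
\beta_{1,3}(I(K)) &= \beta_{1,3}(I(K \cup \{uv\})) - \binom{2}{1} = 6 - 2 = 4,\\
\beta_{2,4}(I(K)) &= \beta_{2,4}(I(K \cup \{uv\})) - \binom{2}{2} = 2 - 1 = 1.
\end{align*}
```

Together with β_{0,2} = 4 (the size of the 4-cycle), this gives total Betti numbers 4, 4, 1 for I(C_4).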

Thus if adding a splitting edge results in a chordal graph, then we can recurse. Similarly, if there is a sequence of splitting edges we can add for which the final graph is chordal, then we can recurse. Note that algorithms that produce PEOs on chordal graphs can be modified (if necessary) to produce partial PEOs: a set of vertices each of which is simplicial in the graph induced by the vertices in the rest of the list plus all vertices not in the list. If the graph is non-chordal (and contains at least one simplicial vertex), we can apply our recursion to this partial PEO until we have processed all the splitting edges. Other methods must be employed to compute the Betti numbers for the remaining subgraphs. At this point, one could attempt to apply Equation (2). It is unclear, from a practical standpoint, whether it is better to employ Equation (2) up front, on the original graph, or only once we hit the end of the partial PEO.

There are three ways to add a splitting edge to a graph: one can connect an isolated vertex, resulting in a pendant; one can connect a pendant, resulting in a triangle; one can add a diagonal to a square (4-cycle). The first two don’t change the “chordal status” of the graph: they cannot make a non-chordal graph chordal. The third requires that one of the vertices of the square have the property that its neighborhood is contained in the neighborhood of its opposite. Thinking of the mace head depicted in Figure 1 (with the edge uv removed), the vertex can be the “point” of the mace, resulting in the added splitting edge being uv. Note that this vertex is not simplicial, even after adding the edge, and so the added edge will not be selected by the algorithm until all the other edges of the mace have been selected. Note that we could have added v_i v_j for any i ≠ j, but this would not have resulted in a chordal graph (we would have to add many such edges). Thus, the strategy is to select splitting edges which make the graph “most chordal”: fill in the maximum number of squares.

4 Approximations

The above algorithm works if and only if the graph is chordal. If it is not, then at some point in the recursion a graph is reached which has no splitting edges. At this point, one has three choices: push the graph off to a non-recursive algorithm, for example by a call to the symbolic algebra package Singular ([10]) or other algebraic geometry software; give up and fail to return any answer; use some method to approximate the Betti numbers so that what is returned is approximately correct (in some sense) rather than simply failing to return anything. In this section we discuss one method of approximation well suited to the recursive algorithm.

Given that the graph contains no splitting edges, Theorem 3.2 cannot apply, as it only applies to splitting edges. It is possible, however, that if we chose a non-splitting edge, the recursion would still be valid. It is easy to design a graph for which this will not work no matter which edge is chosen; however, it is also known that in some cases choosing a non-splitting edge will still satisfy the recursion and result in the correct minimal free resolution. This is the basis of the approximate algorithm we have implemented.

1. While a splitting edge exists, proceed according to the splitting edge algorithm above.

2. If no splitting edge exists, choose an edge to use in place of the splitting edge. This choice may be made:

(a) At random.

(b) So that the graph G \ e with the edge removed contains the largest number of splitting edges (ties broken arbitrarily, and if no edge results in a subgraph with any splitting edges, choose the edge arbitrarily or at random).

3. Proceed as if the chosen edge were splitting.
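The edge-selection part of the steps above can be sketched as follows. This is an illustrative helper, not the authors' implementation: it assumes an adjacency dictionary with comparable vertex labels, breaks ties by taking the first candidate in sorted order, and uses option (a) only when a random generator `rng` is supplied (a hypothetical parameter introduced here).

```python
import random


def closed(adj, v):
    return set(adj[v]) | {v}


def edges_of(adj):
    return sorted((u, v) for u in adj for v in adj[u] if u < v)


def splitting_edges(adj):
    return [(u, v) for u, v in edges_of(adj)
            if closed(adj, u) <= closed(adj, v)
            or closed(adj, v) <= closed(adj, u)]


def without_edge(adj, u, v):
    copy = {w: set(nb) for w, nb in adj.items()}
    copy[u].discard(v)
    copy[v].discard(u)
    return copy


def choose_edge(adj, rng=None):
    """Return (edge, is_splitting) following steps 1 and 2 of the list above."""
    split = splitting_edges(adj)
    if split:                                  # step 1: a true splitting edge
        return split[0], True
    candidates = edges_of(adj)
    if not candidates:
        return None, False
    if rng is not None:                        # option (a): at random
        return rng.choice(candidates), False
    # option (b): maximize the number of splitting edges left in G \ e
    scores = [len(splitting_edges(without_edge(adj, u, v)))
              for u, v in candidates]
    return candidates[scores.index(max(scores))], False


c4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(choose_edge(c4))
```

For the 4-cycle every edge scores the same under option (b), since removing any edge leaves a path with two splitting edges, so the tie-break decides the choice.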

5 Scan Statistics

Another approach to processing large graphs is to give up on the goal of producing a minimal free resolution of the entire graph, and instead consider the resolutions of subgraphs. One could cluster the graph, reducing to a graph defined by the clusters, but instead we consider scan statistics.

For each vertex v, define the scan subgraph s(G, v) to be the induced subgraph of the closed neighborhood of v. We then define the scan MFR of G to be the collection of minimal free resolutions (MFRs) of all scan subgraphs s(G, v), v ∈ V. We can then define the scan statistic of the Betti numbers to be the maximum across all scan MFRs:

$$\beta^{s}_{ij} = \max_{v} \beta_{ij}(s(G, v)). \tag{3}$$
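The scan construction in Equation (3) does not depend on which invariant is being maximized. The sketch below assumes an adjacency-dictionary graph and takes the invariant as a function argument; as a stand-in for a Betti number it uses the size of the scan subgraph, which equals β_0 by the remark in Section 2. The function names are illustrative.

```python
def scan_subgraph(adj, v):
    """s(G, v): the subgraph induced by the closed neighborhood N[v]."""
    nbhd = set(adj[v]) | {v}
    return {w: set(adj[w]) & nbhd for w in nbhd}


def size(adj):
    """Number of edges; each edge is seen once from each endpoint."""
    return sum(len(nb) for nb in adj.values()) // 2


def scan_statistic(adj, invariant):
    """Maximum of `invariant` over all scan subgraphs, as in Equation (3)."""
    return max(invariant(scan_subgraph(adj, v)) for v in adj)


# A triangle 1-2-3 with a path 3-4-5 attached.
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(scan_statistic(g, size))
```

Any function that maps an adjacency dictionary to a number, such as a routine computing a particular β_{i,j}, could be passed in place of `size`.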

6 Anomaly Detection

We now consider the problem of detecting an anomaly in a graph. In this case we define “anomaly” to mean a small region (set of vertices) in the graph which has a greater number of communications amongst its members than is “typical” in the remainder of the graph. We make this precise in the following.

The Erdős–Rényi random graph ER(n, p) is the random graph on n vertices whose edges are present independently with probability p. The κ(n, p, m, q) random graph is a model for a graph with an anomaly: a small subset of m vertices “communicating” amongst themselves at a higher rate. Here there are two disjoint sets of vertices, V = K ∪ E (in this section E denotes the anomalous set of vertices, not the edge set), with an edge between vertices i and j independently with probability:

• p if either of i, j ∈ K.

• q if both i, j ∈ E.
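The κ model is straightforward to simulate. The following is a minimal sketch, assuming that m is the number of anomalous vertices and that the first m labels form the anomalous set; both are assumptions made here for illustration, as is the function name `kappa_graph`.

```python
import random
from itertools import combinations


def kappa_graph(n, p, m, q, seed=None):
    """Sample edges on vertices 0..n-1; vertices 0..m-1 form the anomalous set."""
    rng = random.Random(seed)
    anomalous = set(range(m))          # assumption: first m labels are anomalous
    edges = set()
    for i, j in combinations(range(n), 2):
        # probability q only when both endpoints are anomalous, p otherwise
        prob = q if (i in anomalous and j in anomalous) else p
        if rng.random() < prob:
            edges.add((i, j))
    return edges, anomalous


edges, anomalous = kappa_graph(50, 0.05, 6, 0.8, seed=1)
inside = sum(1 for i, j in edges if i in anomalous and j in anomalous)
print(len(edges), inside)
```

Setting m = 0 leaves the anomalous set empty, so the same function samples from ER(n, p).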

It is our hypothesis that the Betti numbers, and in particular the scan MFR, can provide statistics for detecting this type of anomaly with higher power than other approaches. Note that the size of the graph is already known to have some power for detecting this type of anomaly ([11, 12, 13, 14]). It is reasonable, given that the Betti numbers essentially count the number of certain subgraphs, that they would have power superior to size, and perhaps other commonly used invariants, for problems of this type. To investigate this hypothesis, we present some simulations in the next section.

Figure 2: The Petersen graph.

7 Examples and Simulation

Figure 2 shows a well-known graph called the Petersen graph. Its MFR is depicted in Figure 3. Here we display the Betti numbers as a curve in the upper panel, with two data images below. In these, the values of the total (top) and graded (bottom) Betti numbers are depicted as gray-scale squares, with small values in black, large values in increasing whiteness. Zero graded Betti numbers are completely white. The numerical values are depicted in the squares. For large graphs with very large Betti numbers, we drop the numerical values from the plot.

We illustrate the timing improvement of the recursive algorithm over the algorithm in Singular in Table 1. For each order, we randomly generated 100 chordal graphs, and computed the time to calculate the MFR using each of the two algorithms. As can be seen, the recursive algorithm on these random chordal graphs is vastly faster than the algorithm in Singular, particularly for graphs of order 15. Figures 4 and 5 show timings of the recursive algorithm on larger graphs, both random chordal graphs and Erdős–Rényi random graphs. In the latter, Singular is called for those graphs that have no splitting edges, but these are typically small enough in these simulations that there is not an excessive time cost as a result.

Figure 6 shows the percent difference between the true Betti numbers and approximate Betti numbers for 1000 ER(20, .1) graphs.

Figures 7 and 8 show the scan MFR for homogeneous (ER) and inhomogeneous (κ) graphs. As one can see from the figures, the Betti numbers tend to be higher for the inhomogeneous graph, indicating that these are likely to be useful for detecting this type of inhomogeneity. Since each graded Betti number corresponds to a count of the number of subgraphs of certain types, it is possible, at least in principle, to use Betti numbers to design tests to detect very specific types of deviation from homogeneity. This is a subject for future work.

[Figure 3 plot: total Betti numbers β = 15, 45, 87, 100, 60, 20, 4 (horizontal axis 2 through 8); graded values shown in the lower panel: 15, 30, 15, 10, 72, 5, 80, 20, 30, 30, 20, 4.]

Figure 3: The minimal free resolution of the Petersen graph.

Table 1: Timings (in seconds) for the chordal algorithm as compared to Singular on a selection of chordal graphs.

Recursive Algorithm
 n   Minimum   1st Quartile   Median   Mean    3rd Quartile   Maximum
10   0.011     0.017          0.022    0.022   0.028          0.037
11   0.014     0.021          0.028    0.029   0.035          0.051
12   0.016     0.029          0.040    0.040   0.048          0.073
13   0.015     0.029          0.048    0.048   0.064          0.098
14   0.019     0.041          0.063    0.063   0.081          0.124
15   0.023     0.060          0.087    0.084   0.110          0.159

Singular
 n   Minimum   1st Quartile   Median   Mean    3rd Quartile   Maximum
10   0.014     0.032          0.070    0.102   0.12           0.415
11   0.014     0.05625        0.188    0.368   0.52           1.848
12   0.023     0.2333         1.154    1.875   2.84           7.816
13   0.024     0.417          3.084    6.999   11.5           42.42
14   0.038     3.646          25.86    52.09   74.97          275.4
15   0.090     40.86          174.1    320.5   476.4          1450

[Figure 4 plot: seconds versus n.]

Figure 4: Timings for random chordal graphs.

[Figure 5 plot: seconds versus n.]

Figure 5: Timings for Erdős–Rényi random graphs.

Figure 6: Percent difference between true Betti numbers and approximate Betti numbers using the approximation discussed in the text.

[Figure 7 plot: β on the vertical axis.]

Figure 7: Scan statistic for an ER(100, 0.1) graph. The logarithm of the values of the graded Betti numbers are displayed to better show the structure.

[Figure 8 plot: β on the vertical axis.]

Figure 8: Scan statistic for a κ(100, 0.1, 10, 0.8) graph. The logarithm of the values of the graded Betti numbers are displayed to better show the structure.

Figure 9: Powers for the true Betti numbers (top), approximate Betti numbers (middle) and scan Betti numbers (bottom) for a simulation from ER(50, 0.05) versus κ(50, 0.05, 6, 0.8). Yellow corresponds to high power, red to low, white to power ≤ 0.1. Here α = 0.05. Here the upper left corner corresponds to the power of the size of the graph, β_{1,1}.

8 Conclusion

We have discussed the proof of Hà and Van Tuyl’s theorem that Equation (1) is recursive on chordal graphs, and used this, with the notion of a perfect elimination ordering (PEO), to provide an explicit strategy for applying this theorem. A trivial observation allows us to apply the same algorithm to slightly more general graphs than chordal graphs.

More generally, the idea of using a partial PEO can reduce the computation of the Betti numbers of an edge ideal in large graphs to that of computations on smaller graphs. This, combined with “filling in the squares” and other algorithms, or with specific formulas for different classes of graphs, can make the calculation of the Betti numbers of relatively large graphs practical.

We have also discussed ways to approximate the Betti numbers for graphs that are not chordal, and to compute related statistics, the scan MFR, which can be computed relatively efficiently for many graphs using these algorithms. We demonstrated empirically that some of the Betti numbers have high power for detecting certain types of anomalies (non-homogeneities) in graphs. Much work is still needed to develop the theory of the scan MFR, and to explore applications of the Betti numbers of graphs.

Acknowledgments

This work was funded in part by the Office of Naval Research under the In-House Laboratory Independent Research program.

References

[1] S. Jacques, Betti numbers of graph ideals. PhD thesis, 2004.

[2] S. Jacques and M. Katzman, “The Betti numbers of forests,” 2005.

[3] R. P. Stanley, Combinatorics and Commutative Algebra. Basel: Birkhäuser, second ed., 1996.

[4] R. H. Villarreal, Monomial Algebras, vol. 238 of Monographs and Textbooks in Pure and Applied Mathematics. New York: Marcel Dekker, Inc., 2001.

[5] E. Miller and B. Sturmfels, Combinatorial Commutative Algebra, vol. 227 of Graduate Texts in Mathematics. New York: Springer, 2005.

[6] H. T. Hà and A. Van Tuyl, “Splittable ideals and the resolutions of monomial ideals,” 2006.

[7] H. T. Hà and A. Van Tuyl, “Monomial ideals, edge ideals of hypergraphs, and their minimal graded free resolutions,” 2006.

[8] R. Balakrishnan and K. Ranganathan, A Textbook of Graph Theory. New York: Springer, 2000.

[9] D. B. West, Introduction to Graph Theory. Upper Saddle River, NJ: Prentice Hall, 2001.

[10] G.-M. Greuel and G. Pfister, A Singular Introduction to Commutative Algebra. Berlin: Springer, 2002.

[11] C. E. Priebe, J. M. Conroy, D. J. Marchette, and Y. Park, “Scan statistics on Enron graphs,” Computational and Mathematical Organization Theory, vol. 11, pp. 229–247, 2005.

[12] A. Rukhin, Asymptotic analysis of various statistics for random graph inference. PhD thesis, 2009.

[13] J. Grothendieck, C. E. Priebe, and A. L. Gorin, “Statistical inference on attributed random graphs: Fusion of graph features and content,” Computational Statistics and Data Analysis, vol. 54, pp. 1777–1790, 2010.

[14] C. E. Priebe, Y. Park, D. J. Marchette, J. M. Conroy, J. Grothendieck, and A. L. Gorin, “Statistical inference on attributed random graphs: Fusion of graph features and content: An experiment on time series of Enron graphs,” Computational Statistics and Data Analysis, vol. 54, pp. 1776–1776, 2010.