Maximum and Maximal Cliques in Multipartite Graphs

Charles Phillips

Department of Electrical Engineering and Computer Science

University of Tennessee

3/11/2015

Cliques in Multipartite Graphs

Cliques

Maximum Clique

Maximal Clique Enumeration

Bipartite and multipartite graphs

Maximal Biclique Enumeration

Maximum k-partite cliques

Maximal k-partite clique enumeration

Applications

Molecular Biology, Telecommunications, Natural Language Processing, Social Network Analysis, Transportation, Operations Research, Chemistry, Textile Manufacturing, Drug Discovery, Phylogeny, Ad Hoc Networking, Computational Biology, Fault Diagnosis, Computer Vision

History

Richard Karp

R. Duncan Luce

Pál Turán

Leo Moser

René Peeters

Etsuji Tomita

Definitions

NP-hard, NP-complete

– Any exact algorithm must take exponential time in the worst case (as far as we know).

– “In complexity theory we make the following sweeping generalizations: If an algorithm runs in polynomial time, it is fast; otherwise it is slow. Fast is good; slow is bad. A problem that we can solve by a fast algorithm is easy; a problem that we can't is hard.” (Tovey 2002)

Definitions

Clique

– A set of vertices with all possible edges.

– A complete subgraph.

Maximum Clique

– The largest clique in a graph.

Maximal Clique

– A clique to which no vertex can be added to form a larger clique.

– A clique that is not a proper subset of another clique.

[Figure: an example graph on the vertices a through h.]

Maximum clique: { a, b, d, e }

Maximal cliques: { a, b, d, e }, { c, f, g }, { c, g, h }, { e, f }
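The definitions can be checked by brute force on the example. The sketch below is illustrative only: the edge set is an assumption reconstructed from the listed maximal cliques (every pair inside a listed clique is taken to be an edge), and the function names are invented here.

```python
from itertools import combinations

# Assumption: edges are exactly the pairs that occur inside a listed maximal clique.
MAXIMAL = [{"a", "b", "d", "e"}, {"c", "f", "g"}, {"c", "g", "h"}, {"e", "f"}]
EDGES = {frozenset(p) for c in MAXIMAL for p in combinations(sorted(c), 2)}
VERTICES = set().union(*MAXIMAL)

def is_clique(s):
    """True if every pair of distinct vertices in s is joined by an edge."""
    return all(frozenset(p) in EDGES for p in combinations(sorted(s), 2))

def is_maximal_clique(s):
    """True if s is a clique and no outside vertex can be added to it."""
    return is_clique(s) and not any(is_clique(s | {v}) for v in VERTICES - s)

def all_maximal_cliques():
    """Brute force over all non-empty vertex subsets (fine for 8 vertices)."""
    found = []
    for r in range(1, len(VERTICES) + 1):
        for subset in combinations(sorted(VERTICES), r):
            if is_maximal_clique(set(subset)):
                found.append(set(subset))
    return found

cliques = all_maximal_cliques()
maximum = max(cliques, key=len)   # a maximum clique is a largest maximal clique
print(sorted(sorted(c) for c in cliques))
print(sorted(maximum))
```

Every maximum clique is maximal, but not the other way round: { e, f } is maximal although it has only two vertices.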

Background

Maximum clique is not approximable to within better than a linear factor

No polynomial time algorithm can approximate maximum clique within a factor of $O(n^{1-\varepsilon})$, for any $\varepsilon > 0$, unless P = NP.

Best known approximation algorithm: $O(n (\log \log n)^2 / \log^3 n)$ (Feige 2004)

Definitions

Bipartite Graph

– A graph whose vertices can be divided into two non-empty disjoint sets U and V such that every edge connects a vertex in U with a vertex in V.

Tripartite Graph

Multipartite Graph

– A k-partite graph, k ≥ 2

Definitions

Partite set

Interpartite edge

Intrapartite edge

Complete k-partite graph

Background

Bipartite Graph

How many ways can a connected bipartite graph be partitioned into two partite sets?

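Only one (up to swapping the names U and V)!

Choose an arbitrary vertex v and place it in U. All neighbors of v must then be in V. All neighbors of those neighbors must be in U. And so on. Because the graph is connected, this process reaches every vertex, so every placement is forced.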
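The argument above is a breadth-first search. A minimal sketch, assuming the graph is given as a dictionary from each vertex to its set of neighbors (the function name and the example path are invented for illustration):

```python
from collections import deque

def bipartition(adj, start):
    """Two-colour a connected graph by breadth-first search.

    adj maps each vertex to the set of its neighbours. Returns (U, V) with
    start in U, or None if some edge joins two vertices on the same side
    (in which case the graph is not bipartite).
    """
    side = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in side:
                side[w] = 1 - side[v]   # neighbours go on the opposite side
                queue.append(w)
            elif side[w] == side[v]:
                return None
    U = {v for v, s in side.items() if s == 0}
    V = {v for v, s in side.items() if s == 1}
    return U, V

# Hypothetical example: the path 1 - 2 - 3 - 4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(bipartition(path, 1))
print(bipartition(path, 2))
```

Starting from a different vertex can only swap the two sets, which is the sense in which the partition is unique.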

Definitions

k-clique

– A clique with k vertices.

Biclique

– A complete bipartite graph

– $K_{m,n}$

– All possible interpartite edges

Triclique

– A complete tripartite graph

– All possible interpartite edges

k-partite clique

– A complete k-partite graph

– All possible interpartite edges

Definitions

Vertex and Edge Maximum

Bipartite Graph

– Vertex-Maximum Biclique (8 vertices, 7 edges): Polynomial Time

– Edge-Maximum Biclique (6 vertices, 9 edges): NP-hard

Tripartite Graph

– Vertex-Maximum Triclique (7 vertices, 10 edges): NP-hard

– Edge-Maximum Triclique (6 vertices, 12 edges): NP-hard

Turán's Theorem

We cannot add an edge to $K_{3,2,2}$ without creating a $K_4$.
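The Turán graph T(n, r) is the complete r-partite graph on n vertices whose partite sets are as equal in size as possible. A short sketch of how the part sizes and the edge count can be computed (the function names are invented here):

```python
def turan_parts(n, r):
    """Part sizes of the Turán graph T(n, r): n vertices split into r
    parts whose sizes differ by at most one."""
    q, rem = divmod(n, r)
    return [q + 1] * rem + [q] * (r - rem)

def complete_multipartite_edges(parts):
    """Number of edges in the complete multipartite graph with these parts:
    all vertex pairs minus the pairs that fall inside a single part."""
    n = sum(parts)
    return (n * n - sum(p * p for p in parts)) // 2

parts = turan_parts(7, 3)
print(parts, complete_multipartite_edges(parts))  # [3, 2, 2] 16
```

For T(7, 3) this gives the part sizes 3, 2, 2, matching the example above.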

Maximal Clique Enumeration

List all maximal cliques

First methods (1957) used induction on 3-cliques

Methods were developed (1964-1970) using the vertex sequence method (aka point removal method)

– Produces maximal cliques of G from maximal cliques of G \ {v}, v ∈ V

– Must maintain all cliques in memory

Bron-Kerbosch Algorithm

Algorithm 1: Bron-Kerbosch with pivot.

Input: a graph, G = (V, E)

R := ∅, P := V, X := ∅
BronKerbosch(R, P, X)
    if P and X are both empty
        report R as a maximal clique and return
    choose a pivot vertex u in P ∪ X
    for each vertex v in P \ N(u)
        BronKerbosch(R ∪ {v}, P ∩ N(v), X ∩ N(v))
        P := P \ {v}
        X := X ∪ {v}

R: The current clique

P: Vertices that can extend the current clique

X: Vertices that have already been used to extend the current clique

Pseudocode adapted from http://en.wikipedia.org/wiki/Bron%E2%80%93Kerbosch_algorithm
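A runnable sketch of the pseudocode, in Python. The pivot rule (the vertex of P ∪ X with the most neighbors in P) is an assumption, since the pseudocode leaves the choice open, and the edge list is reconstructed from the maximal cliques listed for the example graph earlier in the talk.

```python
def bron_kerbosch(adj):
    """Return all maximal cliques of the graph given as {vertex: neighbours}."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(R)          # nothing can extend R: it is maximal
            return
        # Assumed pivot rule: the vertex of P | X with most neighbours in P.
        u = max(P | X, key=lambda w: len(P & adj[w]))
        for v in list(P - adj[u]):     # skip neighbours of the pivot
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}                # v has now been fully explored
            X = X | {v}

    expand(set(), set(adj), set())
    return cliques

# Edges implied by the maximal cliques {a,b,d,e}, {c,f,g}, {c,g,h}, {e,f}.
EDGES = ["ab", "ad", "ae", "bd", "be", "de", "cf", "cg", "fg", "ch", "gh", "ef"]
adj = {}
for x, y in EDGES:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

result = bron_kerbosch(adj)
print(sorted("".join(sorted(c)) for c in result))  # ['abde', 'cfg', 'cgh', 'ef']
```

Unlike the vertex sequence method, only the current R, P and X need to be held in memory for each level of the recursion.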

Maximal Biclique Enumeration

List all maximal bicliques

All possible subsets (Koch)

MBEA

Maximal Triclique Enumeration

Will maximal bicliques help us obtain maximal tricliques?

Does a maximal triclique contain a maximal biclique as a subgraph?

What about maximal k-partite clique enumeration?

[Figure: A tripartite graph with partite sets A, B and C.]

[Figure: The red vertices are a maximal biclique in partite sets A and B.]

[Figure: The red vertices are a maximal triclique in the graph. But the red vertices in A and B are not a maximal biclique.]
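The same effect can be reproduced on a tiny example. The graph below is hypothetical (it is not the one drawn on the slide), and the helper names are invented; the check is brute force.

```python
from itertools import combinations

# Hypothetical tripartite graph, not the one drawn on the slide.
PARTS = {"A": {"a1", "a2"}, "B": {"b1"}, "C": {"c1"}}
EDGES = {frozenset(e) for e in
         [("a1", "b1"), ("a2", "b1"), ("a1", "c1"), ("b1", "c1")]}
PART_OF = {v: name for name, vs in PARTS.items() for v in vs}

def is_partite_clique(vertices):
    """Every pair of vertices from different partite sets must be adjacent."""
    return all(
        frozenset((x, y)) in EDGES
        for x, y in combinations(sorted(vertices), 2)
        if PART_OF[x] != PART_OF[y]
    )

def is_maximal(vertices, allowed_parts):
    """No further vertex from the allowed partite sets can be added."""
    pool = set().union(*(PARTS[p] for p in allowed_parts)) - vertices
    return is_partite_clique(vertices) and not any(
        is_partite_clique(vertices | {v}) for v in pool
    )

triclique = {"a1", "b1", "c1"}
restriction = triclique - PARTS["C"]      # its vertices in A and B only
print(is_maximal(triclique, "ABC"))       # True: a2 is not adjacent to c1
print(is_maximal(restriction, "AB"))      # False: a2 can be added
```

Here a2 extends the biclique in A and B but is blocked from the triclique by its missing edge to c1, so enumerating maximal bicliques first does not directly yield the maximal tricliques.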

Transaction Data

Also known as market basket data. Each transaction is a set of items.

{Bread, Milk, Cheese}

{Milk, Cheese, Eggs, Sugar}

{Bread, Milk, Eggs}

{Cheese, Eggs, Sugar, Flour}

Itemset – A set of items.

Support – The number of transactions in which an itemset occurs.

Frequent Itemset – An itemset whose support is at or above some specified threshold.

Closed Itemset – An itemset that has no superset with the same support.

Maximal Itemset – An itemset that has no superset that is frequent.
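The four definitions can be applied directly to the four example transactions. This is a brute-force sketch for illustration; the support threshold of 2 is an assumption, not a value from the talk.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Bread", "Milk", "Cheese"},
    {"Milk", "Cheese", "Eggs", "Sugar"},
    {"Bread", "Milk", "Eggs"},
    {"Cheese", "Eggs", "Sugar", "Flour"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))
MIN_SUPPORT = 2  # assumed threshold, chosen only for illustration

def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

# All non-empty itemsets (2^6 - 1 = 63 of them, so brute force is fine).
itemsets = [frozenset(c) for r in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, r)]

frequent = [s for s in itemsets if support(s) >= MIN_SUPPORT]
# Frequent and closed: no proper superset has the same support.
closed = [s for s in frequent
          if not any(s < t and support(t) == support(s) for t in itemsets)]
# Maximal: no proper superset is frequent.
maximal = [s for s in frequent if not any(s < t for t in frequent)]

print(len(frequent), len(closed), len(maximal))
print(sorted(sorted(s) for s in maximal))
```

With this threshold, {Bread} is frequent but not closed, because {Bread, Milk} occurs in the same two transactions. Every maximal itemset is closed, but not conversely.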

Problem Equivalence

Transaction   Items
1             ABDEF
2             ABF
3             BCDE
4             ABCE
5             CDE
6             CDEF
7             AEF

[Figure: the same data drawn as a bipartite graph, with transactions 1 to 7 in one partite set and items A to F in the other.]
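One way to see the equivalence is to build the bipartite graph from the table (an edge joins a transaction to each item it contains) and pair every closed itemset with the transactions that contain it; each such pair is a maximal biclique. The sketch below is brute force and its function names are invented for illustration.

```python
from itertools import combinations

TRANSACTIONS = {1: "ABDEF", 2: "ABF", 3: "BCDE", 4: "ABCE",
                5: "CDE", 6: "CDEF", 7: "AEF"}
ITEMS = sorted(set("".join(TRANSACTIONS.values())))

# Bipartite graph: one edge per (transaction, item) pair.
EDGES = {(t, i) for t, items in TRANSACTIONS.items() for i in items}

def tids(itemset):
    """Transactions adjacent to every item in itemset."""
    return frozenset(t for t in TRANSACTIONS
                     if all((t, i) in EDGES for i in itemset))

def common_items(ts):
    """Items adjacent to every transaction in ts."""
    return frozenset(i for i in ITEMS if all((t, i) in EDGES for t in ts))

def maximal_bicliques():
    """Close every item subset; keep the pairs with both sides non-empty."""
    found = set()
    for r in range(1, len(ITEMS) + 1):
        for c in combinations(ITEMS, r):
            ts = tids(c)
            if ts:
                found.add((ts, common_items(ts)))
    return found

bicliques = maximal_bicliques()
for ts, items in sorted(bicliques, key=lambda b: (sorted(b[1]), sorted(b[0]))):
    print(sorted(ts), "".join(sorted(items)), "support", len(ts))
```

The item side of each biclique is a closed itemset and the size of its transaction side is that itemset's support, for example {3, 5, 6} with CDE.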

Results

[Figure: wallclock runtimes in seconds (0 to 70,000) against the number of vertices in the smaller vertex set (1,000 to 10,000), for MBEA and LCM.]

Results

[Figure: log of wallclock runtime in seconds (1.E-02 to 1.E+05) against p-value threshold (0.01 to 0.2), for MBEA, LCM and MICA.]

Results

What if we add all possible intrapartite edges, then run Bron-Kerbosch?
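The question can be tried on a small case. In the sketch below the bipartite graph is hypothetical, the function names are invented, and a basic Bron-Kerbosch without pivoting stands in for whatever implementation was used in the experiments.

```python
def maximal_cliques(adj):
    """Basic Bron-Kerbosch without pivoting; adj maps vertex -> neighbour set."""
    out = []

    def expand(R, P, X):
        if not P and not X:
            out.append(R)
        for v in list(P):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(frozenset(), set(adj), set())
    return out

def bicliques_via_cliques(U, V, edges):
    """Add every intrapartite edge, list maximal cliques, split each by side."""
    adj = {x: set() for x in U | V}
    for u, v in edges:                 # the original interpartite edges
        adj[u].add(v)
        adj[v].add(u)
    for part in (U, V):                # make each partite set a clique
        for x in part:
            adj[x] |= part - {x}
    return [(c & U, c & V) for c in maximal_cliques(adj)]

# Hypothetical bipartite graph.
U, V = {"u1", "u2"}, {"v1", "v2"}
edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1")]
for left, right in bicliques_via_cliques(U, V, edges):
    print(sorted(left), sorted(right))
```

Each maximal clique of the augmented graph splits into the two sides of a maximal biclique of the original graph. One caveat: if no vertex on the other side is adjacent to all of U (or all of V), that whole partite set is itself reported as a clique with an empty other side, and would need to be filtered out.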

Results

[Figure: four panels of runtime in seconds (log scale) against density, one panel for each partite set ratio 1:1, 2:1, 3:1 and 4:1, comparing MBEA and BK-K.]

Results

[Figure: density above which BK-K beats MBEA (0 to 0.08) against partite set size ratio (1 to 10).]

Results

Preprocessing

– Interpartite rule: remove all edges between vertices in different partite sets that are not part of a 3-clique

– Intrapartite rule: remove all edges between vertices in the same partite set that are not part of a 3-clique

– We added all possible intrapartite edges so we could use Bron-Kerbosch
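Both rules rest on the same test: an edge lies in a 3-clique exactly when its endpoints share a neighbor. A minimal sketch of that filter, applied to every edge regardless of partite set (the function name and the example graph are invented; restricting it to interpartite or intrapartite edges is left to the caller):

```python
def drop_edges_outside_triangles(adj):
    """Return a copy of adj keeping only the edges that lie in some 3-clique.

    An edge {u, v} lies in a 3-clique exactly when u and v have a common
    neighbour. Removing an edge that is in no triangle cannot destroy a
    triangle, so a single pass is enough.
    """
    kept = {v: set() for v in adj}
    for u in adj:
        for v in adj[u]:
            if adj[u] & adj[v]:        # common neighbour => triangle u, v, w
                kept[u].add(v)
    return kept

# Hypothetical example: a triangle a-b-c with a pendant edge c-d.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
pruned = drop_edges_outside_triangles(graph)
print(pruned)
```

In the example the edge c-d is dropped and d becomes isolated, while the triangle is untouched.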

Results

The speedup achieved by interpartite preprocessing and intrapartite preprocessing on random 3-partite graphs with 2000 vertices in each partite set and varying density. Interpartite preprocessing is very effective on graphs with low density, and more effective the lower the density. It gradually becomes ineffective as density increases, although at no point does its overhead add a substantial runtime cost. Conversely, intrapartite preprocessing is never effective: it results in much longer runtimes at low density, and while its overhead eventually becomes insubstantial as density increases, at no point does it provide any benefit.

Results

MBEA does better than BK-P on sparser graphs

MBEA does better than BK-P on unbalanced graphs

BK-P does better as the size gets larger

Problems

If k complete graphs, each having exactly k vertices, have the property that every pair of complete graphs has at most one shared vertex, then the union of the graphs can be colored with k colors.

Alternate Formulation

– k committees

– Each committee has k members

– The committees all use the same room, which has k chairs

– At most one person belongs to the intersection of any two committees.

Is it possible to assign the committee members to chairs in such a way that each member sits in the same chair for all the different committees to which he or she belongs?

In this model of the problem, the people correspond to graph vertices, committees correspond to complete graphs, and chairs correspond to vertex colors

Erdős–Faber–Lovász conjecture

Adapted from http://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93Faber%E2%80%93Lov%C3%A1sz_conjecture

Future Directions

Modelling large heterogeneous data

Maximum k-partite clique enumeration

– Vertex Maximum

– Edge Maximum

References

I. Bomze, M. Budinich, P. Pardalos, M. Pelillo, “The maximum clique problem,” in D.-Z. Du and P. M. Pardalos (eds.), Handbook of Combinatorial Optimization, vol. 4. Kluwer Academic Publishers, 1999.

D. Eppstein, “Arboricity and bipartite subgraph listing algorithms,” Inf. Process. Lett., vol. 51, no. 4, pp. 207–211, 1994.

U. Feige, “Approximating maximum clique by removing subgraphs,” SIAM Journal on Discrete Mathematics, vol. 18, no. 2, pp. 219–225, 2004.

M. R. Garey and D. S. Johnson, Computers and Intractability. New York: W. H. Freeman, 1979.

R. D. Luce and A. D. Perry, “A method of matrix analysis of group structure,” Psychometrika, vol. 14, no. 2, pp. 95–116, 1949.

K. Makino and T. Uno, “New algorithms for enumerating all maximal cliques,” in Proceedings, 9th Scandinavian Workshop on Algorithm Theory (SWAT), pp. 260–272, 2004.

R. E. Miller and D. E. Muller, “A problem of maximum consistent subsets,” IBM Research Report RC-240, T. J. Watson Research Center, Yorktown Heights, NY, 1960.

R. Peeters, “The maximum edge biclique problem is NP-complete,” Discrete Appl. Math., vol. 131, no. 3, pp. 651–654, 2003.

E. Tomita, A. Tanaka, H. Takahashi, “The worst-case time complexity for generating all maximal cliques and computational experiments,” Theoretical Computer Science, vol. 363, no. 1, pp. 28–42, 2006.

C. A. Tovey, “Tutorial on computational complexity,” Interfaces, vol. 32, no. 3, pp. 30–61, 2002.

T. Uno, M. Kiyomi, and H. Arimura, “LCM ver.2: Efficient mining algorithms for frequent/closed/maximal itemsets,” in Proceedings, FIMI’04: Workshop on Frequent Itemset Mining Implementations, Brighton UK, November 2004.

M. J. Zaki and M. Ogihara, “Theoretical foundations of association rules,” in Proceedings, 3rd SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 1998.

Y. Zhang, C. A. Phillips, G. L. Rogers, E. J. Baker, E. J. Chesler, M. A. Langston, “On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types,” BMC Bioinformatics, vol. 15, 110, 2014.

Homework

1. List all maximal cliques in the following graph. Identify which cliques in your list are maximum, if any.

[Figure: a graph on the vertices a through j.]

2. What is the smallest bipartite graph in which a vertex-maximum biclique is not an edge-maximum biclique?

3. We saw that the Turán graph T(7,3) is $K_{3,2,2}$.

a. What is T(13,4)?

b. What is T(17,5)?

Questions