Clique Relaxation Models of Clusters in Networks

Page 1: Clique Relaxation Models of Clusters in Networks

Clique Relaxation Models of Clusters in Networks

Sergiy Butenko

Industrial and Systems Engineering
Texas A&M University
College Station, TX
[email protected]

Joint work with B. Balasundaram

July 19, 2006

S. Butenko Clique Relaxation Models of Clusters in Networks

Outline

1 Introduction

2 Clique Relaxations

3 Sample Numerical Results

Introduction

Clique Relaxations

Sample Numerical Results

Social Networks

A social network is described by a graph G = (V, E), where V is the set of “actors” and E is the set of “ties”. Examples:

Actors are people, and a tie exists if two people know each other.

Actors are wire transfer database records, and a tie exists if two records have the same matching field.

Actors are telephone numbers, and a tie exists if calls were made between them.

Applications: studying terrorist networks, criminal network analysis, detecting money laundering and organized crime, and several clustering and graph-based data mining applications.

Cohesive Subgroups

Cohesive subgroups are “tightly knit groups” in a social network.

Members of a cohesive subgroup are believed to share information and to have homogeneity of thought, identity, beliefs, behavior, even food habits and illnesses.

Earliest models of cohesive subgroups were cliques.

Page 2: Clique Relaxation Models of Clusters in Networks

Cliques and Independent Sets

[Figure: a graph G on vertices 1 to 5 (left) and its complement (right)]

Set          In G                       In the complement of G
{1,2,5}      maximal clique             maximal independent set
{1,4}        maximal independent set    maximal clique
{2,3,4,5}    maximum clique             maximum independent set
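The clique/independent-set duality on this slide can be checked mechanically. The sketch below is only an illustration: the figure is not recoverable, so the edge list is an assumption inferred from the captions ({2,3,4,5} and {1,2,5} are cliques, and {1,2,5} being maximal rules out edges from vertex 1 to 3 or 4). The helper names are hypothetical.

```python
from itertools import combinations

# Edge list inferred from the slide's captions (an assumption; the figure is lost).
V = {1, 2, 3, 4, 5}
E = {frozenset(e) for e in [(1, 2), (1, 5), (2, 5), (2, 3),
                            (2, 4), (3, 4), (3, 5), (4, 5)]}

def complement(vertices, edges):
    """Edges of the complement graph: all vertex pairs that are not edges."""
    return {frozenset(p) for p in combinations(sorted(vertices), 2)} - edges

def is_clique(s, edges):
    """Every pair of vertices in s is joined by an edge."""
    return all(frozenset(p) in edges for p in combinations(sorted(s), 2))

def is_independent(s, edges):
    """No pair of vertices in s is joined by an edge."""
    return not any(frozenset(p) in edges for p in combinations(sorted(s), 2))

def is_maximal(s, vertices, edges, prop):
    """s has property prop, and adding any single vertex destroys it."""
    return prop(s, edges) and not any(prop(s | {v}, edges) for v in vertices - s)

E_bar = complement(V, E)

print(is_maximal({1, 2, 5}, V, E, is_clique))           # clique in G
print(is_maximal({1, 2, 5}, V, E_bar, is_independent))  # independent set in the complement
```

Under this assumed edge list both calls agree, which is the general fact the slide illustrates: a set is a clique in G exactly when it is an independent set in the complement.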

Applications

Acquaintance networks: criminal network analysis

Wire transfer database networks: detecting money laundering

Call networks: organized crime detection

Protein interaction networks: predicting protein functions

Gene co-expression networks: detecting network motifs

Stock market networks: stock portfolios

Internet graphs: information search and retrieval

Clustering and data mining, wireless and telecommunication networks, etc.

Properties of a Cohesive Subgroup (Cluster)

Three desirable properties are:

Familiarity (degree);

Reachability (distance);

Robustness (connectivity).

Cliques are idealized structures for modeling cohesive subgroups.

However, cliques have been criticized for their overly restrictive nature and for their modeling disadvantages.

k-cliques

Definition

A k-clique is a subset of vertices C such that d(i, j) ≤ k for every i, j ∈ C, where d(i, j) is the shortest-path distance between i and j in G.

[Figure: example graph on vertices 1 to 6]

{2, 3, 4} is a 1-clique, i.e. the “regular” clique

{1, 2, 4, 5, 6} and {1, 2, 3, 4, 5} are 2-cliques

The above sets are also maximal

Page 3: Clique Relaxation Models of Clusters in Networks

k-clubs

Definition

A k-club is a subset of vertices D such that diam(G[D]) ≤ k, where G[D] is the subgraph induced by D.

[Figure: example graph on vertices 1 to 6]

{2, 3, 4} is a 1-club, i.e. the “regular” clique

{1, 2, 4, 5, 6} is a 2-club

{1, 2, 3, 4, 5} is NOT a 2-club

Maximality is harder to test
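The difference between the two definitions is where distances are measured: in the whole graph G for a k-clique, and inside the induced subgraph G[D] for a k-club. A minimal sketch of both tests follows. The edge list is hypothetical, chosen only so that the claims on the k-clique and k-club slides hold; the original figure may differ.

```python
from collections import deque

# Hypothetical edge list, chosen to be consistent with the slides' claims.
EDGES = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]

def adjacency(edges, allowed=None):
    """Adjacency map, optionally restricted to the subgraph induced by `allowed`."""
    adj = {}
    for u, v in edges:
        if allowed is None or (u in allowed and v in allowed):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def distances_from(adj, source):
    """Breadth-first search distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def within_k(subset, adj, k):
    """True if every pair in `subset` is at distance at most k in `adj`."""
    for u in subset:
        dist = distances_from(adj, u)
        if any(dist.get(v, float("inf")) > k for v in subset):
            return False
    return True

def is_k_clique(subset, edges, k):
    return within_k(subset, adjacency(edges), k)          # distances in G

def is_k_club(subset, edges, k):
    return within_k(subset, adjacency(edges, subset), k)  # distances in G[subset]

S = {1, 2, 3, 4, 5}
print(is_k_clique(S, EDGES, 2), is_k_club(S, EDGES, 2))
```

In this assumed graph, vertices 1 and 5 are two steps apart only through vertex 6, which lies outside S, so S passes the 2-clique test but fails the 2-club test.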

Protein Interaction Networks

Graphs are the protein-protein interaction maps of the yeast Saccharomyces cerevisiae and the gastric pathogen Helicobacter pylori.

Table: S. Cerevisiae. Vertices: 2114; Edges: 2203; Connected components: 417.

Order    #Components
1        268
2        101
3        25
4        10
5        5
6        3
7        4
1458     1

Protein Interaction Networks

Table: H. Pylori. Vertices: 1570; Edges: 1403; Connected components: 858.

Order    Number of Components
1        850
2        7
706      1
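As a sanity check on the two component tables, the orders weighted by their counts should add up to the stated number of vertices, and the counts alone to the stated number of connected components. The sketch below uses only the figures from the tables; the variable and function names are hypothetical.

```python
# (component order, number of components), transcribed from the two tables
S_CEREVISIAE = [(1, 268), (2, 101), (3, 25), (4, 10),
                (5, 5), (6, 3), (7, 4), (1458, 1)]
H_PYLORI = [(1, 850), (2, 7), (706, 1)]

def totals(table):
    """Return (total vertices, total components) implied by an order/count table."""
    vertices = sum(order * count for order, count in table)
    components = sum(count for _, count in table)
    return vertices, components

print(totals(S_CEREVISIAE))  # (2114, 417)
print(totals(H_PYLORI))      # (1570, 858)
```

Both results match the table captions. In each network almost all non-trivial structure sits in a single giant component (1458 and 706 vertices, respectively).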

Results

Table: Clique, 2-clique, and 2-club numbers of the S. Cerevisiae and H. Pylori protein maps.

Network          Clique number ω(G)    2-clique number    2-club number
S. Cerevisiae    6                     57                 57
H. Pylori        3                     56                 56

Page 4: Clique Relaxation Models of Clusters in Networks

A maximum 2-club and 2-clique of S. Cerevisiae.

A maximum 2-club and 2-clique of H. Pylori.

A maximum 3-clique and 3-club of S. Cerevisiae

Some Drawbacks of k-Cliques and k-Clubs

1. k-cliques may use members outside the group.

Page 5: Clique Relaxation Models of Clusters in Networks

Some Drawbacks of k-Cliques and k-Clubs

2. k-clubs may lack familiarity and robustness.

[Figure: example graph on vertices 1 to 6]

Some Drawbacks of k-Cliques and k-Clubs

3. k-cliques and k-clubs do not have meaningful complementary definitions.

[Figure: four example graphs, two on vertices 1 to 6 and two on vertices 1 to 5. Panel labels: "Cycle on 5 vertices" and "Also a cycle on 5 vertices!"]

k-plex

Definition

A subset of vertices S is said to be a k-plex if the minimum degree in the induced subgraph satisfies δ(G[S]) ≥ |S| − k,

i.e. every vertex in G[S] has degree at least |S| − k.

[Figure: example graph on vertices 1 to 6]

{3, 4, 5, 6} is a 1-plex, i.e. the “regular” clique

{1, 3, 4, 5, 6} is a 2-plex (and NOT a 1-plex)

{1, 2, 3, 4, 5, 6} is a 3-plex (and NOT a 2-plex)
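The k-plex condition is a pure degree test on the induced subgraph, so it is easy to check directly. The sketch below assumes a hypothetical edge list, picked so that the three statements on the slide hold; the original figure is not recoverable and may use different edges.

```python
# Hypothetical edge list, chosen to be consistent with the three claims above.
EDGES = [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6),  # clique on {3,4,5,6}
         (1, 3), (1, 4), (1, 5),                          # vertex 1 misses only 6
         (1, 2), (2, 3), (2, 6)]                          # vertex 2 has degree 3

def induced_degrees(subset, edges):
    """Degree of each vertex of `subset` inside the induced subgraph G[subset]."""
    deg = {v: 0 for v in subset}
    for u, v in edges:
        if u in deg and v in deg:
            deg[u] += 1
            deg[v] += 1
    return deg

def is_k_plex(subset, edges, k):
    """True if every vertex of G[subset] has degree at least |subset| - k."""
    deg = induced_degrees(subset, edges)
    return all(d >= len(subset) - k for d in deg.values())

for s, k in [({3, 4, 5, 6}, 1), ({1, 3, 4, 5, 6}, 2), ({1, 2, 3, 4, 5, 6}, 3)]:
    print(sorted(s), k, is_k_plex(s, EDGES, k))
```

Each vertex of a k-plex may be non-adjacent to at most k − 1 other members, which is why k = 1 recovers the ordinary clique.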

co-k-plex

Definition

A subset of vertices S is a co-k-plex if the maximum degree in the induced subgraph satisfies ∆(G[S]) ≤ k − 1,

i.e. the degree of every vertex in G[S] is at most k − 1.

S is a co-k-plex in G if and only if S is a k-plex in the complement graph of G.

[Figure: two graphs on vertices 1 to 6, labeled "3-plex" and "Co-3-plex"]
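The equivalence holds because a vertex with degree d inside G[S] has degree |S| − 1 − d inside the complement restricted to S. The sketch below checks it exhaustively on a small hypothetical graph (the edge list is an assumption, not the slide's figure), enumerating every vertex subset.

```python
from itertools import combinations

V = [1, 2, 3, 4, 5, 6]
# Hypothetical edge list (the slide's figure is not recoverable).
E = {frozenset(e) for e in [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6),
                            (1, 3), (1, 4), (1, 5), (1, 2), (2, 3), (2, 6)]}
E_BAR = {frozenset(p) for p in combinations(V, 2)} - E  # complement graph

def degree_in(v, subset, edges):
    """Degree of v in the subgraph induced by `subset`."""
    return sum(1 for u in subset if u != v and frozenset((u, v)) in edges)

def is_k_plex(subset, edges, k):
    return all(degree_in(v, subset, edges) >= len(subset) - k for v in subset)

def is_co_k_plex(subset, edges, k):
    return all(degree_in(v, subset, edges) <= k - 1 for v in subset)

def equivalence_holds(k):
    """Check, over every vertex subset, co-k-plex in G <=> k-plex in the complement."""
    return all(is_co_k_plex(s, E, k) == is_k_plex(s, E_BAR, k)
               for r in range(len(V) + 1) for s in combinations(V, r))

print(all(equivalence_holds(k) for k in range(1, 5)))
```

For k = 1 this reduces to the familiar statement that independent sets of G are the cliques of its complement.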

Page 6: Clique Relaxation Models of Clusters in Networks

Structural Properties of a k-Plex

If G is a k-plex on n vertices, then:

1 Every induced subgraph of G is a k-plex;

2 If k < (n + 2)/2, then diam(G) ≤ 2;

3 κ(G) ≥ n − 2k + 2.

Advantages:

k-plexes for “small” values of k guarantee reachability and connectivity while relaxing familiarity.

k-plexes and co-k-plexes systematically relax cliques and independent sets.
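Property 2 follows from a short counting argument. The sketch below is not taken from the slides; it assumes that n denotes the number of vertices of G and that N(u) is the open neighborhood of u.

```latex
% Sketch: a k-plex G on n vertices with k < (n+2)/2 has diameter at most 2.
Let $u, v$ be non-adjacent vertices of $G$. Each has degree at least $n - k$,
and all of these neighbors lie in $V \setminus \{u, v\}$, a set of size $n - 2$. Hence
\[
  |N(u) \cap N(v)| \;\ge\; |N(u)| + |N(v)| - (n - 2)
  \;\ge\; 2(n - k) - (n - 2) \;=\; n - 2k + 2 .
\]
If $k < \frac{n + 2}{2}$, then $n - 2k + 2 > 0$, so $u$ and $v$ have a common
neighbor and $d(u, v) = 2$.
```

The quantity n − 2k + 2 is the same one that appears in the connectivity bound of property 3: any two non-adjacent vertices share at least that many neighbors.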

Maximum k-Plex Problem

Given a graph G = (V, E) and a positive integer k, the maximum k-plex problem (MkPP) is defined as the problem of finding a largest k-plex in G.

We denote by ω_k(G) the k-plex number of graph G, i.e. the size of a largest k-plex in G.

Computational Complexity

Decision version: The k-Plex problem is defined as follows: Given a graph G = (V, E) and positive integers k and c, does there exist a k-plex of size ≥ c in G?

Theorem

The k-Plex problem is NP-complete for any fixed positive integer k.

IP Formulation of the Maximum k-Plex Problem

Formulation

The k-plex number ω_k(G) of a graph G = (V, E) admits the following integer programming formulation:

$$\omega_k(G) = \max \sum_{i \in V} x_i \tag{1}$$

subject to:

$$\sum_{j \in V \setminus N[i]} x_j \le (k - 1)\, x_i + |V \setminus N[i]|\,(1 - x_i) \quad \forall\, i \in V \tag{2}$$

$$x_i \in \{0, 1\} \quad \forall\, i \in V \tag{3}$$
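Constraint (2) says that a selected vertex (x_i = 1) may have at most k − 1 selected non-neighbors, and it is vacuous for an unselected vertex. Here N[i] is read as the closed neighborhood of i (i together with its neighbors), which is the standard meaning of the notation. The brute-force sketch below, on a hypothetical six-vertex graph, confirms that the 0/1 vectors satisfying (2) are exactly the k-plexes. It enumerates all vectors instead of calling an IP solver, so it is only meant for tiny instances.

```python
from itertools import product

V = [1, 2, 3, 4, 5, 6]
# Hypothetical edge list used only for illustration.
EDGES = [(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6),
         (1, 3), (1, 4), (1, 5), (1, 2), (2, 3), (2, 6)]

ADJ = {v: set() for v in V}
for u, w in EDGES:
    ADJ[u].add(w)
    ADJ[w].add(u)

# V \ N[i]: vertices that are neither i nor adjacent to i
NON_NBRS = {i: [j for j in V if j != i and j not in ADJ[i]] for i in V}

def satisfies_constraint_2(x, k):
    """Constraint (2) for every i, with x a dict vertex -> 0/1."""
    return all(
        sum(x[j] for j in NON_NBRS[i])
        <= (k - 1) * x[i] + len(NON_NBRS[i]) * (1 - x[i])
        for i in V
    )

def is_k_plex(subset, k):
    """Degree definition: every member has at least |subset| - k neighbors in subset."""
    return all(len(ADJ[v] & subset) >= len(subset) - k for v in subset)

def k_plex_number(k):
    """Solve (1)-(3) by enumerating all 0/1 vectors; only viable for tiny graphs."""
    best = 0
    for bits in product((0, 1), repeat=len(V)):
        x = dict(zip(V, bits))
        chosen = {v for v in V if x[v]}
        feasible = satisfies_constraint_2(x, k)
        assert feasible == is_k_plex(chosen, k)  # (2) encodes the definition
        if feasible:
            best = max(best, len(chosen))
    return best

print([k_plex_number(k) for k in (1, 2, 3)])  # [4, 5, 6]
```

On real instances the same constraints would be handed to an integer programming solver; the enumeration here only serves to make the correspondence between (2) and the degree definition concrete.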

Page 7: Clique Relaxation Models of Clusters in Networks

On the maximum clique problem

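The 1-plex formulation:

$$\omega(G) = \max \sum_{i \in V} x_i \tag{4}$$

subject to:

$$\sum_{j \in V \setminus N[i]} x_j \le d_i (1 - x_i) \quad \forall\, i \in V \tag{5}$$

$$x_i \in \{0, 1\} \quad \forall\, i \in V \tag{6}$$

where $d_i = |V \setminus N[i]|$.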

Complement-edge vs 1-plex formulation

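In the edge formulation, $x_i + x_j \le 1 \;\; \forall\, (i, j) \in \bar{E}$ replaces $\sum_{j \in V \setminus N[i]} x_j \le d_i (1 - x_i) \;\; \forall\, i \in V$ in the above 1-plex formulation, where $\bar{E}$ is the edge set of the complement graph;

The edge constraint can be rewritten as $x_i + x_j \le 1 \;\; \forall\, i \in V,\ j \in V \setminus N[i]$;

For each i, we can sum $x_i + x_j \le 1$ over $j \in V \setminus N[i]$ to get the 1-plex constraint.

Surrogate constraint ideas (Glover)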
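Written out, the aggregation step is a one-line calculation. It uses only the symbols already defined on these slides, with $d_i = |V \setminus N[i]|$ counting the terms in the sum.

```latex
% Fix i and sum the edge constraints x_i + x_j <= 1 over the d_i vertices j in V \ N[i].
\[
  \sum_{j \in V \setminus N[i]} (x_i + x_j) \;\le\; \sum_{j \in V \setminus N[i]} 1
  \;\;\Longrightarrow\;\;
  d_i x_i + \sum_{j \in V \setminus N[i]} x_j \;\le\; d_i
  \;\;\Longrightarrow\;\;
  \sum_{j \in V \setminus N[i]} x_j \;\le\; d_i (1 - x_i).
\]
```

Since the 1-plex constraint is a sum of edge constraints, every point satisfying the edge constraints also satisfies it. For binary x the two formulations describe the same feasible set, while the 1-plex formulation uses one constraint per vertex instead of one per non-adjacent pair.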

Solving Maximum k-Plex - Overview

Preprocessing techniques based on properties of a k-plex

Branch-and-cut approach embedding maximal independent set inequalities

Heuristic approaches
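The slides do not describe the heuristics that were used. Purely as an illustration of what a k-plex heuristic can look like, here is a simple greedy construction; it is an assumption for exposition, not the authors' method, and the function name is hypothetical. It scans vertices in order of decreasing degree and keeps a vertex whenever the current set remains a k-plex.

```python
def greedy_k_plex(vertices, edges, k):
    """Grow a k-plex greedily (illustrative heuristic, not the authors' method)."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)

    def is_k_plex(s):
        # Every member needs at least |s| - k neighbors inside s.
        return all(len(adj[v] & s) >= len(s) - k for v in s)

    current = set()
    # Try high-degree vertices first; ties are broken by vertex label.
    for v in sorted(vertices, key=lambda v: (-len(adj[v]), v)):
        if is_k_plex(current | {v}):
            current.add(v)
    return current
```

Because every induced subgraph of a k-plex is again a k-plex, a vertex rejected early can never become addable later, so one pass already yields a maximal k-plex. It is not necessarily a maximum one, but its size gives a lower bound on ω_k(G) that an exact method could use for pruning.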

Maximum 1-plex on Sanchis graphs

n       d = 0.4     d = 0.5     d = 0.6     d = 0.7           d = 0.8           d = 0.9
60      0.515       0.14        0.109       0.094             0.171             0.125
80      0.953       0.281       0.188       0.328             0.344             0.625
100     1.656       1.015       0.374       0.718             0.313             0.61
120     4.11        1.781       1.515       1.5               1.046             1.875
140     6.782       2.734       1.891       1.75              1.469             2.032
160     9.391       4.094       2.843       2.375             1.954             5.266
180     14.625      6.562       4.843       3.765             2.735             3.766
200     26.312      11.234      6.703       4.89              4.938             6.907
250     53.609      27.672      15.375      12.563            8.89              29.328
300     130.311     49.828      36.844      49.218            31.578            58.094
350     167.64      69.874      80.093      95.046            91.89             311.734
400     305.576     146.405     123.953     145.296           298.891           3164.52
500     587.809     332.216     827.541     2640.28           2302.89           18802.8
600     1440.16     666.262     1265.4      2445.75           6324.92           28801† (200)
700     2084.06     1313.13     3128.04     1163.8            28801.1‡ (234)    28801.1† (234)
800     4188.82     2407.17     5537.67     9497.54           28202.3           28801.1† (267)
900     7484.71     7293.83     3138.48     28803‡ (300)      28802.5† (300)    28801† (300)
1000    10609.8     12619.5     5454.63     14436.1           28803.1† (334)    28800.9† (334)

Page 8: Clique Relaxation Models of Clusters in Networks

Maximum 2-plex on Sanchis graphs

n       d = 0.4     d = 0.5     d = 0.6     d = 0.7     d = 0.8     d = 0.9
60      0.781       0.438       1.031       4.592       125.295     141.635
80      1.422       1.14        1.515       5.592       182.749     28298
100     3.108       2.172       2.329       6.89        249.187     28801†
120     6.375       3.687       4.422       10.687      815.7       28801.1†
140     9.812       6           10          37.938      1775.66     28801†
160     16.265      9.828       21.156      45.843      1623.88     -
180     26.093      22.406      29.14       50.234      8458.37     -
200     28.453      29.953      57.281      152.749     1687.75     -
250     66.984      59.219      131.311     709.823     14062.5     -
300     131.39      138.921     314.373     1204.26     28802.5‡    -
350     255.012     174.875     638.072     2400.58     28803†      -
400     321.754     483.749     1085.99     3957.04     -           -
500     1493.74     1130.23     3728.15     15078.9     -           -
600     2594.98     2802.97     7990.93     28820.9‡    -           -
700     2451.03     4929.64     28865.5†    28801.9†    -           -
800     4571.52     6961.75     28901†      -           -           -
900     5643.9      17661.6     28941.7†    -           -           -
1000    15307.2     22162.1     -           -           -           -

Erdos Collaboration Networks

In scientific collaboration networks, the vertices represent scientists, and an edge connects two of them if they co-authored at least one paper.

Two types of Erdos graphs were used in our experiments.

The first type has mathematicians with Erdos number 1.

The second type has mathematicians with Erdos number 1 or 2.

Erdos Networks - k-plex numbers

Table: Erdos networks: the number of vertices, edges, edge density, and the maximum k-plex size ω_k(G) for k = 1, ..., 5.

Graph         |V|     |E|     Edge Density    k=1   k=2   k=3   k=4   k=5
ERDOS-97-1    472     1314    0.0118212       7     8     9     11    12
ERDOS-98-1    485     1381    0.0117662       7     8     9     11    12
ERDOS-99-1    492     1417    0.0117315       7     8     9     11    12
ERDOS-97-2    5488    8972    0.0005959       7     8     9     11    12
ERDOS-98-2    5822    9505    0.0005609       7     8     9     11    12
ERDOS-99-2    6100    9939    0.0005343       8     8     9     11    12
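The edge density column is consistent with the usual definition for a simple undirected graph, 2|E| / (|V|(|V| − 1)). The slides do not state the formula, so this reading is an assumption; the short check below reproduces two of the tabulated values from the |V| and |E| columns.

```python
def edge_density(num_vertices, num_edges):
    """Fraction of all possible vertex pairs that are edges (simple undirected graph)."""
    return 2 * num_edges / (num_vertices * (num_vertices - 1))

print(round(edge_density(472, 1314), 7))   # ERDOS-97-1, table lists 0.0118212
print(round(edge_density(5488, 8972), 7))  # ERDOS-97-2, table lists 0.0005959
```

Densities this low are typical of collaboration networks, which helps explain why the k-plex numbers stay small even on graphs with thousands of vertices.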

Erdos Networks - Runtimes

Table: The run time (in CPU seconds) of the BC algorithm on 1-neighborhood Erdos networks.

k    ERDOS-97-1    ERDOS-98-1    ERDOS-99-1
1    196.246       216.433       235.121
2    242.454       294.349       327.407
3    252.445       262.061       309.323
4    154.000       172.177       191.003
5    164.063       188.906       193.905

Page 9: Clique Relaxation Models of Clusters in Networks

Erdos Networks - Runtimes

Table: The reduced graph sizes and run time (in seconds) for the BC algorithm on reduced graphs.

     ERDOS-97-2               ERDOS-98-2               ERDOS-99-2
k    |V′|   |E′|   Time       |V′|   |E′|   Time       |V′|   |E′|   Time
1    174    1061   8.203      188    1160   11.766     194    1208   12.234
2    174    1061   25.561     188    1160   29.28      194    1208   40.468
3    174    1061   45.999     188    1160   38.781     194    1208   38.874
4    77     510    2.453      105    686    4.562      116    763    10.624
5    77     510    3.063      105    686    4.047      116    763    6.297

References

B. Balasundaram, S. Butenko, and S. Trukhanov. Novel approaches for analyzing biological networks. Journal of Combinatorial Optimization, 10:23–39, 2005.

B. Balasundaram, S. Butenko, I. V. Hicks, and S. Sachdeva. Clique relaxations in social network analysis: The maximum k-plex problem. Submitted to Mathematical Programming, 2006.

http://ie.tamu.edu/people/faculty/butenko/papers
