evaluation of ilp-based approaches for partitioning into ...€¦ · introduction methods...

33
Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components Sharon Bruckner 1 Falk H¨ uffner 2 Christian Komusiewicz 2 Rolf Niedermeier 2 1 Institut f¨ ur Mathematik, Freie Universit¨ at Berlin 2 Institut f¨ ur Softwaretechnik und Theoretische Informatik, TU Berlin 5 June 2013 S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 1/22

Upload: others

Post on 17-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Evaluation of ILP-based Approaches forPartitioning into Colorful Components

Sharon Bruckner1 Falk Huffner2

Christian Komusiewicz2 Rolf Niedermeier2

1Institut fur Mathematik, Freie Universitat Berlin

2Institut fur Softwaretechnik und Theoretische Informatik, TU Berlin

5 June 2013

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 1/22

Page 2: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Wikipedia interlanguage links

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 2/22

Page 3: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Wikipedia interlanguage links

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 2/22

Page 4: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Wrong interlanguage links

Schinken (German)→ Prosciutto (Italian)→ Пршут (Russian)→ Parmaschinken (German)

Assumption

If there is a link path from a word in some language to adifferent word in the same language, then at least one of thelinks on the path is wrong.

Poblem

How can we fix the inconsistencies?

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 3/22

Page 5: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Wrong interlanguage links

Schinken (German)→ Prosciutto (Italian)→ Пршут (Russian)→ Parmaschinken (German)

Assumption

If there is a link path from a word in some language to adifferent word in the same language, then at least one of thelinks on the path is wrong.

Poblem

How can we fix the inconsistencies?

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 3/22

Page 6: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Wrong interlanguage links

Schinken (German)→ Prosciutto (Italian)→ Пршут (Russian)→ Parmaschinken (German)

Assumption

If there is a link path from a word in some language to adifferent word in the same language, then at least one of thelinks on the path is wrong.

Poblem

How can we fix the inconsistencies?

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 3/22

Page 7: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Model

COLORFUL COMPONENTS

Instance: An undirected graph G = (V , E ) and a coloring ofthe vertices χ : V → {1, . . . , c }.Task: Delete a minimum number of edges such that allconnected components are colorful, that is, they do notcontain two vertices of the same color.

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 4/22

Page 8: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Applications of Colorful Components

General scenario: Record linkage

Matching entities between different databases, where linksbetween entities are fuzzy.

Matching items in online shop price comparison

Matching user profiles across different social networks

. . .

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 5/22

Page 9: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Known results

COLORFUL COMPONENTS is NP-hard already with threecolors.

With c colors and k errors to be fixed, COLORFULCOMPONENTS can be solved in O ((c − 1)k ·m) time withbranch-and-bound.

COLORFUL COMPONENTS can be approximated within afactor of c − 1 in O (m2) time.

Several polynomial-time preprocessing rules are known.

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 6/22

Page 10: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 1: Implicit Hitting Set

HITTING SET

Instance: A ground set U and a set of circuits S1, . . . , Snwith Si ⊆ U for 1 6 i 6 n .Task: Find a minimum-size hitting set, that is, a set H ⊆ U withH ∩ Si 6= ∅ for all 1 6 i 6 n .

Observation

We can reduce COLORFUL COMPONENTS to HITTING SET: Theground set U is the set of edges, and the circuits to be hit arethe paths between identically-colored vertices.

Problem

Exponentially many circuits!

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 7/22

Page 11: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 1: Implicit Hitting Set

HITTING SET

Instance: A ground set U and a set of circuits S1, . . . , Snwith Si ⊆ U for 1 6 i 6 n .Task: Find a minimum-size hitting set, that is, a set H ⊆ U withH ∩ Si 6= ∅ for all 1 6 i 6 n .

Observation

We can reduce COLORFUL COMPONENTS to HITTING SET: Theground set U is the set of edges, and the circuits to be hit arethe paths between identically-colored vertices.

Problem

Exponentially many circuits!

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 7/22

Page 12: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 1: Implicit Hitting Set

HITTING SET

Instance: A ground set U and a set of circuits S1, . . . , Snwith Si ⊆ U for 1 6 i 6 n .Task: Find a minimum-size hitting set, that is, a set H ⊆ U withH ∩ Si 6= ∅ for all 1 6 i 6 n .

Observation

We can reduce COLORFUL COMPONENTS to HITTING SET: Theground set U is the set of edges, and the circuits to be hit arethe paths between identically-colored vertices.

Problem

Exponentially many circuits!

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 7/22

Page 13: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 1: Implicit Hitting Set

v In an implicit hitting set problem, the circuits have animplicit description, and a polynomial-time oracle is availablethat, given a putative hitting set H , either confirms that H is ahitting set or produces a circuit that is not hit by H .

Several approaches to solving implicit hitting set problems areknown, which use an ILP solver as a black box for the HITTINGSET subproblems.

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 8/22

Page 14: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 1: Implicit Hitting Set

v In an implicit hitting set problem, the circuits have animplicit description, and a polynomial-time oracle is availablethat, given a putative hitting set H , either confirms that H is ahitting set or produces a circuit that is not hit by H .Several approaches to solving implicit hitting set problems areknown, which use an ILP solver as a black box for the HITTINGSET subproblems.

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 8/22

Page 15: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 2: Row generation

Idea

Instead of using the ILP solver as a black box, we can use rowgeneration (“lazy constraints”):

Start with an empty constraint set

When the solver finds a solution, check for a violatedconstraint in a callback and add it to the constraint set

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 9/22

Page 16: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 3: Clique Partitioning ILP formulation

CLIQUE PARTITIONING

Instance: A vertex set V with a weight function s :(V2

)→ Q.

Task: Find a cluster graph (V , E ) thatminimizes

∑{u ,v}∈E s(u , v).

s(u , v) =

∞ if χ(u) = χ(v),−1 if {u , v } ∈ E ,0 otherwise.

euv + evw − euw 6 1

euv − evw + euw 6 1

−euv + evw + euw 6 1

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 10/22

Page 17: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 3: Clique Partitioning ILP formulation

CLIQUE PARTITIONING

Instance: A vertex set V with a weight function s :(V2

)→ Q.

Task: Find a cluster graph (V , E ) thatminimizes

∑{u ,v}∈E s(u , v).

s(u , v) =

∞ if χ(u) = χ(v),−1 if {u , v } ∈ E ,0 otherwise.

euv + evw − euw 6 1

euv − evw + euw 6 1

−euv + evw + euw 6 1

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 10/22

Page 18: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Method 3: Clique Partitioning ILP formulation

CLIQUE PARTITIONING

Instance: A vertex set V with a weight function s :(V2

)→ Q.

Task: Find a cluster graph (V , E ) thatminimizes

∑{u ,v}∈E s(u , v).

s(u , v) =

∞ if χ(u) = χ(v),−1 if {u , v } ∈ E ,0 otherwise.

euv + evw − euw 6 1

euv − evw + euw 6 1

−euv + evw + euw 6 1

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 10/22

Page 19: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Cutting Planes

Definition

A cutting plane is a valid constraint that cuts off fractionalsolutions.

Tree cut

Let T = (VT , ET ) be a subgraph of G that is a tree such that allleaves L of the tree have color c , but no inner vertex has. Then∑

uv∈ET

(1− euv ) > |L |− 1

is a valid inequality.

We find only tree cuts with 1 or 2 internal vertices.

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 11/22

Page 20: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Cutting Planes

Definition

A cutting plane is a valid constraint that cuts off fractionalsolutions.

Tree cut

Let T = (VT , ET ) be a subgraph of G that is a tree such that allleaves L of the tree have color c , but no inner vertex has. Then∑

uv∈ET

(1− euv ) > |L |− 1

is a valid inequality.

We find only tree cuts with 1 or 2 internal vertices.

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 11/22

Page 21: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Cutting Planes

Definition

A cutting plane is a valid constraint that cuts off fractionalsolutions.

Tree cut

Let T = (VT , ET ) be a subgraph of G that is a tree such that allleaves L of the tree have color c , but no inner vertex has. Then∑

uv∈ET

(1− euv ) > |L |− 1

is a valid inequality.

We find only tree cuts with 1 or 2 internal vertices.

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 11/22

Page 22: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Greedy Heuristics

Merge-based:Start with singleton clustersGreedily merge two clusters based on cut costs and mergecosts

Move-based:Start with singleton clustersGreedily move one vertex from one cluster to anotherOnce no improvement is possible, merge clusters andrepeat

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 12/22

Page 23: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Implementation

Data reduction

ILP-approaches implemented in C++ using CPLEX 12.3

3.4GHz Intel Core i3-2130 with 3MB cache and 8GB mainmemory

Source code available atwww.user.tu-berlin.de/hueffner/colcom/

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 13/22

Page 24: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Wikipedia interlanguage links

30 languages

11,977,500 vertices, 46,695,719 edges

2,698,241 connected components, of which 225,760 arenot colorful

largest connected component has 1,828 vertices and14,403 edges

CLIQUE PARTITIONING algorithm finds solution in 80minutes

Optimal solution deletes 618,660 edges

434,849 suggested new links

Merge-based heuristic has an error of 0.81%

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 14/22

Page 25: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Wikipedia interlanguage links

30 languages

11,977,500 vertices, 46,695,719 edges

2,698,241 connected components, of which 225,760 arenot colorful

largest connected component has 1,828 vertices and14,403 edges

CLIQUE PARTITIONING algorithm finds solution in 80minutes

Optimal solution deletes 618,660 edges

434,849 suggested new links

Merge-based heuristic has an error of 0.81%

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 14/22

Page 26: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Wikipedia example

ןקניש

Пармская ветчина

火腿

Prosciutto crudo

Prosciutto di Parma

Jambon de Parme

JamónProsciutto

Пршут Parmaschinken

Прошутто

Ham

וטושורפ

Prosciutto

Ветчина

Jamón de Parma

Окорок

Schinken

Jambon

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 15/22

Page 27: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Random graph model

Model is the recovery of colorful components that have beenperturbed.

number of colors: {3, 5, 8}

number of vertices: {60, 100, 170}

probability that a component contains a vertex of acertain color: {0.4, 0.6, 0.9}

probability that between two vertices in a componentthere is an edge: {0.4, 0.6, 0.9}

probability that between two vertices from differentcomponents there is an edge: {0.01, 0.02, 0.04}.

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 16/22

Page 28: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Running times for benchmark set

10-2 10-1 100 101 102

time (s)

0

20

40

60

80

100in

stan

ces

solv

ed (%

)

Implicit Hitting SetHitting Set row generationClique Partitioning ILPClique Partitioning w/o cutsBranching

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 17/22

Page 29: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Depencency on number of vertices

40 60 80 100 120 140n

10-2

10-1

100

101

102

time

(s)

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 18/22

Page 30: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Depencency on probability of intracluster edges

0.0 0.2 0.4 0.6 0.8 1.0pe

10-2

10-1

100

101

102tim

e (s

)

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 19/22

Page 31: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Depencency on probability of intercluster edges

0.03 0.04 0.05 0.06 0.07 0.08px

10-2

10-1

100

101

102

time

(s)

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 20/22

Page 32: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Heuristics

Performance of the heuristics on the 213 instances where weknow the optimal solution

time optimal avg. error max. error

Merge-based 6 0.4 s 124 0.9% 12.5%Move-based 6 0.4 s 55 4.9% 38.7%

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 21/22

Page 33: Evaluation of ILP-based Approaches for Partitioning into ...€¦ · Introduction Methods Experiments Evaluation of ILP-based Approaches for Partitioning into Colorful Components

Introduction Methods Experiments

Outlook

Model modifications:

more demands than just “connected” on cluster

allows constant number of duplicates per cluster

Algorithmic improvements:

cutting planes that take colors into account

column generation

S. Bruckner et al. (FU&TU Berlin) Evaluation of ILP-based Approaches for Partitioning into Colorful Components 22/22