Efficient and Effective Practical Algorithms for the Set-Covering Problem

Qi Yang, Jamie McPeek, Adam Nofsinger

Department of Computer Science and Software Engineering

University of Wisconsin at Platteville



The Set-Covering Problem

Given N sets, let X be the union of all the sets. A cover of X is a group of sets from the N sets such that every element of X belongs to at least one set in the group.

The set-covering problem is to find a cover of X with the minimum number of sets.

Matrix Representation of the Set-covering Problem

      a  b  c  d  e  f
S1    0  1  1  0  1  0
S2    0  0  1  1  0  0
S3    1  1  0  1  0  1
S4    0  1  0  0  1  1

An entry is 1 if the set in that row contains the element in that column.

Number of sets: N = 4

Number of elements: M = 6

A cover: S1, S3, S4

A minimum cover: S1, S3

Not a cover: S1, S2, S4 (a is not covered)
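As a quick check of these definitions, the Python sketch below encodes the example matrix as one set of elements per row. The dictionary layout and the names `is_cover` and `minimum_cover` are illustrative only and do not come from the slides. The sketch finds a minimum cover by trying every group of sets, from the smallest group to the largest. Exhaustive search takes exponential time, so it is only a sanity check for tiny inputs.

```python
from itertools import combinations

# The 4 x 6 example matrix, one set of elements per row (illustrative encoding).
SETS = {
    "S1": {"b", "c", "e"},
    "S2": {"c", "d"},
    "S3": {"a", "b", "d", "f"},
    "S4": {"b", "e", "f"},
}
X = set().union(*SETS.values())  # the union X of all sets


def is_cover(names):
    """True if every element of X belongs to at least one of the named sets."""
    covered = set().union(*(SETS[n] for n in names))
    return covered == X


def minimum_cover():
    """Try groups of size 1, 2, ... and return the first group that covers X.

    Exponential in N: only usable on tiny instances.
    """
    names = sorted(SETS)
    for k in range(1, len(names) + 1):
        for group in combinations(names, k):
            if is_cover(group):
                return list(group)
    return names


print(is_cover(["S1", "S3", "S4"]))  # True
print(is_cover(["S1", "S2", "S4"]))  # False: a is not covered
print(minimum_cover())               # ['S1', 'S3']
```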

NP-Hard Problem

The set-covering problem has been proved to be NP-hard (Introduction to Algorithms, T. H. Cormen, C. E. Leiserson, R. L. Rivest).

A Greedy Algorithm

Algorithm Greedy

ResultCover: the cover to be found (not necessarily of minimum size)

Uncovered: the set of elements not covered yet

1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. While Uncovered is not empty
   a. select a set S that is not in ResultCover and covers the most elements of Uncovered
   b. add S to ResultCover
   c. remove all elements of S from Uncovered
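The following is a minimal Python sketch of Algorithm Greedy. It assumes the sets are given as Python `set` objects and returns the cover as a list of set indices. The name `greedy_cover` is hypothetical. Step 3a is done by rescanning every remaining set on each iteration, exactly as the pseudocode reads.

```python
def greedy_cover(sets):
    """Algorithm Greedy: repeatedly pick the set covering the most uncovered elements.

    `sets` is a list of Python sets; returns the indices of the chosen sets.
    """
    result_cover = []                     # 1. ResultCover = {}
    uncovered = set().union(*sets)        # 2. Uncovered = union of all sets
    while uncovered:                      # 3. while Uncovered is not empty
        # a. rescan every set not yet chosen and count its uncovered elements
        best = max(
            (i for i in range(len(sets)) if i not in result_cover),
            key=lambda i: len(sets[i] & uncovered),
        )
        result_cover.append(best)         # b. add S to ResultCover
        uncovered -= sets[best]           # c. remove S's elements from Uncovered
    return result_cover


sets = [{"b", "c", "e"}, {"c", "d"}, {"a", "b", "d", "f"}, {"b", "e", "f"}]
print(greedy_cover(sets))  # [2, 0], i.e. S3 first, then S1
```

On the example matrix, the sketch first picks S3, which covers 4 uncovered elements. It then picks S1, which covers c and e.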

Algorithm Check And Remove (CAR)

Source: "Identifying Redundant Search Engines in a Very Large Scale Metasearch Engine Context", 8th ACM International Workshop on Web Information and Data Management

The set-covering problem is equivalent to the problem of identifying redundant search engines on the Web

Algorithm CAR is much faster than Algorithm Greedy

Algorithm CAR (Check And Remove)

1. Set ResultCover to the empty set
2. For each set S
   a. determine if S has an element that is not covered by ResultCover
   b. add S to ResultCover if S has such an element
   c. exit the for loop if ResultCover is a cover of X
3. For each set S in ResultCover
   a. determine if S has an element that is not covered by any other set of ResultCover
   b. remove S from ResultCover if S has no such element
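Here is a Python sketch of Algorithm CAR under the same assumptions: the sets are Python `set` objects and the result is a list of indices. The name `car_cover` is hypothetical. The sketch keeps a per-element count of how many sets in ResultCover contain each element. This lets it answer "is this element covered by ResultCover" and "is it covered by any other set of ResultCover" quickly. That bookkeeping is our assumption; the slides do not specify it.

```python
def car_cover(sets):
    """Algorithm CAR (Check And Remove) on a list of Python sets.

    Returns the indices of the sets in the final cover.
    """
    universe = set().union(*sets)
    # count[e] = number of sets in ResultCover containing e (assumed bookkeeping)
    count = {e: 0 for e in universe}
    result_cover = []                              # 1. ResultCover = {}
    covered = 0                                    # elements with count > 0
    # 2. Check phase: one pass over the sets in input order
    for i, s in enumerate(sets):
        if any(count[e] == 0 for e in s):          # a. S has an uncovered element?
            result_cover.append(i)                 # b. add S to ResultCover
            for e in s:
                if count[e] == 0:
                    covered += 1
                count[e] += 1
            if covered == len(universe):           # c. ResultCover is a cover of X
                break
    # 3. Remove phase
    final = []
    for i in result_cover:
        if any(count[e] == 1 for e in sets[i]):    # a. only S covers some element
            final.append(i)
        else:                                      # b. S is redundant: remove it
            for e in sets[i]:
                count[e] -= 1
    return final


sets = [{"b", "c", "e"}, {"c", "d"}, {"a", "b", "d", "f"}, {"b", "e", "f"}]
print(car_cover(sets))  # [0, 2], i.e. S1 and S3
```

Run on the example matrix, the sketch reproduces the trace on the Example slide: S1, S2 and S3 are added, and then S2 is removed. The result depends on the input order of the sets, because the check phase keeps the first sets that contribute anything new.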

Example

      a  b  c  d  e  f
S1    0  1  1  0  1  0
S2    0  0  1  1  0  0
S3    1  1  0  1  0  1
S4    0  1  0  0  1  1

Set    ResultCover     Uncovered
-      {}              {a, b, c, d, e, f}
S1     {S1}            {a, d, f}
S2     {S1, S2}        {a, f}
S3     {S1, S2, S3}    {}

Removing S2 (each element of S2 is also covered by S1 or S3):

       {S1, S3}        {}

Time Complexity

Algorithm Greedy O(M * N * min(M, N))

Algorithm CAR O(M * N)

N: number of sets

M: number of elements of the union X
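Below is a short derivation of the Greedy bound. It assumes that each iteration recomputes, for every set not in ResultCover, how many elements of Uncovered that set contains:

```latex
\begin{aligned}
\text{cost of step 3a per iteration} &= O(M \cdot N)
  && \text{(up to } M \text{ entries for each of the } N \text{ sets)} \\
\text{number of iterations} &\le \min(M, N)
  && \text{(each iteration adds a new set and covers} \ge 1 \text{ new element)} \\
T_{\text{Greedy}} &= O\bigl(M \cdot N \cdot \min(M, N)\bigr)
\end{aligned}
```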

CPU Time

[Figure: CPU time (sec, 0 to 40,000) of Algorithms Greedy and CAR versus actual cover size (100 to 1000)]

Cover Size

Cover Sizes of the Two Algorithms

Actual   100   300   500   700   900
Greedy   105   300   501   700   900
CAR      485   300   500   700   900

Implementation Details

1. Read the data into a binary search tree. Each node holds a BitMap indicating which sets cover its element.
2. Convert the tree to an array of BitMaps, which is the matrix representation of the set-covering problem.
3. Find a cover.

Binary Search Tree and BitMap

[Diagram: binary search tree whose nodes are elements]

• The number of sets (N) is known.
• The number of elements of each set is known.
• The total number of elements is unknown.
• Elements are read one set at a time.
• Each element's node holds a BitMap of size N recording which sets cover the element; this BitMap is a column of the matrix.
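Here is a sketch of the reading phase in Python. It assumes that elements are comparable keys. A Python `int` serves as the N-bit BitMap, where bit i set means that set i contains the element. `Node`, `insert` and `to_column_array` are illustrative names. An in-order traversal of the tree yields the sorted elements together with their column BitMaps, which form the array of column BitMaps.

```python
class Node:
    """BST node: one element plus a BitMap (Python int) of the sets covering it."""

    def __init__(self, element):
        self.element = element
        self.bitmap = 0          # bit i is 1 if set i contains this element
        self.left = None
        self.right = None


def insert(root, element, set_index):
    """Find or create the node for `element` and mark set `set_index` in its BitMap."""
    if root is None:
        root = Node(element)
        node = root
    else:
        node = root
        while node.element != element:
            side = "left" if element < node.element else "right"
            child = getattr(node, side)
            if child is None:            # element seen for the first time
                child = Node(element)
                setattr(node, side, child)
            node = child
    node.bitmap |= 1 << set_index
    return root


def to_column_array(root):
    """Iterative in-order traversal: (element, column BitMap) in sorted order."""
    columns, stack, node = [], [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        columns.append((node.element, node.bitmap))
        node = node.right
    return columns


# Read the example one set at a time (S1 is set 0, ..., S4 is set 3).
root = None
for i, s in enumerate([["b", "c", "e"], ["c", "d"], ["a", "b", "d", "f"], ["b", "e", "f"]]):
    for e in s:
        root = insert(root, e, i)
print(to_column_array(root))
# [('a', 4), ('b', 13), ('c', 3), ('d', 6), ('e', 9), ('f', 12)]
```

The tree in this sketch is not self-balancing. Unfavorable input orders, such as elements that are already sorted, therefore degrade each lookup to linear time.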

Array of Column BitMaps

Row operations:
• Find the number of elements in a set that are not covered by the result cover
• Determine if a set contains an element that is not covered by the result cover
• Determine if a set in the result cover has an element that is not covered by any other set in the result cover
• …

Array of Row BitMaps

[Diagram: one row BitMap per set, with bit positions e1, e2, e3, e4, …, em-1, em]

Converting column BitMaps to row BitMaps takes some time, but afterwards every row operation is performed within a single row BitMap.
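The Python sketch below shows the conversion and the three row operations listed above. It again uses Python `int`s as BitMaps, where bit j of row i set means that set i contains element j. `columns_to_rows`, `popcount` and `has_unique_element` are illustrative names. The column values are the ones for the 4-set example, with elements a to f.

```python
def columns_to_rows(columns, n_sets):
    """Transpose column BitMaps (bit i = set i) into row BitMaps (bit j = element j)."""
    rows = [0] * n_sets
    for j, col in enumerate(columns):
        i = 0
        while col:
            if col & 1:
                rows[i] |= 1 << j
            col >>= 1
            i += 1
    return rows


def popcount(bitmap):
    """Number of 1 bits in a non-negative int."""
    return bin(bitmap).count("1")


def has_unique_element(rows, cover, i):
    """Does set i have an element not covered by any other set in `cover`?"""
    others = 0
    for k in cover:
        if k != i:
            others |= rows[k]
    return bool(rows[i] & ~others)


# Column BitMaps of the example for elements a..f (bit 0 = S1, ..., bit 3 = S4)
columns = [4, 13, 3, 6, 9, 12]
rows = columns_to_rows(columns, 4)     # one row BitMap per set
covered = rows[0]                      # assume ResultCover = {S1}; in general, OR its rows

print([popcount(r & ~covered) for r in rows])   # [0, 1, 3, 1] uncovered elements per set
print([bool(r & ~covered) for r in rows])       # [False, True, True, True]
print([has_unique_element(rows, [0, 1, 2], i) for i in [0, 1, 2]])  # [True, False, True]
```

Each row operation reduces to an AND with a complement, applied to a single row BitMap. The last line matches the remove phase of the CAR example: S2 has no element of its own once S1 and S3 are in the cover.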

CPU Time

Running Times (seconds) of the Greedy Algorithm

Column BitMaps   0.63   53.9   300    1220   2130    3457    5056
Row BitMaps      0.28   7.6    41     161    274     437     629

Running Times (seconds) of the CAR Algorithm

Column BitMaps   0.01   0.31   1.63   6.36   11.15   16.92   20.70
Row BitMaps      0.09   0.39   0.96   2.12   3.27    4.34    5.46

The CPU time includes the time to convert column BitMaps to row BitMaps, but not the time to build the tree.

CPU Time (Row BitMap)

Running Times (seconds) of the Two Algorithms

Greedy   0.28   7.6    41     161    274    437    629
CAR      0.09   0.39   0.96   2.12   3.27   4.34   5.46


Algorithm Greedy Update

UncoveredCount: number of elements of a set not covered by ResultCover

1. Set ResultCover to the empty set

2. Set Uncovered to the union of all sets

3. For each set, set the UncoveredCount to the size of the set

4. While Uncovered is not empty

a. select a set that has the largest value of UncoveredCount among all sets not in ResultCover

b. add the set to ResultCover

c. remove all elements of the set from Uncovered

d. update the value of UncoveredCount for each set not in ResultCover

Update Uncovered Count

For each element in the set to be added to ResultCover
   If ResultCover does not cover the element
      For each set not in ResultCover
         If the set contains the element
            decrement its UncoveredCount by one
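Here is a Python sketch of Algorithm Greedy Update, with the update routine above folded into the main loop. The name `greedy_update_cover` is hypothetical. The sets are Python `set` objects and the result is a list of indices. UncoveredCount is kept in a list and is only decremented, never recomputed.

```python
def greedy_update_cover(sets):
    """Algorithm Greedy Update: Greedy with incrementally maintained UncoveredCount."""
    n = len(sets)
    result_cover = []                           # 1. ResultCover = {}
    uncovered = set().union(*sets)              # 2. Uncovered = union of all sets
    uncovered_count = [len(s) for s in sets]    # 3. UncoveredCount = size of each set
    in_cover = [False] * n
    while uncovered:                            # 4. while Uncovered is not empty
        # a. largest UncoveredCount among sets not in ResultCover
        best = max((i for i in range(n) if not in_cover[i]),
                   key=lambda i: uncovered_count[i])
        in_cover[best] = True                   # b. add the set to ResultCover
        result_cover.append(best)
        for e in sets[best]:
            if e in uncovered:                  # ResultCover did not cover e yet
                uncovered.discard(e)            # c. remove e from Uncovered
                for j in range(n):              # d. update the other sets' counts
                    if not in_cover[j] and e in sets[j]:
                        uncovered_count[j] -= 1
    return result_cover


sets = [{"b", "c", "e"}, {"c", "d"}, {"a", "b", "d", "f"}, {"b", "e", "f"}]
print(greedy_update_cover(sets))  # [2, 0], i.e. S3 first, then S1
```

Ties in step 4a go to the lowest-numbered set, because `max` returns the first maximal item it sees.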

Time Complexity

Algorithm Greedy O(M * N * min(M, N))

Algorithm CAR O(M * N)

Algorithm Greedy Update O(M * N)
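The O(M * N) bound for Algorithm Greedy Update can be accounted for as follows. This assumes that step 4a is a linear scan over the N counts and that step 3 reads every set once:

```latex
\begin{aligned}
\text{step 3 (initial counts)} &= O(M \cdot N) \\
\text{step 4a (selection)} &= O(N) \cdot \min(M, N) = O(N \cdot \min(M, N)) \\
\text{step 4d (membership checks)} &= O(M) \cdot \min(M, N) \le O(M \cdot N) \\
\text{step 4d (decrements)} &= O(N) \text{ per newly covered element} \times M = O(M \cdot N)
  && \text{(an element leaves Uncovered only once)} \\
T_{\text{Greedy Update}} &= O(M \cdot N)
\end{aligned}
```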

CPU Time

Running Times (seconds) of the Two Algorithms

Greedy Update   0.15   0.92   2.26   5.13   7.31   10.1   13.1
CAR             0.09   0.39   0.96   2.12   3.27   4.34   5.46

Algorithm List And Remove (LAR)

Implements the matrix with linked lists instead of an array of BitMaps

Combines Algorithm Greedy Update with the remove phase (step 3) of Algorithm CAR

Linked List for Matrix

[Diagram: the matrix stored as linked lists, with element headers e1 to e7 and set headers S1 to S5]
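Here is a sketch of Algorithm LAR in Python. Python lists stand in for the linked lists: one list of elements per set (a row) and one list of sets per element (a column). As a result, each update only visits the sets that actually contain the element. The name `lar_cover` is hypothetical. The remove phase uses a per-element count of the chosen sets that contain each element; this is our assumption about how step 3 of CAR is carried out.

```python
def lar_cover(sets):
    """LAR sketch: Greedy Update on a sparse matrix, then CAR's remove phase.

    `sets` is a list of Python sets; returns the indices of the final cover.
    """
    n = len(sets)
    # Sparse matrix: rows[i] = elements of set i, cols[e] = sets containing e
    # (Python lists stand in for the linked lists).
    rows = [list(s) for s in sets]
    cols = {}
    for i, row in enumerate(rows):
        for e in row:
            cols.setdefault(e, []).append(i)

    # Phase 1: Greedy Update
    covered = {e: False for e in cols}
    remaining = len(cols)                        # elements still uncovered
    uncovered_count = [len(row) for row in rows]
    in_cover = [False] * n
    result_cover = []
    while remaining:
        best = max((i for i in range(n) if not in_cover[i]),
                   key=lambda i: uncovered_count[i])
        in_cover[best] = True
        result_cover.append(best)
        for e in rows[best]:
            if not covered[e]:
                covered[e] = True
                remaining -= 1
                for j in cols[e]:                # walk only this element's column
                    if not in_cover[j]:
                        uncovered_count[j] -= 1

    # Phase 2: remove phase of CAR
    # count[e] = number of sets in ResultCover that contain e
    count = {e: sum(in_cover[j] for j in cols[e]) for e in cols}
    final = []
    for i in result_cover:
        if any(count[e] == 1 for e in rows[i]):  # set i alone covers some element
            final.append(i)
        else:                                    # redundant: drop it
            for e in rows[i]:
                count[e] -= 1
    return final


example = [{"b", "c", "e"}, {"c", "d"}, {"a", "b", "d", "f"}, {"b", "e", "f"}]
print(lar_cover(example))                               # [2, 0]
print(lar_cover([{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}]))  # [1, 2]
```

The second call shows the remove phase at work. Greedy Update first picks set 0, {1, 2, 3, 4}, but sets 1 and 2 together cover all of its elements, so set 0 is dropped.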

CPU Time

Running Times (seconds) of the Two Algorithms

LAR   0.21   0.35   0.51   0.86   1.11   1.40   1.66
CAR   0.26   0.49   0.65   1.01   1.24   1.46   1.67

Cover Size

Cover Sizes of the Two Algorithms

LAR   10   87    191   422   607   815   971
CAR   16   120   235   467   648   824   975

Cover Size (Different Data Sets)

Cover Sizes of the Two Algorithms

Actual   50    70    90    110   200   500   900
LAR      50    70    90    110   200   500   900
CAR      291   391   496   528   200   500   900

Summary

• Algorithm LAR runs faster than Algorithm CAR.
• Algorithm LAR generates smaller covers than Algorithm CAR.
• Algorithm: updating UncoveredCount vs. searching every time
• Data structure: linked list vs. array of BitMaps

Questions?