Efficient and Effective Practical Algorithms for the Set-Covering Problem
Qi Yang, Jamie McPeek, Adam Nofsinger
Department of Computer Science and Software Engineering
University of Wisconsin at Platteville
The Set-Covering Problem
Given N sets, let X be the union of all the sets. A cover of X is a group of sets from the N sets
such that every element of X belongs to a set in the group.
The set-covering problem is to find a cover of X of the minimum size.
Matrix Representation of the Set-covering Problem
a b c d e f
S1 0 1 1 0 1 0
S2 0 0 1 1 0 0
S3 1 1 0 1 0 1
S4 0 1 0 0 1 1
Number of sets: N = 4
Number of elements: M = 6
One cover: S1, S3, S4
One minimal cover: S1, S3
Not a cover: S1, S2, S4 (a is not covered)
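The three statements above can be checked mechanically. The following is a minimal sketch, assuming each row of the matrix is stored as a Python set of the elements it covers; the names `SETS`, `universe` and `is_cover` are illustrative and do not come from the slides.

```python
# Rows of the example matrix: each set is listed with the elements it covers.
SETS = {
    "S1": {"b", "c", "e"},
    "S2": {"c", "d"},
    "S3": {"a", "b", "d", "f"},
    "S4": {"b", "e", "f"},
}

def universe(sets):
    """Return X, the union of all the given sets."""
    return set().union(*sets.values())

def is_cover(sets, chosen):
    """True if every element of X belongs to at least one chosen set."""
    covered = set().union(*(sets[name] for name in chosen))
    return covered == universe(sets)

print(is_cover(SETS, ["S1", "S3", "S4"]))  # True
print(is_cover(SETS, ["S1", "S3"]))        # True
print(is_cover(SETS, ["S1", "S2", "S4"]))  # False: a is not covered
```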
NP-Hard Problem
Introduction to Algorithms by T. H. Cormen, C.E. Leiserson, R. L. Rivest
The set-covering problem has been proved to be NP-hard
A Greedy Algorithm
Algorithm Greedy
ResultCover: The minimum cover to be found.
Uncovered: The set of elements not covered yet.
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. While Uncovered is not empty
a. select a set S that is not in ResultCover and covers the most elements of Uncovered
b. add S to ResultCover
c. remove all elements of S from Uncovered
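The steps of Algorithm Greedy can be sketched as follows. This is an illustrative version that uses plain Python sets rather than the BitMaps described later in the talk; the function name `greedy_cover` and the dictionary layout are assumptions made for the example.

```python
def greedy_cover(sets):
    """Greedy heuristic; `sets` maps a set name to the elements it covers."""
    result_cover = []                          # step 1
    uncovered = set().union(*sets.values())    # step 2
    while uncovered:                           # step 3
        # a. choose the set not yet selected that covers the most uncovered elements
        best = max(
            (name for name in sets if name not in result_cover),
            key=lambda name: len(sets[name] & uncovered),
        )
        result_cover.append(best)              # b.
        uncovered -= sets[best]                # c.
    return result_cover

example = {
    "S1": {"b", "c", "e"},
    "S2": {"c", "d"},
    "S3": {"a", "b", "d", "f"},
    "S4": {"b", "e", "f"},
}
print(greedy_cover(example))  # ['S3', 'S1']
```

On the example matrix, S3 is chosen first because it covers four elements, and S1 then covers the remaining c and e.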
Algorithm Check And Remove (CAR)
Identifying Redundant Search Engines in a Very Large Scale Metasearch Engine Context
8th ACM International Workshop on Web Information and Data Management
The set-covering problem is equivalent to the problem of identifying redundant search engines on the Web
Algorithm CAR is much faster than Algorithm Greedy
Algorithm CAR (Check And Remove)
1. Set ResultCover to the empty set
2. For each set S
a. determine if S has an element that is not covered by ResultCover
b. add S to ResultCover if S has such an element
c. exit the for loop if ResultCover is a cover of X
3. For each set S in ResultCover
a. determine if S has an element that is not covered by any other set of ResultCover
b. remove S from ResultCover if S has no such element
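The check phase and the remove phase can be sketched as below. This is a readability-first version using Python sets; the name `car_cover` is hypothetical, and the remove phase recomputes the union of the other sets each time, so it does not attempt to match the running time reported in the talk.

```python
def car_cover(sets):
    """Check And Remove; `sets` maps a set name to the elements it covers."""
    universe = set().union(*sets.values())
    result_cover = []          # step 1
    covered = set()

    # Step 2 (check): keep a set only if it contributes a new element.
    for name, elements in sets.items():
        if elements - covered:
            result_cover.append(name)
            covered |= elements
        if covered == universe:
            break

    # Step 3 (remove): drop a set if the other selected sets cover all of it.
    for name in list(result_cover):
        others = set().union(*(sets[o] for o in result_cover if o != name))
        if sets[name] <= others:
            result_cover.remove(name)
    return result_cover

example = {
    "S1": {"b", "c", "e"},
    "S2": {"c", "d"},
    "S3": {"a", "b", "d", "f"},
    "S4": {"b", "e", "f"},
}
print(car_cover(example))  # ['S1', 'S3']
```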
Example
a b c d e f
S1 0 1 1 0 1 0
S2 0 0 1 1 0 0
S3 1 1 0 1 0 1
S4 0 1 0 0 1 1
Set   ResultCover     Uncovered
      {}              {a, b, c, d, e, f}
S1    {S1}            {a, d, f}
S2    {S1, S2}        {a, f}
S3    {S1, S2, S3}    {}
Removing S2
      {S1, S3}        {}
Time Complexity
Algorithm Greedy O(M * N * min(M, N))
Algorithm CAR O(M * N)
N: number of sets
M: number of elements of the union X
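One way to read the Greedy bound, assuming each selection step compares all N sets against all M elements: every pass of the while loop adds one new set and covers at least one new element, so there are at most min(M, N) passes.

```latex
T_{\text{Greedy}}
  = \underbrace{O(\min(M, N))}_{\text{passes of the while loop}}
    \times
    \underbrace{O(M \cdot N)}_{\text{work per pass}}
  = O(M \cdot N \cdot \min(M, N))
```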
CPU Time
[Figure: CPU time (sec) versus actual cover size (100 to 1000) for Algorithm Greedy and Algorithm CAR]
Cover Size
Cover Sizes of the Two Algorithms
Actual 100 300 500 700 900
Greedy 105 300 501 700 900
CAR 485 300 500 700 900
Implementation Details
Read data
Binary search tree, with a BitMap indicating which sets cover an element
Convert the tree to an array of BitMaps
Matrix representation of the set-cover problem
Find a cover
Binary Search Tree and BitMap
[Diagram: binary search tree with one node per element]
Number of sets (N) is known
Number of elements of each set is known
The total number of elements is unknown
Reading elements of one set at a time
BitMap: size N; indicates which sets cover the element; a column of the matrix
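The idea of building one column BitMap per element while reading one set at a time can be sketched as follows. Two substitutions are assumed here for brevity: a Python dictionary stands in for the binary search tree, and an arbitrary-precision Python integer stands in for a BitMap of size N. The name `build_column_bitmaps` is illustrative.

```python
def build_column_bitmaps(sets_in_order):
    """Return {element: bitmap}; bit i of a bitmap is 1 when set i covers the element."""
    columns = {}  # stand-in for the binary search tree keyed by element
    for i, elements in enumerate(sets_in_order):   # read one set at a time
        for e in elements:
            columns[e] = columns.get(e, 0) | (1 << i)
    return columns

# The four sets S1..S4 of the example matrix, in row order.
rows = [["b", "c", "e"], ["c", "d"], ["a", "b", "d", "f"], ["b", "e", "f"]]
columns = build_column_bitmaps(rows)

# Sorting the keys plays the role of an in-order traversal of the tree.
column_array = [columns[e] for e in sorted(columns)]
print(column_array)  # [4, 13, 3, 6, 9, 12]
```

For instance, element b is covered by S1, S3 and S4, so its bitmap has bits 0, 2 and 3 set, which is 13.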
Array of Column BitMaps
Row Operations
• Find the number of elements in a set that are not covered by the result cover
• Determine if a set contains an element that is not covered by the result cover
• Determine if a set in the result cover has an element that is not covered by any other set in the result cover
• …
[Diagram: array of column BitMaps, one per element e1, e2, e3, e4, ..., em-1, em]
Array of Row BitMaps
It takes some time to convert column BitMaps to row BitMaps.
But all row operations are performed within a row BitMap.
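A minimal sketch of the conversion and of one row operation, assuming BitMaps are held as Python integers; `columns_to_rows` and `uncovered_count` are hypothetical names, not taken from the slides.

```python
def columns_to_rows(column_bitmaps, n_sets):
    """Transpose: column_bitmaps[j] has bit i set when set i covers element j."""
    rows = [0] * n_sets
    for j, col in enumerate(column_bitmaps):
        for i in range(n_sets):
            if (col >> i) & 1:
                rows[i] |= 1 << j   # set i covers element j
    return rows

def uncovered_count(row, covered):
    """Number of elements of a set (row BitMap) not in the covered BitMap."""
    return bin(row & ~covered).count("1")

# Column BitMaps for elements a..f of the example matrix.
columns = [4, 13, 3, 6, 9, 12]
rows = columns_to_rows(columns, 4)
print(rows)                             # [22, 12, 43, 50]
print(uncovered_count(rows[2], rows[0]))  # 3: S3 adds a, d, f once S1 is chosen
```

After the conversion, each row operation touches a single integer per set, which is the point made on the slide.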
CPU Time
Running Times (seconds) of the Greedy Algorithm
Col 0.63 53.9 300 1220 2130 3457 5056
Row 0.28 7.6 41 161 274 437 629
Running Times (seconds) of the CAR algorithm
Col 0.01 0.31 1.63 6.36 11.15 16.92 20.70
Row 0.09 0.39 0.96 2.12 3.27 4.34 5.46
The CPU time includes the time to convert column BitMaps to row BitMaps, but not the time to build the tree.
CPU Time (Row BitMap)
Running Times (seconds) of the Two algorithms
Greedy 0.28 7.6 41 161 274 437 629
CAR 0.09 0.39 0.96 2.12 3.27 4.34 5.46
Algorithm Greedy
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. While Uncovered is not empty
a. select a set S that is not in ResultCover and covers the most elements of Uncovered
b. add S to ResultCover
c. remove all elements of S from Uncovered
Algorithm Greedy Update
UncoveredCount: number of elements of a set not covered by ResultCover
1. Set ResultCover to the empty set
2. Set Uncovered to the union of all sets
3. For each set, set the UncoveredCount to the size of the set
4. While Uncovered is not empty
a. select a set that has the largest value of UncoveredCount among all sets not in ResultCover
b. add the set to ResultCover
c. remove all elements of the set from Uncovered
d. update the value of UncoveredCount for each set not in ResultCover
Update Uncovered Count
For each element in the set to be added to the ResultCover
    If the result cover does not cover it
        For each set not in the result cover
            If the set contains the element
                uncovered count is decremented by one
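Algorithm Greedy Update and its counter maintenance can be sketched together as follows. This is an illustrative version with Python sets and a dictionary of counters; `greedy_update_cover` is an assumed name, and the list membership tests are kept simple rather than optimized.

```python
def greedy_update_cover(sets):
    """Greedy with incrementally maintained counts; `sets` maps name -> set of elements."""
    result_cover = []                                            # step 1
    uncovered = set().union(*sets.values())                      # step 2
    uncovered_count = {n: len(e) for n, e in sets.items()}       # step 3
    while uncovered:                                             # step 4
        # a. largest UncoveredCount among the sets not in ResultCover
        best = max(
            (n for n in sets if n not in result_cover),
            key=uncovered_count.get,
        )
        result_cover.append(best)                                # b.
        for e in sets[best]:
            if e in uncovered:            # the result cover did not cover e before
                uncovered.remove(e)                              # c.
                for other in sets:                               # d.
                    if other not in result_cover and e in sets[other]:
                        uncovered_count[other] -= 1
    return result_cover

example = {
    "S1": {"b", "c", "e"},
    "S2": {"c", "d"},
    "S3": {"a", "b", "d", "f"},
    "S4": {"b", "e", "f"},
}
print(greedy_update_cover(example))  # ['S3', 'S1']
```

The selection step now reads a stored counter for each set instead of recounting its uncovered elements on every pass.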
Time Complexity
Algorithm Greedy O(M * N * min(M, N))
Algorithm CAR O(M * N)
Algorithm Greedy Update O(M * N)
CPU Time
Running Times (seconds) of the Two algorithms
Update 0.15 0.92 2.26 5.13 7.31 10.1 13.1
CAR 0.09 0.39 0.96 2.12 3.27 4.34 5.46
Algorithm List And Remove (LAR)
Implements the matrix using linked lists instead of an array of BitMaps
Combines Algorithm Greedy Update with the remove phase from Algorithm CAR
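A rough sketch of how the two phases could fit together is given below. It is an assumption-laden illustration, not the authors' implementation: Python lists stand in for the linked lists, a per-element counter `times_covered` is introduced to support the remove phase, and the name `lar_cover` is hypothetical.

```python
def lar_cover(sets):
    """`sets` maps a set name to a list of distinct elements (one list per row)."""
    # One list per element naming the sets that contain it (one list per column).
    containing = {}
    for name, elements in sets.items():
        for e in elements:
            containing.setdefault(e, []).append(name)

    uncovered_count = {name: len(elements) for name, elements in sets.items()}
    times_covered = {e: 0 for e in containing}
    result_cover = []
    remaining = len(containing)          # number of elements still uncovered

    # Update phase (as in Algorithm Greedy Update).
    while remaining:
        best = max(
            (n for n in sets if n not in result_cover),
            key=uncovered_count.get,
        )
        result_cover.append(best)
        for e in sets[best]:
            if times_covered[e] == 0:    # e becomes covered for the first time
                remaining -= 1
                for other in containing[e]:
                    uncovered_count[other] -= 1
            times_covered[e] += 1

    # Remove phase (as in Algorithm CAR): drop sets whose elements are all covered elsewhere.
    for name in list(result_cover):
        if all(times_covered[e] > 1 for e in sets[name]):
            result_cover.remove(name)
            for e in sets[name]:
                times_covered[e] -= 1
    return result_cover

example = {
    "S1": ["b", "c", "e"],
    "S2": ["c", "d"],
    "S3": ["a", "b", "d", "f"],
    "S4": ["b", "e", "f"],
}
print(lar_cover(example))  # ['S3', 'S1']
```

Walking the per-element lists means the counter update visits only the sets that actually contain the newly covered element, rather than testing every set.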
CPU Time
Running Times (seconds) of the Two algorithms
LAR 0.21 0.35 0.51 0.86 1.11 1.40 1.66
CAR 0.26 0.49 0.65 1.01 1.24 1.46 1.67
Cover Size
Cover Sizes of the Two algorithms
LAR 10 87 191 422 607 815 971
CAR 16 120 235 467 648 824 975
Cover Size (Different Data Sets)
Cover Sizes of the Two algorithms
Actual 50 70 90 110 200 500 900
LAR 50 70 90 110 200 500 900
CAR 291 391 496 528 200 500 900
Summary
Algorithm LAR runs faster than Algorithm CAR
Algorithm LAR generates smaller cover sets than Algorithm CAR
Algorithm: Updating vs. searching every time
Data Structure: Linked list vs. array of BitMaps