The 2-Catalog Segmentation Problem
Joint work with Shmuel Safra

Motivation

The Catalog Problem

Input: A set of customers $C$. A set of pages $P$. A function $\pi : C \to 2^P$. The catalog size $r$.

Output: A catalog $P' \subseteq P$ of size $r$ such that $\sum_{c \in C} |\pi(c) \cap P'|$ is maximal.

The Catalog Problem (cont.)

Algorithm: Take the $r$ most popular pages.
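
The single-catalog algorithm can be sketched in a few lines of Python. This is only an illustration: the names `interests`, `most_popular_catalog` and `hits` are hypothetical, and the customer-to-pages function is assumed to be given as a dictionary of sets.

```python
from collections import Counter

def most_popular_catalog(interests, r):
    """Return the r pages that the largest number of customers want.

    interests: dict mapping each customer to the set of pages it likes
    (an assumed representation of the input function).
    """
    popularity = Counter(page for pages in interests.values() for page in pages)
    # Sort by decreasing popularity; ties are broken by page name.
    ranked = sorted(popularity, key=lambda page: (-popularity[page], page))
    return set(ranked[:r])

def hits(interests, catalog):
    """Total number of (customer, page) pairs served by the catalog."""
    return sum(len(pages & catalog) for pages in interests.values())

interests = {
    "alice": {"a", "b"},
    "bob": {"b", "c"},
    "carol": {"b", "d"},
    "dave": {"a"},
}
catalog = most_popular_catalog(interests, 2)
total = hits(interests, catalog)
```

Since each page contributes its popularity to the objective independently of the other pages, picking the `r` largest popularities is optimal for a single catalog.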

Catalog Segmentation

The k-Catalog Segmentation

Input: A set of customers $C$. A set of pages $P$. A function $\pi : C \to 2^P$. The catalog size $r$.

Output: $k$ catalogs $P_1,\dots,P_k \subseteq P$ of size $r$ each, such that $\sum_{c \in C} \max_{1 \le i \le k} |\pi(c) \cap P_i|$ is maximal.
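
To make the objective concrete, here is a small brute-force sketch in Python. It is exponential and meant only for toy inputs; the function names and the dictionary representation of the input are assumptions made for this example.

```python
from itertools import combinations, combinations_with_replacement

def segmentation_hits(interests, catalogs):
    """Each customer receives the catalog that overlaps its interests the most."""
    return sum(
        max(len(pages & catalog) for catalog in catalogs)
        for pages in interests.values()
    )

def best_segmentation(interests, r, k):
    """Exhaustively search all choices of k catalogs of size r (toy sizes only)."""
    all_pages = sorted(set().union(*interests.values()))
    candidates = [frozenset(c) for c in combinations(all_pages, r)]
    return max(
        combinations_with_replacement(candidates, k),
        key=lambda catalogs: segmentation_hits(interests, catalogs),
    )

interests = {
    "c1": {"a", "b"},
    "c2": {"a", "b"},
    "c3": {"c", "d"},
    "c4": {"c", "d"},
}
best = best_segmentation(interests, r=2, k=2)
best_value = segmentation_hits(interests, best)
```

In this example two catalogs serve every interest of every customer, while any single catalog of size 2 serves at most half of them.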

Representation as a Graph

We can consider the input as a bipartite graph $G = (C, P, E)$, where $E = \{ (c,p) \mid c \in C,\ p \in \pi(c) \}$.

Then, our goal is to find $k$ sets of vertices $P_1,\dots,P_k \subseteq P$ of size $r$ each, and a partition of $C$ into $k$ sets $C_1,\dots,C_k$ such that $|E \cap ((P_1 \times C_1) \cup \dots \cup (P_k \times C_k))|$ is maximal.

Uniform Catalog Problem

Definition: A catalog problem is called uniform if there exists a number $d$ such that the degree of every vertex $p \in P$ is $d$.

The maximum possible number of hits for a uniform catalog problem is $k \cdot r \cdot d$.

Thus, we can normalize the number of hits and define

$$\mathrm{sat}(G) = \max \frac{|E \cap ((P_1 \times C_1) \cup \dots \cup (P_k \times C_k))|}{k \cdot r \cdot d},$$

where the maximum is over all choices of the catalogs $P_1,\dots,P_k$ and the partition $C_1,\dots,C_k$.
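
A short Python sketch of the normalized count for one fixed segmentation of a uniform instance follows. The function name `normalized_hits` and the edge-set representation are assumptions for illustration; sat(G) itself would be the maximum of this quantity over all segmentations.

```python
def normalized_hits(edges, catalogs, parts, d):
    """Fraction of the k*r*d possible hits achieved by one segmentation.

    edges:    set of (customer, page) pairs
    catalogs: list of k page sets, each of size r
    parts:    list of k customer sets forming a partition of the customers
    d:        common degree of every page (uniform instance)
    """
    k = len(catalogs)
    r = len(catalogs[0])
    covered = sum(
        1
        for customer, page in edges
        for catalog, part in zip(catalogs, parts)
        if customer in part and page in catalog
    )
    return covered / (k * r * d)

# Two pages, each wanted by exactly d = 2 customers.
edges = {("c1", "a"), ("c2", "a"), ("c3", "b"), ("c4", "b")}
good = normalized_hits(edges, [{"a"}, {"b"}], [{"c1", "c2"}, {"c3", "c4"}], d=2)
poor = normalized_hits(edges, [{"a"}, {"b"}], [{"c1", "c3"}, {"c2", "c4"}], d=2)
```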

Hardness

Theorem (Kleinberg, Papadimitriou and Raghavan): It is NP-hard to precisely compute the optimal $k$ catalogs.

Approximation

Proposition: Taking the $r$ most popular pages in all $k$ catalogs gives an approximation factor of $1/k$.

Proof: In the optimal solution, there is a catalog that gives at least $1/k$ of the hits. Thus, using only this catalog leaves us with at least $1/k$ of the hits. Replacing this catalog by the $r$ most popular pages can only increase the number of hits.
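
The proposition can be checked on a toy instance by comparing the most-popular-pages catalog with an exhaustive optimum. This is a sketch under the assumption that the input is a dictionary of page sets; all function names are hypothetical, and the exhaustive search is only feasible for tiny inputs.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

def seg_hits(interests, catalogs):
    """Hits when every customer gets its best catalog."""
    return sum(max(len(p & c) for c in catalogs) for p in interests.values())

def greedy_hits(interests, r):
    """Hits of the single catalog made of the r most popular pages."""
    pop = Counter(page for pages in interests.values() for page in pages)
    top = set(sorted(pop, key=lambda page: (-pop[page], page))[:r])
    return seg_hits(interests, [top])

def optimal_hits(interests, r, k):
    """Exhaustive optimum over all k-tuples of size-r catalogs."""
    pages = sorted(set().union(*interests.values()))
    candidates = [set(c) for c in combinations(pages, r)]
    return max(
        seg_hits(interests, catalogs)
        for catalogs in combinations_with_replacement(candidates, k)
    )

interests = {
    "c1": {"a", "b"},
    "c2": {"a", "b"},
    "c3": {"c", "d"},
    "c4": {"c", "d"},
}
k, r = 2, 2
greedy = greedy_hits(interests, r)
optimum = optimal_hits(interests, r, k)
```

On this instance the bound is tight: the popular-pages catalog achieves exactly a $1/k$ fraction of the optimum.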

Dense Instances

Kleinberg, Papadimitriou and Raghavan gave an approximation scheme for dense instances, i.e. instances in which each customer is interested in at least an $\varepsilon$ fraction of the pages.

The PCP

A SAT instance $\Phi = (\varphi_1,\dots,\varphi_n)$ over 2 types of variables: $X$ and $Y$.

The range of the variables $x \in X$ is $R_X = \{0,1\}^l$.

The range of the variables $y \in Y$ is $\{0,1\}$.

Each $\varphi_i$ depends on exactly one $x \in X$ and one $y \in Y$, such that the value assigned to $x$ determines the value of $y$. Thus, we can write it as a function $\varphi_{x \to y} : R_X \to \{0,1\}$.

The PCP (cont.)

It is NP-hard to distinguish between the following 2 cases:

Good: There exists an assignment $A$ such that
$$\Pr_{\varphi_{x \to y} \in \Phi}\left[\varphi_{x \to y}(A(x)) = A(y)\right] = 1.$$

Bad: For any assignment $A$,
$$\Pr_{\varphi_{x \to y} \in \Phi}\left[\varphi_{x \to y}(A(x)) = A(y)\right] < \tfrac{1}{2} + \varepsilon.$$

The Reduction

Given an instance $\Phi$ of the above PCP, let $G$ be the following instance of the 2-catalog segmentation problem:

$P = \{ (x, a, s) \mid x \in X,\ a \in R_X,\ s \in \{0,1\} \}$

$C = \{ (y, b) \mid y \in Y,\ b \in \{0,1\} \}$

$(x, a, s) \sim (y, b) \iff \varphi_{x \to y} \in \Phi$ and $\varphi_{x \to y}(a) = b \oplus s$

$r = |X|$
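
The construction can be written out as a short Python sketch. It assumes the edge rule is read as: page $(x, a, s)$ is adjacent to customer $(y, b)$ when a test from $x$ to $y$ exists and its value on $a$ equals $b \oplus s$. The function name `build_instance`, the dictionary `tests`, and the encoding of values of $x$ as integers in `range(2**l)` are choices made for this example only.

```python
def build_instance(X, Y, l, tests):
    """Build the 2-catalog instance from a PCP instance.

    tests maps a pair (x, y) to a function from range(2**l) to {0, 1}
    (an assumed representation of the local tests).
    Returns (pages, customers, edges, r).
    """
    values = range(2 ** l)
    pages = [(x, a, s) for x in X for a in values for s in (0, 1)]
    customers = [(y, b) for y in Y for b in (0, 1)]
    edges = {
        ((y, b), (x, a, s))
        for (x, y), phi in tests.items()
        for a in values
        for s in (0, 1)
        for b in (0, 1)
        if phi(a) == b ^ s  # XOR of the customer bit and the page bit
    }
    return pages, customers, edges, len(X)

# Toy instance: one x with a 1-bit range, one y, and the identity test.
pages, customers, edges, r = build_instance(
    ["x1"], ["y1"], 1, {("x1", "y1"): lambda a: a}
)
```

For every test and every page $(x, a, s)$ exactly one value of $b$ satisfies the edge rule, so each page has one neighbour per test that involves its $x$.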

Completeness

Theorem: If $\Phi$ is satisfiable then $\mathrm{sat}(G) = 1$.

Proof: Let $A$ be a satisfying assignment and consider the following segmentation:

For every $i \in \{0,1\}$, $P_i = \{ (x, A(x), i) \mid x \in X \}$.

For every $y \in Y$, $(y, A(y))$ gets $P_0$ and $(y, \neg A(y))$ gets $P_1$.

Thus, for every page in the catalogs, all the customers that are interested in it get it, and hence $\mathrm{sat}(G) = 1$.
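
The completeness argument can be checked numerically on a toy instance. The sketch below assumes the edge rule "test value on $a$ equals $b \oplus s$", the segmentation in which $(y, A(y))$ gets $P_0$ and $(y, \neg A(y))$ gets $P_1$, and an instance in which every $x$ appears in the same number of tests. The name `canonical_value` and the data layout are hypothetical.

```python
def canonical_value(X, tests, A):
    """Normalized hits of the segmentation induced by the assignment A.

    tests maps (x, y) to a function on the value of x; A maps every
    variable to its assigned value. Assumes each x is in the same number
    of tests, so that the instance is uniform with degree d.
    """
    covered = 0
    for (x, y), phi in tests.items():
        for i in (0, 1):
            b = A[y] ^ i            # customer (y, b) is the one that gets P_i
            if phi(A[x]) == b ^ i:  # is page (x, A[x], i) adjacent to (y, b)?
                covered += 1
    d = len(tests) // len(X)        # tests per x = degree of every page
    return covered / (2 * len(X) * d)

X = ["x1", "x2"]
tests = {
    ("x1", "y1"): lambda a: a,
    ("x2", "y1"): lambda a: 1 - a,
    ("x1", "y2"): lambda a: 1 - a,
    ("x2", "y2"): lambda a: a,
}
satisfying = {"x1": 1, "x2": 0, "y1": 1, "y2": 0}
violating = {"x1": 1, "x2": 0, "y1": 0, "y2": 0}
full = canonical_value(X, tests, satisfying)
half = canonical_value(X, tests, violating)
```

With the satisfying assignment every edge of every catalog page is served, while the second assignment violates both tests on `y1` and loses exactly those edges.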

Soundness

We would like to show that for every $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon) > 0$ such that if $\mathrm{sat}(G) > \frac{1}{2} + \varepsilon$ then there exists an assignment $A$ such that
$$\Pr_{\varphi_{x \to y} \in \Phi}\left[\varphi_{x \to y}(A(x)) = A(y)\right] \ge \tfrac{1}{2} + \delta.$$

We would like to construct an assignment according to the catalogs.

Problem: A catalog might contain many pages for the same $x$ with different assignments.
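
Refining the PCP

Solution: Changing the PCP.

Good: There exists an assignment $A$ such that
$$\Pr_{\varphi_{x \to y} \in \Phi}\left[\varphi_{x \to y}(A(x)) = A(y)\right] = 1.$$

Bad: For any assignment $A$, the condition $\Pr_{\varphi_{x \to y} \in \Phi}\left[\varphi_{x \to y}(A(x)) = A(y)\right] < \frac{1}{2} + \varepsilon$ is replaced by
$$\Pr_{x \in X}\left[\Pr_{\varphi_{x \to y}}\left[\varphi_{x \to y}(A(x)) = A(y)\right] \ge \tfrac{1}{2} + \varepsilon\right] < \delta.$$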

Choosing One Catalog

Now, assume $\mathrm{sat}(G) > \frac{1}{2} + \varepsilon$. Thus, for one of the catalogs, $P_{i'}$,
$$\Pr_{p \in P_{i'},\ c \,:\, c \sim p}\left[c \in C_{i'}\right] > \tfrac{1}{2} + \varepsilon,$$
and hence
$$\Pr_{p \in P_{i'}}\left[\Pr_{c \,:\, c \sim p}\left[c \in C_{i'}\right] \ge \tfrac{1}{2} + \tfrac{\varepsilon}{2}\right] \ge \tfrac{\varepsilon}{2}.$$
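
The "and hence" step is a standard averaging argument. As a sketch, write $Z(p)$ for the fraction of the neighbours of page $p$ that lie in the customer set of the chosen catalog, and $q$ for the probability, over a uniform page $p$ of that catalog, that $Z(p) \ge \frac{1}{2} + \frac{\varepsilon}{2}$ (both names are introduced here for illustration, and the instance is assumed uniform so that the joint probability equals the average of $Z$):

```latex
\begin{align*}
\tfrac{1}{2} + \varepsilon
  &< \mathbb{E}_{p}\left[ Z(p) \right] \\
  &\le q \cdot 1 + (1 - q)\left( \tfrac{1}{2} + \tfrac{\varepsilon}{2} \right) \\
  &\le q + \tfrac{1}{2} + \tfrac{\varepsilon}{2},
\end{align*}
```

so $q > \frac{\varepsilon}{2}$, using only that $0 \le Z(p) \le 1$.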

Choosing a Subset of Pages

Let
$$P'_{i'} = \left\{ p \in P_{i'} \;\middle|\; \Pr_{c \,:\, c \sim p}\left[c \in C_{i'}\right] \ge \tfrac{1}{2} + \tfrac{\varepsilon}{2} \right\}.$$

Thus, $|P'_{i'}| \ge \frac{\varepsilon}{2} |X|$.

Now, let us keep only one page in $P'_{i'}$ for each $x \in X$, and denote the set by $P''_{i'}$. Then $|P''_{i'}| \ge 2^{-l} \frac{\varepsilon}{2} |X|$.

Enforcing the Same s

There exists $s' \in \{0,1\}$ such that
$$\left|\{ (x, a, s') \mid (x, a, s') \in P''_{i'} \}\right| \ge 2^{-(l+1)} \tfrac{\varepsilon}{2} |X|,$$
since the more frequent value of $s$ accounts for at least half of the pages in $P''_{i'}$.

Denote the set of the corresponding $x$'s by $X'$.

For an appropriate value of $\delta$, $|X'| \ge \delta |X|$.

Constructing an Assignment

We would like to construct an assignment as follows:

For every $x \in X'$, assign the value of the appropriate page.

For every $y \in Y$, if $(y, b)$ gets the catalog $P_{i'}$, assign the value $b \oplus s'$ to $y$.

Thus, for every $x \in X'$, at least $\frac{1}{2} + \frac{\varepsilon}{2}$ of the clauses $\varphi_{x \to y}$ are satisfied.

Problem

For a variable $y \in Y$, both $(y, 0)$ and $(y, 1)$ might get the same catalog. Thus, we cannot obtain an assignment to $Y$ as we would like to.

Taking Subsets of x's

Instead of taking one page for each $(x, a, s)$, we take a page for every tuple of: a subset of $m$ $x$'s, an assignment to these $x$'s, and a bit $s$.
x
xA x

The PCP

$\Phi = (\varphi_1,\dots,\varphi_n)$ over variables $X$ and $Y$, such that it is NP-hard to distinguish between:

Good: There exists an assignment $A$ such that
$$\Pr_{\varphi_{x \to y} \in \Phi}\left[\varphi_{x \to y}(A(x)) = A(y)\right] = 1.$$

Bad: For any assignment $A$,
$$\Pr_{x \in X}\left[\Pr_{\varphi_{x \to y}}\left[\varphi_{x \to y}(A(x)) = A(y)\right] \ge \tfrac{1}{2} + \varepsilon\right] < \delta.$$

par[$\Phi$, k] - Definitions

For a 3SAT formula $\Phi$ over boolean variables $Y$, let $Y^{(k)}$ be the set of all $k$-subsets of $Y$, and let $\Phi^{(k)}$ be the set of all $k$-subsets of $\Phi$.

For every $V \in Y^{(k)}$, let $S_V$ be the set of all assignments to $V$.

For every $C \in \Phi^{(k)}$, let $S_C$ be the set of all satisfying assignments to $C$.

par[$\Phi$, k] - Definitions (cont.)

For every $V \in Y^{(k)}$ and $C \in \Phi^{(k)}$, let $V \sqsubseteq C$ if $V$ is a choice of one variable of each clause in $C$.

For every $V \in Y^{(k)}$ and $C \in \Phi^{(k)}$ such that $V \sqsubseteq C$, let $a|_V$ denote the natural restriction of an $a \in S_C$ to $S_V$.

par[$\Phi$, k]

Definition: For a 3SAT formula $\Phi$ over boolean variables $Y$, denote by par[$\Phi$, k] the following instance:

There are 2 types of variables:

$W$: $x[V]$ for every $V \in Y^{(k)}$, over $S_V$.

$Z$: $x[C]$ for every $C \in \Phi^{(k)}$, over $S_C$.

There is a local test $\varphi[C,V]$ for every $V \sqsubseteq C$ that accepts if and only if $x[C]|_V = x[V]$.

par[$\Phi$, k] (cont.)

Definition: For a set of boolean clauses $\Phi$, let $\mathrm{sat}(\Phi)$ denote the maximal fraction of clauses of $\Phi$ that can be satisfied simultaneously.

Theorem: If $\mathrm{sat}(\Phi) = 1$ then $\mathrm{sat}(\mathrm{par}[\Phi,k]) = 1$. In general, $\mathrm{sat}(\mathrm{par}[\Phi,k]) \le \mathrm{sat}(\Phi)^{c \cdot k}$ for some $c > 0$.

Long Code

Definition: An $R$-long-code has one bit for each boolean function $f : [R] \to \{0,1\}$.
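
As a small illustration of the definition, the sketch below builds the long-code encoding of an element of $[R]$, i.e. the table that gives, for every boolean function $f$ on $[R]$, the bit $f(a)$. Representing each $f$ by its truth table (a tuple of $R$ bits) and indexing $[R]$ from 0 are choices made for this example; the name `long_code` is hypothetical.

```python
from itertools import product

def long_code(a, R):
    """Long-code encoding of a in range(R).

    Every boolean function f on range(R) is represented by its truth
    table, a tuple of R bits, so f(a) is simply f[a].
    """
    return {f: f[a] for f in product((0, 1), repeat=R)}

codeword = long_code(1, 2)
```

The codeword has $2^R$ bits, one per function, which is why the long code is only used over constant-size ranges.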

The PCP of [ST]

For any bipartite graph $G = ([k], [k], E)$ we construct a SAT instance $\Phi(G)$ that contains one boolean function for every choice of:

$z \in Z$

$v_1,\dots,v_k \in LC[z]$

$w_1,\dots,w_k \in W$ such that $w_i \sqsubseteq z$ for every $1 \le i \le k$

$u_i \in LC[w_i]$ for every $1 \le i \le k$

$k^2$ perturbation functions $p_{1,1},\dots,p_{k,k}$

The PCP of [ST] (cont.)

$\varphi(v_1,\dots,v_k,u_1,\dots,u_k,p_{1,1},\dots,p_{k,k}) = \text{TRUE}$
(i,j)E, vi uj = ‘vi uj pi,j’.
Denote
$$p = \Pr_{v_i,\,u_j,\,p_{s,t}}\left[\varphi(v_1,\dots,v_k,u_1,\dots,u_k,p_{1,1},\dots,p_{k,k}) = \text{TRUE}\right].$$

The PCP of [ST] (cont.)

Theorem: For every $\delta > 0$, it is NP-hard to distinguish between the following 2 cases:
Good: G = ([k], [k], E), p > (1 - )-|E|
Bad: For every $G = ([k], [k], E)$, $p < 2^{-|E|}$.

Our PCP

A SAT instance $\Phi = (\varphi_1,\dots,\varphi_n)$ over 2 types of variables: $X$ and $Y$.

The range of the variables $x \in X$ is $R_X = \{0,1\}^l$.

The range of the variables $y \in Y$ is $\{0,1\}$.

Each $\varphi_i$ is of the type $\varphi_{x \to y} : R_X \to \{0,1\}$.

Our PCP (cont.)

Let $k = l/2$.

Given an instance $\Phi(G)$ as above, we construct an instance as follows:

There is a variable $x \in X$ for every test $\varphi \in \Phi(G)$. An assignment to $x$ is an assignment to the bits $v_1,\dots,v_k,u_1,\dots,u_k$.

$Y = LC[W]$.

Our PCP (cont.)

Theorem: For every $\varepsilon, \delta > 0$ and for some constant $c > 0$ depending on these parameters, it is NP-hard to distinguish between:

Good: There exists an assignment $A$ such that
$$\Pr_{\varphi_{x \to y} \in \Phi}\left[\varphi_{x \to y}(A(x)) = A(y)\right] = 1.$$

Bad: For any assignment $A$,
$$\Pr_{x \in X}\left[\Pr_{\varphi_{x \to y}}\left[\varphi_{x \to y}(A(x)) = A(y)\right] \ge \tfrac{1}{2} + \varepsilon\right] < 2^{-c l^2}.$$

Our PCP (cont.)

Lemma: If there exists an assignment $A$ such that
$$\Pr_{x \in X}\left[\Pr_{\varphi_{x \to y}}\left[\varphi_{x \to y}(A(x)) = A(y)\right] \ge \tfrac{1}{2} + \varepsilon\right] \ge \delta,$$
then there exists a graph $G = (V, U, E)$ and an assignment to $LC[W]$ and $LC[Z]$ such that $p \ge 2^{-|E|}$.

Our PCP (cont.)

Proof: Assume there exists an assignment $A$ such that
$$\Pr_{x \in X}\left[\Pr_{\varphi_{x \to y}}\left[\varphi_{x \to y}(A(x)) = A(y)\right] \ge \tfrac{1}{2} + \varepsilon\right] \ge \delta.$$

We assign the bits of $LC[W]$ the values assigned to them by $A$, and the bits of $LC[Z]$ are assigned random values.

Our PCP (cont.)

We now have to construct a graph $G$ that would satisfy the lemma.

We call an $x$ good if
$$\Pr_{\varphi_{x \to y}}\left[\varphi_{x \to y}(A(x)) = A(y)\right] \ge \tfrac{1}{2} + \varepsilon.$$

Let $x$ be good and let $V_0$, $U_0$ be the corresponding vertices.

Our PCP (cont.)

[Diagram: the vertex sets $V_0 \supseteq V_1$ and $U_0 = U_1 \cup U_2$.]

$V_1$: The set of vertices in $V_0$ for which at least $\frac{1}{2} + \frac{\varepsilon}{2}$ of their edges are consistent with $x$. $|V_1| \ge \frac{\varepsilon}{2} k$.

$U_1$: The set of vertices in $U_0$ that are consistent with $x$.

$U_2 = U_0 \setminus U_1$.

Our PCP (cont.)

Proposition: There exists $i \in \{1,2\}$ such that $|U_i| \ge \frac{\varepsilon}{4} k$, and at least $\frac{1}{2} + \frac{\varepsilon}{4}$ of the edges between $U_i$ and $V_1$ are consistent with $x$.

Our PCP (cont.)

Let $U' \subseteq U_i$, $V' \subseteq V_1$, such that $|U'| = |V'| = \frac{\varepsilon}{4} k$, and at least $\frac{1}{2} + \frac{\varepsilon}{4}$ of the edges between $U'$ and $V'$ are consistent with $x$.

There are less than $2^{2k}$ possibilities to choose $U'$ and $V'$. Hence there is a subset $X'$ of at least a $2^{-2k}$ fraction of the good $x$'s (and thus of size at least $2^{-2k} |X|$) with the same choice of $U'$ and $V'$.

Our PCP (cont.)

Let $X''$ be the subset of variables $x \in X'$ that are consistent with the random assignment to $LC[Z]$.

The probability that $A(x)$ is consistent with a random assignment to $LC[Z]$ is $2^{-k}$. Hence the expected size of $X''$ is $2^{-k} |X'|$.

Therefore, there exists an assignment to $LC[Z]$ such that $|X''| \ge 2^{-3k} |X|$.

Our PCP (cont.)

Let $\mathcal{G}$ be the multi-set of all graphs $G = (V', U', E)$ corresponding to the variables $x \in X''$, where $E$ is the set of all edges between $U'$ and $V'$ that are consistent with $x$.

$|\mathcal{G}| \ge 2^{-3k} |X|$.

For every $G \in \mathcal{G}$, $|E| \ge \left(\frac{1}{2} + \frac{\varepsilon}{4}\right) \left(\frac{\varepsilon}{4} k\right)^2$.

Our PCP (cont.)

Lemma: Let $\mathcal{G}$ be a multi-set of bipartite graphs on $[k'] \times [k']$, such that each graph in $\mathcal{G}$ has at least $\left(\frac{1}{2} + \varepsilon'\right) k'^2$ edges. Then, for every $t \le \frac{\varepsilon'}{2} k'^2$, there exists $G = ([k'], [k'], E)$ such that $|E| \ge t$ and
$$\Pr_{G' = ([k'],[k'],E') \in \mathcal{G}}\left[E \subseteq E'\right] \ge \left(\tfrac{1}{2} \varepsilon'\right)^t.$$

Our PCP (cont.)

By the above lemma, for $k' = \frac{\varepsilon}{4} k$ and $\varepsilon' = \frac{\varepsilon}{2}$, there exists $G = \left(\left[\frac{\varepsilon}{4} k\right], \left[\frac{\varepsilon}{4} k\right], E\right)$ such that $|E| = t = c' \left(\frac{\varepsilon}{4} k\right)^2$, where $c' < \frac{\varepsilon}{4}$, and all the edges of this graph are consistent in at least a $2^{-3k} \left(\frac{\varepsilon}{4}\right)^t$ fraction of the variables in $X$.

Considering this graph over the vertex sets $U$ and $V$ gives the desired result.