One algorithm to rule them all: One join query at a time

Atri Rudra, University at Buffalo

Post on 18-Jan-2016



A brief history of this talk

L2/L2 foreach sparse recovery/compressed sensing

http://www-stat.stanford.edu/~candes/stats330/index.shtml

The key technical problem

Given the three shadows, what is the largest size of the original set of points?

Highly trivial: 4^3 = 64

Still trivial: 4^2 = 16

Correct answer: 4^{1.5} = 8
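As a quick sanity check of the "correct answer", here is a minimal sketch that assumes each shadow has 4 points (as the numbers on the slide suggest) and uses a hypothetical 2 x 2 x 2 grid as the point set. The variable names are illustrative only.

```python
from itertools import product

# Hypothetical example: the 2 x 2 x 2 grid of points in three dimensions.
points = set(product(range(2), repeat=3))

# The three shadows are the projections onto pairs of coordinates.
shadow_ab = {(a, b) for a, b, c in points}
shadow_bc = {(b, c) for a, b, c in points}
shadow_ca = {(c, a) for a, b, c in points}

shadow_sizes = (len(shadow_ab), len(shadow_bc), len(shadow_ca))
print(shadow_sizes, len(points))
```

Each shadow has 4 points while the grid itself has 8 = 4^{1.5} points, so this grid attains the bound.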

The key technical problem

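[Figure: a set of points in three dimensions (axes A, B, C) with its three shadows R, S and T, where |R| = |S| = |T| = k. Largest size of the point set: k^{3/2}.]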

Loomis Whitney

Algorithmic Loomis-Whitney?

An equivalent view

[Figure: the three shadows R, S, T of the point set in A x B x C, redrawn as a triangle with vertices A, B, C and edges R, S, T.]

Output all (a,b,c) s.t. (a,b) in R, (b,c) in S and (c,a) in T

Overview of the talk

[Figure: triangle with vertices A, B, C and edges R, S, T.]

The take-away message

Join algo

http://welovetumblr.blogspot.com/2012/07/thor-is.html

Overview of the talk

[Figure: triangle with vertices A, B, C and edges R, S, T.]

(Database) Joins

Codd

Attributes/Nodes: [n]

Relations/Hyperedges: e_1, …, e_m ⊆ [n]

[Figure: example hypergraph on nodes 1 to 5.]

Tables/Projections: R_1, …, R_m

Output all a = (a_1, …, a_n) s.t. a projected down to e_i is in R_i for every i in [m]
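To make the definition concrete, here is a brute-force sketch of a join query. It only illustrates the definition above and is not the algorithm of the talk: it tries every tuple over a hypothetical finite `domain`, and the function name `naive_join` and the 0-based attribute numbering are assumptions made for the example.

```python
from itertools import product

def naive_join(n, domain, edges, tables):
    """Return all tuples a in domain^n whose projection onto each
    hyperedge edges[i] lies in tables[i] (brute force, for illustration)."""
    result = []
    for a in product(domain, repeat=n):
        if all(tuple(a[j] for j in e) in table
               for e, table in zip(edges, tables)):
            result.append(a)
    return result

# The triangle query as a special case: attributes A=0, B=1, C=2.
edges = [(0, 1), (1, 2), (2, 0)]
R = {(1, 2), (1, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 1)}
answer = naive_join(3, [1, 2, 3], edges, [R, S, T])
print(answer)  # [(1, 2, 3), (1, 3, 1)]
```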

The triangle join query

[Figure: the shadows R, S, T in A x B x C, and the same query drawn as a triangle with vertices A, B, C and edges R, S, T.]

Output all (a,b,c) s.t. (a,b) in R, (b,c) in S and (c,a) in T

Bounding the output size

Atserias Grohe Marx

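[Figure: triangle with vertices A, B, C and edges R, S, T.]

Highly trivial bound: |R| |S| |T|

Still trivial bound: |R| |S|

Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2} (weight ½ on each of R, S, T)

AGM bound: |R|^x |S|^y |T|^z (weights x on R, y on S, z on T)

x + z ≥ 1 (covers A), x + y ≥ 1 (covers B), y + z ≥ 1 (covers C)

x, y, z ≥ 0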

Loomis Whitney

?

Algorithmic Loomis-Whitney

Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}

[Figure: triangle with vertices A, B, C, weight ½ on each of R, S, T, and a fixed vertex c in C.]

Goal: Count number of triangles

There are |R| choices for edges in R

There are d_S(c) d_T(c) choices for pairs of neighbors of c

http://agilitrix.com/2011/03/red-pill-blue-pill/

[Figure: a vertex c in C with d_S(c) neighbors in B (via S) and d_T(c) neighbors in A (via T).]

Make this choice for every c in C

Run time of algo = Σ_c min( |R| , d_S(c) d_T(c) )
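The per-vertex choice can be sketched in a few lines. This is one possible reading of the slides, with hypothetical names (`triangles`, `s_nbrs`, `t_nbrs`); hash-set lookups are assumed to take constant time, so the work done for each c is proportional to min(|R|, d_S(c) d_T(c)).

```python
from collections import defaultdict

def triangles(R, S, T):
    """List all (a, b, c) with (a, b) in R, (b, c) in S and (c, a) in T."""
    R = set(R)
    s_nbrs = defaultdict(set)   # c -> all b with (b, c) in S
    t_nbrs = defaultdict(set)   # c -> all a with (c, a) in T
    for b, c in S:
        s_nbrs[c].add(b)
    for c, a in T:
        t_nbrs[c].add(a)

    out = []
    for c in s_nbrs.keys() & t_nbrs.keys():
        bs, as_ = s_nbrs[c], t_nbrs[c]
        if len(R) <= len(bs) * len(as_):
            # Cheaper to scan the edges of R and test both endpoints against c.
            out.extend((a, b, c) for a, b in R if b in bs and a in as_)
        else:
            # Cheaper to try every pair of neighbors of c.
            out.extend((a, b, c) for a in as_ for b in bs if (a, b) in R)
    return out

R = {(1, 2), (1, 3), (2, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 1)}
print(sorted(triangles(R, S, T)))  # [(1, 2, 3), (1, 3, 1)]
```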

Analyzing the algorithm

Loomis-Whitney bound: |R|^{1/2} |S|^{1/2} |T|^{1/2}

Σ_c min( |R| , d_S(c) d_T(c) )

≤ Σ_c ( |R| d_S(c) d_T(c) )^{1/2}    [min(E,F) ≤ (EF)^{1/2}]

= |R|^{1/2} Σ_c d_S(c)^{1/2} d_T(c)^{1/2}

≤ |R|^{1/2} ( Σ_c d_S(c) )^{1/2} ( Σ_c d_T(c) )^{1/2}    [Cauchy-Schwarz]

= |R|^{1/2} |S|^{1/2} |T|^{1/2}

Atserias, Grohe, Marx?

Same algorithm!

AGM bound: |R|^x |S|^y |T|^z, with x + z ≥ 1 (covers A), x + y ≥ 1 (covers B), y + z ≥ 1 (covers C)

Σ_c min( |R| , d_S(c) d_T(c) )

≤ Σ_c |R|^x ( d_S(c) d_T(c) )^{1-x}    [min(E,F) ≤ E^x F^{1-x} for 0 ≤ x ≤ 1]

≤ |R|^x Σ_c d_S(c)^y d_T(c)^z    [y ≥ 1 - x and z ≥ 1 - x]

≤ |R|^x ( Σ_c d_S(c) )^y ( Σ_c d_T(c) )^z    [Hölder, using y + z ≥ 1]

= |R|^x |S|^y |T|^z
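The chain of inequalities can be checked numerically on a small instance. The sketch below uses a hypothetical helper `cost_and_agm` that computes the run-time expression and the AGM bound side by side; the instance, where R, S and T are all 16 pairs over a 4-element domain, is an illustrative assumption.

```python
from collections import Counter

def cost_and_agm(R, S, T, x, y, z):
    """Return (sum_c min(|R|, d_S(c) d_T(c)), |R|^x |S|^y |T|^z)."""
    # The weights must be a fractional cover of the triangle.
    assert min(x, y, z) >= 0 and x + y >= 1 and y + z >= 1 and x + z >= 1
    d_s = Counter(c for _, c in S)   # d_S(c): number of (b, c) in S
    d_t = Counter(c for c, _ in T)   # d_T(c): number of (c, a) in T
    cost = sum(min(len(R), d_s[c] * d_t[c]) for c in d_s)
    return cost, len(R) ** x * len(S) ** y * len(T) ** z

pairs = {(u, v) for u in range(4) for v in range(4)}
cost, bound = cost_and_agm(pairs, pairs, pairs, 0.5, 0.5, 0.5)
print(cost, bound)
```

On this instance every c has d_S(c) = d_T(c) = 4, so the cost is 4 x 16 = 64, which matches the bound 16^{3/2} = 64.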

General Join Result

Attributes/Nodes: [n]

Relations/Hyperedges: e_1, …, e_m ⊆ [n]

[Figure: example hypergraph on nodes 1 to 5, with weights x_1, x_2, x_3, x_4 on its hyperedges.]

Tables/Projections: R_1, …, R_m

Let x_1, …, x_m be a fractional cover

AGM bound: |R_1|^{x_1} ⋯ |R_m|^{x_m}

Our result: O(AGM + Input size)

Provably worst-case optimal join algorithm
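For a given fractional cover the AGM bound is a one-line product. The sketch below uses a hypothetical helper `agm_bound` with 0-based attributes; it checks that every attribute is covered with total weight at least 1 before evaluating the product. It does not search for the best cover, which would need a linear program.

```python
import math

def agm_bound(n, edges, sizes, weights, tol=1e-9):
    """Return prod_i sizes[i] ** weights[i] after checking that the
    weights form a fractional cover of the attributes 0..n-1."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    for v in range(n):
        covered = sum(w for e, w in zip(edges, weights) if v in e)
        if covered < 1 - tol:
            raise ValueError(f"attribute {v} has total weight {covered} < 1")
    return math.prod(s ** w for s, w in zip(sizes, weights))

triangle = [(0, 1), (1, 2), (2, 0)]
print(agm_bound(3, triangle, [100, 100, 100], [0.5, 0.5, 0.5]))
```

For three tables of size 100 and weight ½ on each edge this gives 100^{3/2} = 1000, whereas the cover (1, 1, 0) gives the weaker bound 100^2.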

List recovery

[Figure: input lists S_1, S_2, S_3, …, S_n, one per position, and a codeword (c_1, c_2, c_3, …, c_n).]

S_i subset of [q]

Code C subset of [q]^n

Applications in expanders

An alternate view of joins

[Figure: triangle with vertices A, B, C and edges R, S, T, next to the three input lists R, S, T.]

Msg in [q]^3

Codeword in [q^2]^3

Constant dimension

Constant block length

Large alphabet size

Large input list size
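The correspondence can be spelled out as follows. This sketch assumes the error-free form of list recovery (output every codeword whose i-th symbol lies in the i-th list) and uses hypothetical names `encode` and `list_recover`; the recovery itself is brute force over all q^3 messages, purely to show that the output equals the triangle join.

```python
from itertools import product

def encode(msg):
    """Map a message (a, b, c) in [q]^3 to a codeword in [q^2]^3."""
    a, b, c = msg
    return ((a, b), (b, c), (c, a))

def list_recover(q, lists):
    """Brute force: all messages whose codeword agrees with every list."""
    return [m for m in product(range(q), repeat=3)
            if all(sym in lst for sym, lst in zip(encode(m), lists))]

# The input lists are exactly the tables R, S, T of the triangle query.
R = {(1, 2), (1, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 1)}
recovered = list_recover(4, [R, S, T])
print(recovered)  # [(1, 2, 3), (1, 3, 1)]
```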

Overview of the talk

[Figure: triangle with vertices A, B, C and edges R, S, T.]

Sparse Recovery/Compressed Sensing

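[Figure: a matrix (to be designed) applied to an unknown vector gives the observed measurements; Decode maps the observed measurements to the output. Example with k = 2: the entries of the unknown vector are marked as heavy hitters or tail.]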

Quantifying the approximation

L2 norm of (Unknown - Output) ≤ C · L2 norm of Tail

(Most of) rest of the talk

Designing the matrix

[Figure: the sparse recovery setup again (unknown vector, matrix to be designed, observed measurements, Decode, output), with k = 2.]

Designing the matrix (k = 2)

[Figure: bipartite k-expander with N nodes on the left and m nodes on the right.]

< ¼ (neighborhood)

Measurement = heavy hitter + noise

Heavy tail noise < ¼ (neighborhood)

> ½ of the neighbors of a heavy hitter have the “correct” value

Count-Sketch style algo (k = 2)

[Figure: expander with N nodes on the left and m nodes on the right.]

Estimate = median of O(log N) values

Output the top O(k) estimates

O(N log N) decoding

Indyk Ružić
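A minimal sketch of median-based decoding is given below. It swaps in the classical Count-Sketch construction (independent random bucket and sign tables for each repetition) in place of the expander from the slides, and all names and parameters (`make_sketch`, `measure`, `decode`, 15 rows, 64 buckets) are assumptions chosen for the example. Decoding estimates every one of the N coordinates, which is the slow step the talk goes on to improve.

```python
import random
from statistics import median

def make_sketch(n, rows, buckets, rng):
    """Random bucket and sign tables: one hash row per repetition."""
    bucket = [[rng.randrange(buckets) for _ in range(n)] for _ in range(rows)]
    sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]
    return bucket, sign

def measure(x, bucket, sign, buckets):
    """Linear measurements: each row adds signed entries of x into buckets."""
    y = [[0.0] * buckets for _ in bucket]
    for r, (b_row, s_row) in enumerate(zip(bucket, sign)):
        for i, xi in enumerate(x):
            y[r][b_row[i]] += s_row[i] * xi
    return y

def decode(y, bucket, sign, n, k):
    """Estimate every coordinate by a median over rows; keep the top k."""
    est = [median(s_row[i] * y[r][b_row[i]]
                  for r, (b_row, s_row) in enumerate(zip(bucket, sign)))
           for i in range(n)]
    top = sorted(range(n), key=lambda i: abs(est[i]), reverse=True)[:k]
    return {i: est[i] for i in top}

rng = random.Random(0)
n, k, rows, buckets = 2000, 2, 15, 64
x = [rng.uniform(-0.005, 0.005) for _ in range(n)]   # tail
x[17], x[512] = 100.0, -80.0                          # two heavy hitters
bucket, sign = make_sketch(n, rows, buckets, rng)
recovered = decode(measure(x, bucket, sign, buckets), bucket, sign, n, k)
print(sorted(recovered))
```

Here 15 x 64 = 960 measurements are taken of a vector of length 2000, and the two heavy coordinates are recovered up to the tail noise that lands in their buckets.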

We need a faster algorithm…

Towards a sub-linear time algo

[Figure: a small candidate set S of nodes on the left side.]

Estimate = median value

Output the top O(k) estimates in S

O(|S| log N) decoding

All we need to do is to compute a small S quickly

Porat-Strauss Idea: Recursion!

[N] = {0,1}^{log N}, split into [√N] and [√N]

Solve in ~ √N time on each half

The problem we now need to solve

Elements of S

Geometrically…

[Figure: k candidates on each of the two axes; which of the grid points belong to S?]

Output size ~ k^2

Overall running time ~ √N + k^2

Not sub-linear for k > √N

Use a table look-up to decrease the run time

Finally…

Slightly different recursion

[N] = {0,1}^{log N}, split into [N^{2/3}], [N^{2/3}], [N^{2/3}]

Geometric problem to solve

Overall runtime: k^{3/2} + N^{2/3}
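One way to read the geometric problem is as the triangle query from the start of the talk. The sketch below assumes an index in [N] is written as three digits (a, b, c), each below N^{1/3}, and that the three sub-problems return candidate pairs (a, b), (b, c) and (c, a). Those assumptions, the name `combine_candidates` and the digit encoding are all illustrative, and the join used here is a plain hash join rather than the worst-case optimal algorithm.

```python
def combine_candidates(cand_ab, cand_bc, cand_ca, side):
    """Join the three candidate lists and map each (a, b, c) back to an
    index in range(side ** 3), assuming index = (a * side + b) * side + c."""
    c_by_b = {}
    for b, c in cand_bc:
        c_by_b.setdefault(b, []).append(c)
    ca = set(cand_ca)
    return sorted((a * side + b) * side + c
                  for a, b in cand_ab
                  for c in c_by_b.get(b, [])
                  if (c, a) in ca)

# Hypothetical heavy hitters 123 and 457 in [1000], with side = 10.
cand_ab = {(1, 2), (4, 5)}
cand_bc = {(2, 3), (5, 7)}
cand_ca = {(3, 1), (7, 4)}
S_candidates = combine_candidates(cand_ab, cand_bc, cand_ca, 10)
print(S_candidates)  # [123, 457]
```

With at most k candidates in each of the three lists, the Loomis-Whitney bound limits the size of the joined candidate set to k^{3/2}.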

Our Results

L2/L2 sparse recovery with failure prob p

Optimal k log(N/k) measurements*

k^{1+ε} poly-log N decoding + space

p ~ (N/k)^{-k/poly-log k}

Also prove tight lower bound of k log(N/k) + log(1/p)

One algorithm to rule them all: One join query at a time

Atri Rudra, University at Buffalo

Only two problems so far…

[Figure: triangle with vertices A, B, C and edges R, S, T.]

Albert Meyer (via Dick Lipton)

"Prove it for n=3 and then let 3 go to infinity"

The 3rd problem…

Big (hyper)graph G

http://pigeonsandplanes.com/2010/12/thoughts-on-net-neutrality.html

[Figure: example hypergraph on nodes 1 to 5.]

Small (hyper)graph H

Compute all copies of H in G

Our join algorithm gives a worst-case optimal algorithm for any constant-sized H

Joins model many more problems, e.g. CSPs
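The reduction to a join can be sketched by using the edge set of G as the table for every edge of H. The code below is a plain backtracking search with a hypothetical name `copies_of`, not the worst-case optimal algorithm. It assumes G is an undirected graph given as a list of edges, and it reports edge-preserving assignments of G-nodes to H-nodes, so one triangle of G appears once per ordering of its nodes.

```python
def copies_of(h_edges, g_edges):
    """All assignments of G-nodes to the nodes of H such that every edge
    of H lands on an edge of G. G is undirected, so both orientations of
    each edge are stored in the table."""
    table = set(g_edges) | {(v, u) for u, v in g_edges}
    g_nodes = sorted({u for e in table for u in e})
    h_nodes = sorted({u for e in h_edges for u in e})

    def extend(assign):
        if len(assign) == len(h_nodes):
            yield dict(assign)
            return
        node = h_nodes[len(assign)]
        for g in g_nodes:
            assign[node] = g
            # Check only the edges of H whose endpoints are both assigned.
            if all((assign[u], assign[v]) in table
                   for u, v in h_edges if u in assign and v in assign):
                yield from extend(assign)
            del assign[node]

    return list(extend({}))

H = [("A", "B"), ("B", "C"), ("C", "A")]        # H is a triangle
G = [(1, 2), (2, 3), (3, 1), (3, 4)]            # G has one triangle
found = copies_of(H, G)
print(len(found))  # 6: the triangle {1, 2, 3} in each of its 3! orderings
```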

The take-away message

Join algo

http://welovetumblr.blogspot.com/2012/07/thor-is.html
