# Communication Cost in Big Data Processing - MMDS (mmds-data.org/presentations/2014/suciu_ ...)


Communication Cost in Big Data Processing

Dan Suciu University of Washington

Joint work with Paul Beame and Paris Koutris

and the Database Group at the UW

1

Queries on Big Data

Big Data processing on distributed clusters. General programs: MapReduce, Spark. SQL: PigLatin, BigQuery, Shark, Myria

Traditional data processing: main cost = disk I/O. Big Data processing: main cost = communication

2

This Talk: How much communication is needed to solve

a problem on p servers?

Problem: in this talk = full conjunctive query CQ (⊆ SQL). Future work = extend to linear algebra, ML, etc.

Some techniques discussed in this talk are implemented in Myria (Bill Howe's talk)

3

Models of Communication

Combined communication/computation cost: PRAM, MRC [Karloff2010], BSP [Valiant], LogP model [Culler]

Explicit communication cost: red/blue pebble games for I/O [Hong&Kung81]

Communication complexity: extension to parallel algorithms [Ballard2012]

MapReduce model [DasSarma2013]; MPC [Beame2013, Beame2014] (this talk)

4

Outline

The MPC Model

Communication Cost for Triangles

Communication Cost for CQs

Discussion

5

Massively Parallel Communication Model (MPC)

Extends BSP [Valiant]

(Diagram: servers 1, ..., p, each holding O(M/p) of the input; successive rows of servers for Round 1, Round 2, Round 3, ..., with communication between rounds.)

Input data = size M bits

Number of servers = p

Algorithm = several rounds

One round = compute & communicate

Max communication load per server = L

Ideal: L = M/p. This talk: L = M/p * p^ε, where ε in [0,1] is called the space exponent. Degenerate: L = M (send everything to one server)

10
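The three load regimes above can be checked with a small calculation (a sketch; the values of M and p are illustrative, not from the talk):

```python
# Illustrative calculation of the per-server load L = (M / p) * p**eps
# for the three regimes above (eps is the "space exponent").
M = 2**30          # total input size in bits (example value)
p = 64             # number of servers (example value)

def load(eps: float) -> float:
    """Max communication load per server for space exponent eps."""
    return (M / p) * p ** eps

ideal = load(0.0)        # L = M/p: each server gets its fair share
this_talk = load(1/3)    # L = M/p^(2/3): strictly between the extremes
degenerate = load(1.0)   # L = M: one server can receive everything

print(ideal, this_talk, degenerate)
```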

Example: Q(x,y,z) = R(x,y) ⋈ S(y,z)

Input: R, S uniformly partitioned on p servers (M/p per server); size(R)+size(S) = M

Round 1: each server sends record R(x,y) to server h(y) mod p and sends record S(y,z) to server h(y) mod p

Output: each server computes and outputs the local join R(x,y) ⋈ S(y,z)

Assuming no skew (∀y: degree_R(y), degree_S(y) ≤ M/p): Rounds = 1, load L = O(M/p) = O(M/p * p^0), i.e. space exponent ε = 0

SQL is embarrassingly

parallel!
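The one-round join can be simulated in a few lines of plain Python (a sketch; the hash function, server count, and sample data are illustrative):

```python
from collections import defaultdict

def one_round_join(R, S, p):
    """One-round MPC join Q(x,y,z) = R(x,y) join S(y,z):
    route every tuple to server hash(y) mod p, then join locally."""
    servers_R = defaultdict(list)
    servers_S = defaultdict(list)
    for x, y in R:                      # each R-tuple goes to server h(y) mod p
        servers_R[hash(y) % p].append((x, y))
    for y, z in S:                      # each S-tuple goes to the same server
        servers_S[hash(y) % p].append((y, z))
    out = []
    for i in range(p):                  # every server joins its local fragments
        for x, y in servers_R[i]:
            for y2, z in servers_S[i]:
                if y == y2:
                    out.append((x, y, z))
    return out

R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (20, 200), (30, 300)]
print(sorted(one_round_join(R, S, p=4)))
# → [(1, 10, 100), (2, 10, 100), (3, 20, 200)]
```

Every tuple is sent exactly once, so the load per server is O(M/p) as long as no single y-value is too frequent.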

Questions

Fix a query Q and a number of servers p.

What is the communication load L needed to compute Q in one round? (This talk, based on [Beame2013, Beame2014].)

Fix a maximum load L (space exponent ε): how many rounds are needed for Q? (Preliminary results in [Beame2013].)

16

Outline

The MPC Model

Communication Cost for Triangles

Communication Cost for CQs

Discussion

17

The Triangle Query

Input: three tables R(X, Y), S(Y, Z), T(Z, X); size(R)+size(S)+size(T) = M

Output: compute Q(x,y,z) = R(x,y) ⋈ S(y,z) ⋈ T(z,x)

R(X, Y):       S(Y, Z):       T(Z, X):
Fred, Alice    Fred, Alice    Fred, Alice
Jack, Jim      Jack, Jim      Jack, Jim
Fred, Jim      Fred, Jim      Fred, Jim
Carol, Alice   Carol, Alice   Carol, Alice

18

Triangles in Two Rounds

Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R)+size(S)+size(T) = M

Round 1: compute temp(X,Y,Z) = R(X,Y) ⋈ S(Y,Z)

Round 2: compute Q(X,Y,Z) = temp(X,Y,Z) ⋈ T(Z,X)

The load L depends on the size of temp, which can be much larger than M/p
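The danger of the two-round plan is easy to see on a skewed instance (a sketch; the instance is illustrative): one heavy y-value makes the intermediate result quadratic in the input size.

```python
# R(X,Y) join S(Y,Z) on a skewed instance: a single heavy y-value
# makes the intermediate result temp(X,Y,Z) quadratically larger
# than the input, so round 2's load can far exceed M/p.
n = 1000
R = [(x, 0) for x in range(n)]      # every R-tuple has y = 0
S = [(0, z) for z in range(n)]      # every S-tuple has y = 0
temp = [(x, y, z) for (x, y) in R for (y2, z) in S if y == y2]
print(len(R) + len(S), len(temp))   # 2,000 input tuples; 1,000,000 in temp
```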

Triangles in One Round

Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R)+size(S)+size(T) = M

Factorize p = p^(1/3) × p^(1/3) × p^(1/3), and place the servers in a cube: each server is identified by coordinates (i, j, k)

[Ganguli92, Afrati&Ullman10, Suri11, Beame13]

20

Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R)+size(S)+size(T) = M

Triangles in One Round

(Diagram: the example tuple R(Fred, Jim) is replicated to all servers with i = h(Fred) and j = h(Jim); matching S and T tuples reach the same servers.)

Round 1:
Send R(x,y) to all servers (h(x), h(y), *)
Send S(y,z) to all servers (*, h(y), h(z))
Send T(z,x) to all servers (h(x), *, h(z))

Output: compute locally R(x,y) ⋈ S(y,z) ⋈ T(z,x)

21
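The HyperCube routing above can be simulated directly (a sketch; p = side³ servers, and the hash function and sample data are illustrative):

```python
from itertools import product

def hypercube_triangles(R, S, T, side):
    """One-round HyperCube join for Q = R(x,y), S(y,z), T(z,x)
    on p = side**3 servers arranged in a cube (a simulation sketch)."""
    h = lambda v: hash(v) % side
    cube = {c: ([], [], []) for c in product(range(side), repeat=3)}
    for x, y in R:                    # R(x,y) -> all servers (h(x), h(y), *)
        for k in range(side):
            cube[(h(x), h(y), k)][0].append((x, y))
    for y, z in S:                    # S(y,z) -> all servers (*, h(y), h(z))
        for i in range(side):
            cube[(i, h(y), h(z))][1].append((y, z))
    for z, x in T:                    # T(z,x) -> all servers (h(x), *, h(z))
        for j in range(side):
            cube[(h(x), j, h(z))][2].append((z, x))
    out = set()                       # each server joins its local fragments
    for Rl, Sl, Tl in cube.values():
        for (x, y), (y2, z), (z2, x2) in product(Rl, Sl, Tl):
            if y == y2 and z == z2 and x == x2:
                out.add((x, y, z))
    return out

R = [(1, 2), (1, 3)]
S = [(2, 4), (3, 5)]
T = [(4, 1), (9, 9)]
print(hypercube_triangles(R, S, T, side=2))   # → {(1, 2, 4)}
```

Each tuple is replicated to only side = p^(1/3) servers, which is where the O(M/p^(2/3)) load bound on the next slide comes from.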

Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R)+size(S)+size(T) = M

Upper Bound is L_algo = O(M/p^(2/3))

Theorem: Assume the data has no skew: all degrees are O(M/p^(1/3)). Then the algorithm computes Q in one round, with communication load L_algo = O(M/p^(2/3))

Compare to two rounds: no more large intermediate result; skew-free up to degrees O(M/p^(1/3)) (from O(M/p)); BUT: the space exponent increased to ε = 1/3 (from ε = 0)

22

Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X); M = size(R)+size(S)+size(T) (in bits)

Lower Bound is L_lower = M/p^(2/3)

Theorem* Any one-round deterministic algorithm A for Q requires L_lower bits of communication per server. Assume that the three input relations R, S, T are stored on disjoint servers.#

# Without this assumption, a multiplicative factor of 3 must be added.
* Beame, Koutris, Suciu, Communication Steps for Parallel Query Processing, PODS 2013

We prove a stronger claim, about any algorithm communicating < L_lower bits. Let L be the algorithm's max communication ...
