Communication Cost in Big Data Processing - MMDS
mmds-data.org/presentations/2014/suciu_

TRANSCRIPT

  • Communication Cost in Big Data Processing

    Dan Suciu University of Washington

    Joint work with Paul Beame and Paris Koutris

    and the Database Group at the UW

    1

  • Queries on Big Data

    Big Data processing on distributed clusters. General programs: MapReduce, Spark. SQL: PigLatin, BigQuery, Shark, Myria

    Traditional data processing: main cost = disk I/O. Big Data processing: main cost = communication

    2

  • This Talk: How much communication is needed to solve a problem on p servers?

    Problem: in this talk = full conjunctive query (CQ ⊆ SQL). Future work = extend to linear algebra, ML, etc.

    Some techniques discussed in this talk are implemented in Myria (Bill Howe's talk)

    3

  • Models of Communication

    Combined communication/computation cost: PRAM; MRC [Karloff2010]; BSP [Valiant]; LogP model [Culler]

    Explicit communication cost: [Hong&Kung81] red/blue pebble games, for I/O; communication complexity [Ballard2012], extension to parallel algorithms

    MapReduce model [DasSarma2013]; MPC [Beame2013, Beame2014] (this talk)

    4

  • Outline

    The MPC Model

    Communication Cost for Triangles

    Communication Cost for CQs

    Discussion

    5

  • Massively Parallel Communication Model (MPC). Extends BSP [Valiant]

    [Figure: p servers (Server 1 ... Server p), each holding O(M/p) of the input, repeated across rounds 1, 2, 3, ...; in each round every server receives at most L bits]

    Input data = size M bits

    Number of servers = p

    Algorithm = several rounds

    One round = compute & communicate

    Max communication load per server = L

    Ideal: L = M/p. This talk: L = (M/p) · p^ε, where ε ∈ [0,1] is called the space exponent. Degenerate: L = M (send everything to one server)

    6–10
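    To make the load formula concrete, here is a minimal sketch (an editorial illustration, not code from the talk) that evaluates L = (M/p) · p^ε for a few space exponents; the values of M and p are made up.

    # Per-server communication load in the MPC model: L = (M/p) * p**eps.
    # eps = 0 is the ideal load; eps = 1 degenerates to one server holding everything.
    def mpc_load(M: int, p: int, eps: float) -> float:
        """Max communication load per server, input size M (bits), p servers."""
        return (M / p) * p**eps

    M = 10**12   # hypothetical input size in bits
    p = 64       # hypothetical number of servers
    for eps in (0.0, 1/3, 1.0):
        print(f"eps={eps:.3f}  L={mpc_load(M, p, eps):.3e} bits")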

  • Example: Q(x,y,z) = R(x,y) ⋈ S(y,z)

    Input: R, S uniformly partitioned on p servers, M/p per server; size(R) + size(S) = M

    Round 1: each server sends record R(x,y) to server h(y) mod p and sends record S(y,z) to server h(y) mod p

    Output: each server computes and outputs the local join R(x,y) ⋈ S(y,z)

    Assuming no skew (∀y: degree_R(y), degree_S(y) ≤ M/p): Rounds = 1, load L = O(M/p) = O((M/p) · p^0), i.e. space exponent ε = 0

    SQL is embarrassingly parallel!

    11–15
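    A minimal single-process simulation of this one-round algorithm (an illustration, not the talk's code): buckets stand in for the p servers, and routing both relations by h(y) mod p guarantees that matching tuples meet on the same server.

    # One-round MPC join Q(x,y,z) = R(x,y) ⋈ S(y,z), simulated with buckets.
    from collections import defaultdict

    def one_round_join(R, S, p):
        r_at = defaultdict(list)  # R-tuples received by each server
        s_at = defaultdict(list)  # S-tuples received by each server
        for (x, y) in R:          # route R(x,y) to server h(y) mod p
            r_at[hash(y) % p].append((x, y))
        for (y, z) in S:          # route S(y,z) to server h(y) mod p
            s_at[hash(y) % p].append((y, z))
        out = []
        for server in range(p):   # each server joins its local fragments
            for (x, y) in r_at[server]:
                for (y2, z) in s_at[server]:
                    if y == y2:
                        out.append((x, y, z))
        return out

    R = [("a", 1), ("b", 2)]
    S = [(1, "u"), (2, "v"), (3, "w")]
    print(one_round_join(R, S, p=4))  # [('a', 1, 'u'), ('b', 2, 'v')]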

  • Questions

    Fix a query Q and a number of servers p

    What is the communication load L needed to compute Q in one round? This talk is based on [Beame2013, Beame2014]

    Fix a maximum load L (space exponent ε). How many rounds are needed for Q? Preliminary results in [Beame2013]

    16

  • Outline

    The MPC Model

    Communication Cost for Triangles

    Communication Cost for CQs

    Discussion

    17

  • The Triangle Query

    Input: three tables R(X,Y), S(Y,Z), T(Z,X), with size(R) + size(S) + size(T) = M

    Output: compute Q(x,y,z) = R(x,y) ⋈ S(y,z) ⋈ T(z,x)

    R:  X      Y         S:  Y      Z         T:  Z      X
        Fred   Alice         Fred   Alice         Fred   Alice
        Jack   Jim           Jack   Jim           Jack   Jim
        Fred   Jim           Fred   Jim           Fred   Jim
        Carol  Alice         Carol  Alice         Carol  Alice

    18
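    For reference, a naive single-machine evaluation of the triangle query (an illustrative sketch, not from the talk); the distributed algorithms on the next slides compute the same output.

    # Nested-loop evaluation of Q(x,y,z) = R(x,y) ⋈ S(y,z) ⋈ T(z,x).
    def triangles(R, S, T):
        Tset = set(T)                       # index T for O(1) membership tests
        return [(x, y, z)
                for (x, y) in R
                for (y2, z) in S
                if y == y2 and (z, x) in Tset]

    R = [("Fred", "Alice"), ("Jack", "Jim"), ("Fred", "Jim"), ("Carol", "Alice")]
    S = [("Fred", "Alice"), ("Jack", "Jim"), ("Fred", "Jim"), ("Carol", "Alice")]
    T = [("Fred", "Alice"), ("Jack", "Jim"), ("Fred", "Jim"), ("Carol", "Alice")]
    print(triangles(R, S, T))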

  • Triangles in Two Rounds

    Round 1: compute temp(X,Y,Z) = R(X,Y) ⋈ S(Y,Z)

    Round 2: compute Q(X,Y,Z) = temp(X,Y,Z) ⋈ T(Z,X)

    The load L depends on the size of temp, which can be much larger than M/p (see the sketch after this slide)

    Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M
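    A two-round simulation in the same style as before (an illustration; one_round_join is the helper defined in the earlier sketch): round 1 shuffles on y, round 2 shuffles on z, and the intermediate result temp is materialized in full.

    # Two-round plan for triangles: temp = R ⋈ S, then Q = temp ⋈ T.
    from collections import defaultdict

    def two_round_triangles(R, S, T, p):
        temp = one_round_join(R, S, p)     # round 1: join on y (earlier sketch)
        t_at = defaultdict(list)           # round 2: route both sides by h(z)
        q_at = defaultdict(list)
        for (z, x) in T:
            t_at[hash(z) % p].append((z, x))
        for (x, y, z) in temp:
            q_at[hash(z) % p].append((x, y, z))
        out = []
        for server in range(p):            # locally check the remaining T(z,x) edge
            Tset = set(t_at[server])
            out += [(x, y, z) for (x, y, z) in q_at[server] if (z, x) in Tset]
        return out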

  • Triangles in One Round

    Factorize p = p^{1/3} × p^{1/3} × p^{1/3}; each server is identified by a triple (i,j,k)

    Place the servers in a cube: server (i,j,k) sits at coordinates i, j, k

    [Figure: a cube of p servers, side p^{1/3}, with server (i,j,k) marked]

    [Ganguli92, Afrati&Ullman10, Suri11, Beame13]

    20

    Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M

  • Triangles in One Round

    Round 1: send R(x,y) to all servers (h(x), h(y), *); send S(y,z) to all servers (*, h(y), h(z)); send T(z,x) to all servers (h(x), *, h(z))

    Output: compute locally R(x,y) ⋈ S(y,z) ⋈ T(z,x)

    [Figure: the example tuples of R, S, T routed through the server cube; e.g. the tuple R(Fred, Jim) is sent to all servers (i, j, *) with i = h(Fred) and j = h(Jim)]

    21

    Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M
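    A minimal simulation of this one-round cube algorithm (an illustration, not the talk's code); the cube side c = p^{1/3} is assumed to be an integer, and Python's hash stands in for h.

    # One-round triangle algorithm on a cube of p = c*c*c servers.
    # Each relation is replicated along the one cube dimension its schema misses.
    from collections import defaultdict
    from itertools import product

    def cube_triangles(R, S, T, c):
        h = lambda v: hash(v) % c
        at = defaultdict(lambda: ([], [], []))   # (R, S, T) fragments per server
        for (x, y) in R:                         # R(x,y) -> (h(x), h(y), *)
            for k in range(c):
                at[(h(x), h(y), k)][0].append((x, y))
        for (y, z) in S:                         # S(y,z) -> (*, h(y), h(z))
            for i in range(c):
                at[(i, h(y), h(z))][1].append((y, z))
        for (z, x) in T:                         # T(z,x) -> (h(x), *, h(z))
            for j in range(c):
                at[(h(x), j, h(z))][2].append((z, x))
        out = set()
        for server in product(range(c), repeat=3):   # local join at each server
            Rs, Ss, Ts = at[server]
            Tset = set(Ts)
            for (x, y) in Rs:
                for (y2, z) in Ss:
                    if y == y2 and (z, x) in Tset:
                        out.add((x, y, z))
        return out

    Each tuple is replicated to only p^{1/3} of the p servers, which is where the O(M/p^{2/3}) load on the next slide comes from.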

  • Upper Bound is L_algo = O(M/p^{2/3})

    Theorem: Assume the data has no skew: all degrees are O(M/p^{1/3}). Then the algorithm computes Q in one round, with communication load L_algo = O(M/p^{2/3})

    Compare to two rounds: no more large intermediate result; skew-free up to degrees O(M/p^{1/3}) (from O(M/p)); BUT the space exponent increased to ε = 1/3 (from ε = 0)

    Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), M = size(R) + size(S) + size(T) (in bits)

    22
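    A back-of-the-envelope check of this bound (an editorial derivation consistent with the algorithm above, not a line from the slides): each of the M input bits is replicated to p^{1/3} servers, and absent skew the total traffic spreads evenly over the p servers.

    \[
      L_{\text{algo}} \;=\; \frac{M \cdot p^{1/3}}{p}
      \;=\; \frac{M}{p^{2/3}}
      \;=\; \frac{M}{p} \cdot p^{1/3},
      \qquad \text{i.e. space exponent } \varepsilon = \tfrac{1}{3}.
    \]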

  • Lower Bound is L_lower = M / p^{2/3}

    Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), M = size(R) + size(S) + size(T) (in bits)

    Theorem* Any one-round deterministic algorithm A for Q requires L_lower bits of communication per server

    Assume that the three input relations R, S, T are stored on disjoint servers#

    We prove a stronger claim, about any algorithm communicating < L_lower bits. Let L be the algorithm's max communication load

    # without this assumption one needs to add a multiplicative factor 3
    * Beame, Koutris, Suciu, Communication Steps for Parallel Query Processing, PODS 2013

    23–25
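    The shape of the counting argument from [Beame2013], compressed and with constants elided (an editorial paraphrase, not the slides' wording): on suitably random inputs, a server that receives only L of the M input bits can report in expectation at most an O((L/M)^{3/2}) fraction of the triangles, and the p servers together must report them all:

    \[
      p \cdot \left(\frac{L}{M}\right)^{3/2} \;\ge\; \Omega(1)
      \quad\Longrightarrow\quad
      L \;\ge\; \Omega\!\left(\frac{M}{p^{2/3}}\right).
    \]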
