continuous distributed counting for non-monotonic streams
DESCRIPTION
Continuous Distributed Counting for Non-monotonic Streams. Milan Vojnović Microsoft Research Joint work with Zhenming Liu and Božidar Radunović. January 14, 2012. SUM Tracking Problem. : . Maintain estimate . k. 1. 2. 3. SUM:. SUM Tracking: Applications. - PowerPoint PPT PresentationTRANSCRIPT
Continuous Distributed Counting for Non-monotonic Streams
Milan VojnovićMicrosoft Research
Joint work with Zhenming Liu and Božidar Radunović
January 14, 2012
2
SUM Tracking ProblemMaintain estimate
1 2 3 k
SUM:
:
𝑋 1𝑋 2𝑋 3 𝑋 4𝑋 5
3
SUM Tracking: Applications
• Ex 1: database queries
SELECT SUM(AdBids)from Ads
• Ex 2: iterative solving
input data
4
Related Work
• Count tracking [Huang, Yi and Zhang, 2011]
– Worst-case input, monotonic sum– Expected communication cost, for :
and lower bound
• Lower bound for worst case input [Arackaparambil, Brody and Chakrabarti, 2009] – Expected communication cost:
5
Questions
• Worst case complexity
Ex. +1, -1, +1, …
• Complexity under random input?– Random permutation– Random i.i.d.– Fractional Brownian motion
6
Outline
• Upper bounds
• Lower bounds
• Applications
7
Our Tracker Algorithm
• Sampling based algorithm: upon arrival of update , send a message to the coordinator w. p.
• If any site sends a message: sync all
XiMi = 1Sk
S1S
S site
site
coordinator
S, Sk
S, S1S = S1 + … + Sk
8
Algorithm’s Modes
Sample
1/(𝜇2𝜖)
Sample
Use two monotonic countersAlways report√𝑘 /𝜖
−√𝑘/𝜖
9
Communication Cost Upper BoundSingle Site
• Input: i.i.d. Bernoulli • Sampling probability with and
Expected communication cost:
10
Proof Key Idea
𝑆𝑡=𝑠
𝑠1+𝜖
𝑠1−𝜖
message sent
11
Communication Cost Upper BoundMultiple Sites
• sites• Updates i.i.d. Bernoulli • large enough and
Expected communication cost:
12
Communication Cost Upper BoundUnknown Drift Case
• Input: i.i.d. Bernoulli
• : unknown drift parameter
Expected communication cost:
13
Communication Cost Upper BoundRandom Permutation Input
• Input: a random permutation of values • sufficiently large and
Expected communication cost:
14
Communication Cost Upper BoundFractional Brownian Motion
• Input: a fractional Brownian motion with Hurst parameter
Expected communication cost:
15
Outline
• Upper bounds
• Lower bounds
• Applications
16
−1 /𝜖1/𝜖
Lower BoundsSingle Site, Zero Drift
• Input: i.i.d. Bernoulli
Expected communication cost:
17
Lower BoundsMultiple Sites
• Input: i.i.d. Bernoulli or a random permutation
Expected communication cost:
18
Proof Key Ideas
• Under , maximum deviation
0 𝑛𝑘 2𝑘⋯ ⋯𝑗𝑘 ( 𝑗+1)𝑘
𝑗𝑘 ( 𝑗+1)𝑘𝑗𝑘+1𝑗𝑘+2
1 2 k⋯
19
Proof Key Ideas (cont’d)
• Query: or ?
• Answer: incorrect only if and the answer is under or under
• Lemma: messages is necessary to answer the query correctly with a constant positive probability
1 2 k⋯
𝑋 1𝑋 2 𝑋𝑘
k-input problem:
i. i. d.
20
Outline
• Upper bounds
• Lower bounds
• Applications
21
App 1: Tracking (cont’d)
• Input: random permutation of • ,
• Problem: track within a prescribed relative tolerance with high probability
22
𝑆𝑡𝑖=1𝑠1∑𝑗𝑆𝑡
𝑖 , 𝑗
AMS Sketch
⋯
⋯
⋯ ⋯
𝑖
11
𝑠2
𝑠1𝑗
𝑆𝑡𝑖 , 𝑗
𝑆𝑡1
𝑆𝑡𝑠2
⋯⋯
• , 4-wise independent hash function𝑆𝑡=median(𝑆𝑡
𝑖 )
23
App 1: tracking (cont’d)• AMS: within w. p.
using and
• Sum tracking:
Expected total communication:
24
App 2: Bayesian Linear Regression• Feature vector , output
•
• Prior osterior • Sum tracking:
• Under random permutation input, the expected communication cost =
25
Summary
• We considered the sum tracking problem with non-monotonic distributed streams under random permutation, random i. i. d. and fractional Brownian motion
• Derived a practical algorithm that has order optimal communication complexity