Randomized Algorithms for Tracking Distributed Count, Frequencies and Ranks




Model

Distributed Streaming Model

There are $k$ sites, each receiving a stream of items.

The coordinator tries to track a function of the items that have arrived so far.

The goal is to minimize total communication and space.

Count Tracking

Track the total number of items that have arrived so far, within relative error $\varepsilon$.

This is trivial in both the streaming model and the (one-shot) communication model.

Constant space per site.

Communication? $O(k/\varepsilon \cdot \log N)$ can be achieved [2], where $N$ is the total number of items.
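As a point of reference for the $O(k/\varepsilon \cdot \log N)$ bound, here is a minimal sketch of a simple deterministic strategy: each site reports its local count whenever it has grown by a factor of $(1+\varepsilon)$ since the last report. This is a folklore baseline and not necessarily the algorithm of [2]; the function names `track_site` and `deterministic_count_tracking` are hypothetical.

```python
def track_site(num_items, eps):
    """Simulate one site; return (last_reported_count, messages_sent)."""
    last_reported = 0
    messages = 0
    for local_count in range(1, num_items + 1):
        # Report the first item, then whenever the count grew by a (1 + eps) factor.
        if last_reported == 0 or local_count >= (1 + eps) * last_reported:
            last_reported = local_count
            messages += 1
    return last_reported, messages


def deterministic_count_tracking(site_counts, eps):
    """Return (true_total, coordinator_estimate, total_messages)."""
    reports = [track_site(count, eps) for count in site_counts]
    estimate = sum(last for last, _ in reports)   # coordinator sums the last reports
    messages = sum(sent for _, sent in reports)
    return sum(site_counts), estimate, messages


if __name__ == "__main__":
    n, estimate, messages = deterministic_count_tracking([1000, 2500, 40], eps=0.1)
    print(n, estimate, messages)
```

Each site sends roughly $\log_{1+\varepsilon} N$ messages, which over $k$ sites gives the $O(k/\varepsilon \cdot \log N)$ total. The estimate never exceeds the true count and undershoots it by at most $\varepsilon n$.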

Frequency Tracking

The frequency tracking problem is a generalization of count tracking.

We achieve the same communication bound as for the count tracking problem, $O(\sqrt{k}/\varepsilon \cdot \log N)$, which is a $\sqrt{k}$-factor improvement [1].

We also reduce the total space by a factor of $\sqrt{k}$.

We prove matching randomized lower bounds for both total communication and total space.

We also get near-optimal bounds for the rank tracking problem in this model.

Analysis

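Estimate each local count $n_i$ separately.

The total count is $n = \sum_{i=1}^{k} n_i$.

To get an accurate estimator, we set the probability $p = \Theta(\sqrt{k}/(\varepsilon n))$.

$\hat{n}_i$ is an unbiased estimator of $n_i$.

The variance of $\hat{n}_i$ is $O(1/p^2)$.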
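The unbiasedness and variance claims can be checked with a short derivation. The sketch below assumes the sampling scheme as recalled from [1]: on each arrival, site $i$ reports its current count with probability $p$, and the coordinator keeps the last reported value $\bar{n}_i$. The symbols $\bar{n}_i$, $G_i$ and the constant $c$ are introduced here for illustration, and the derivation ignores the boundary case where a site has not reported yet (that is, it assumes $n_i \gg 1/p$).

```latex
% G_i = n_i - \bar{n}_i is the number of arrivals at site i since its last report.
\begin{align*}
\hat{n}_i &= \bar{n}_i - 1 + \frac{1}{p} = n_i - G_i - 1 + \frac{1}{p},
  \qquad \Pr[G_i = j] = (1-p)^j p, \\
\mathbb{E}[G_i] &= \frac{1-p}{p}
  \;\Longrightarrow\;
  \mathbb{E}[\hat{n}_i] = n_i - \frac{1-p}{p} - 1 + \frac{1}{p} = n_i, \\
\operatorname{Var}[\hat{n}_i] &= \operatorname{Var}[G_i] = \frac{1-p}{p^2} \le \frac{1}{p^2}, \\
\operatorname{Var}[\hat{n}] &= \sum_{i=1}^{k} \operatorname{Var}[\hat{n}_i] \le \frac{k}{p^2},
  \qquad \hat{n} = \sum_{i=1}^{k} \hat{n}_i, \\
\Pr\bigl[\,|\hat{n} - n| \ge \varepsilon n\,\bigr]
  &\le \frac{k}{p^2 \varepsilon^2 n^2} = \frac{1}{c^2}
  \quad \text{for } p = \frac{c\sqrt{k}}{\varepsilon n}.
\end{align*}
```

The last line is Chebyshev's inequality; the variances add because the sites flip independent coins.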

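But the total count $n$ is unknown in advance.

We need to first track a constant approximation of $n$, and update the probability $p$ whenever $n$ doubles.

The communication cost is $O(\sqrt{k}/\varepsilon \cdot \log N)$ for $N$ items in total.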
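The analysis can be illustrated with a small simulation. This is a sketch under stated assumptions, not the implementation from [1]: it assumes each site reports its current count with probability $p$ per arrival and that the coordinator estimates a site's count as (last report) $- 1 + 1/p$. For simplicity the simulation fixes $p$ from the final count instead of updating it as $n$ doubles; the doubling schedule is only accounted for in a separate cost function. All function names are hypothetical.

```python
import math
import random


def sampling_probability(k, eps, n):
    """Assumed setting p = sqrt(k) / (eps * n), capped at 1."""
    return min(1.0, math.sqrt(k) / (eps * n))


def simulate_count_tracking(k, items_per_site, p, seed=0):
    """Return (true_total, coordinator_estimate, messages) for a fixed p."""
    rng = random.Random(seed)
    estimate = 0.0
    messages = 0
    for _ in range(k):
        last_reported = None
        for local_count in range(1, items_per_site + 1):
            if rng.random() < p:          # report the current local count
                last_reported = local_count
                messages += 1
        if last_reported is not None:     # a site with no report contributes 0
            estimate += last_reported - 1 + 1.0 / p
    return k * items_per_site, estimate, messages


def expected_messages_with_doubling(k, eps, total_items):
    """Expected sampling messages when p is reset each time n doubles.

    Excludes the cost of tracking the constant approximation of n.
    """
    total = 0.0
    round_start = 1
    while round_start <= total_items:
        p = sampling_probability(k, eps, round_start)
        items_in_round = min(round_start, total_items - round_start + 1)
        total += p * items_in_round
        round_start *= 2
    return total


if __name__ == "__main__":
    k, eps, per_site = 16, 0.1, 10_000
    p = sampling_probability(k, eps, k * per_site)
    print(simulate_count_tracking(k, per_site, p, seed=1))
    print(expected_messages_with_doubling(k, eps, 10**6))
```

Each doubling round contributes at most $\sqrt{k}/\varepsilon$ expected messages, and there are about $\log_2 N$ rounds, which matches the stated communication cost.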

References

[1] Zengfeng Huang, Ke Yi and Qin Zhang. Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks, PODS 2011.

[2] Ke Yi and Qin Zhang. Optimal Tracking of Distributed Heavy Hitters and Quantiles, PODS 2009.


MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation

Zengfeng Huang, Aarhus University