Randomized Algorithms for Tracking Distributed Count, Frequencies and Ranks
Zengfeng Huang, Aarhus University
MADALGO – Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation
Model
Distributed Streaming Model
There are $k$ sites, each receiving a stream of items
The coordinator tries to track a function of the items that have arrived so far
The goal is to minimize total communication and space
Count Tracking
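Track the total number of items that have arrived so far, within relative error $\varepsilon$
Trivial in both the streaming and the (one-shot) communication models
Constant space per site
Communication? $O(\frac{k}{\varepsilon} \log N)$ can be achieved [2], for $k$ sites and final count $N$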
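To make the deterministic baseline concrete, here is a minimal sketch of one simple way to reach that kind of bound; it is not necessarily the scheme of [2]. It assumes $k$ sites and relative error `eps` (names chosen here for illustration): each site reports its local count only when it has grown by a factor of $(1 + \varepsilon)$ since the last report, so each site sends $O(\frac{1}{\varepsilon} \log N)$ messages.

```python
import random


class Site:
    """One site that reports only when its count has grown by a (1 + eps) factor."""

    def __init__(self, eps):
        self.eps = eps
        self.count = 0       # true local count
        self.last_sent = 0   # value most recently reported to the coordinator

    def receive(self):
        """Process one arriving item; return the count to report, or None."""
        self.count += 1
        if self.count >= (1 + self.eps) * self.last_sent:
            self.last_sent = self.count
            return self.count
        return None


def simulate_deterministic(k, eps, n_items, seed=0):
    """Feed n_items to k sites at random; return (messages, worst relative error)."""
    rng = random.Random(seed)
    sites = [Site(eps) for _ in range(k)]
    estimate = 0      # coordinator's sum of the last reported counts
    messages = 0
    worst = 0.0
    for n in range(1, n_items + 1):
        site = rng.choice(sites)
        before = site.last_sent
        reported = site.receive()
        if reported is not None:
            estimate += reported - before
            messages += 1
        # The estimate never exceeds the true count n.
        worst = max(worst, (n - estimate) / n)
    return messages, worst
```

Calling `simulate_deterministic(8, 0.1, 20000)` returns the number of messages sent and the largest relative underestimate seen, which stays at or below `eps` up to floating-point rounding.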
Track Frequencies
The frequency tracking problem is a generalization of count tracking
We achieve the same communication bound as for the count tracking problem, which is a $\sqrt{k}$-factor improvement
We also reduce the total space by a factor of $\sqrt{k}$
We prove matching randomized lower bounds for both total communication and total space
We also get near-optimal bounds for the rank tracking problem in this model
Analysis
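Estimate each local count $n_i$ separately; $\hat{n}_i$ is the coordinator's estimate for site $i$
$\hat{n}_i$ is an unbiased estimator of $n_i$
The variance of $\hat{n}_i$ is at most $1/p^2$, where $p$ is the probability with which a site reports on each arrival
The total count is $n = \sum_{i=1}^{k} n_i$, estimated by $\hat{n} = \sum_{i=1}^{k} \hat{n}_i$
To get an accurate estimator we set $p = \Theta(\sqrt{k}/(\varepsilon n))$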
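The transcript does not spell out the estimator, so the following sketch rests on an assumption about the sampling scheme: on every arrival a site sends its current local count with probability `p`, and the coordinator estimates that count as the last value received minus 1 plus `1/p`, or 0 if nothing has been received. The function names are ours. The sketch checks empirically that such an estimator is unbiased and has variance at most `1/p**2`.

```python
import random


def estimate_local_count(n_i, p, rng):
    """Simulate one site receiving n_i items; return the coordinator's estimate."""
    last_sent = None
    for count in range(1, n_i + 1):
        if rng.random() < p:       # report the current count with probability p
            last_sent = count
    if last_sent is None:          # no report was ever sent
        return 0.0
    return last_sent - 1 + 1.0 / p


def empirical_mean_and_variance(n_i, p, trials, seed=0):
    """Repeat the experiment and return the sample mean and variance."""
    rng = random.Random(seed)
    samples = [estimate_local_count(n_i, p, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    variance = sum((s - mean) ** 2 for s in samples) / trials
    return mean, variance
```

For example, `empirical_mean_and_variance(100, 0.05, 20000)` should give a mean close to 100 and a variance below `1 / 0.05**2 = 400`.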
But $n$ is unknown
We need to first track a constant-factor approximation of $n$, and update the probability $p$ whenever $n$ doubles
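The doubling idea can be illustrated with a simplified, round-based simulation. This is a sketch under our own assumptions and not the protocol of [1]: a new round starts whenever the total count has doubled, every site then sends its exact count (costing `k` messages), and the reporting probability is reset to `min(1, c * sqrt(k) / (eps * n))` for a constant `c` we picked. For brevity the simulation detects doubling from the true total; a real coordinator would rely on a constant-factor approximation of it, at extra communication cost.

```python
import math
import random


def track_with_doubling(k, eps, n_items, c=3.0, seed=0):
    """Simulate the simplified tracker; return (messages, fraction of bad steps)."""
    rng = random.Random(seed)
    local = [0] * k       # true local counts
    base = [0] * k        # exact counts known to the coordinator at the last sync
    last = [None] * k     # latest sampled count reported in the current round
    p = 1.0               # current reporting probability
    round_start = 1       # total count at which the current round began
    messages = 0
    bad_steps = 0

    for n in range(1, n_items + 1):
        i = rng.randrange(k)
        local[i] += 1
        if rng.random() < p:            # site i reports with probability p
            last[i] = local[i]
            messages += 1

        if n >= 2 * round_start:        # the total has doubled: start a new round
            base = local[:]             # every site sends its exact count
            last = [None] * k
            messages += k
            round_start = n
            p = min(1.0, c * math.sqrt(k) / (eps * n))

        # Coordinator's estimate: synced counts plus estimated growth this round.
        estimate = 0.0
        for j in range(k):
            estimate += base[j]
            if last[j] is not None:
                estimate += (last[j] - base[j]) - 1 + 1.0 / p
        if abs(estimate - n) > eps * n:
            bad_steps += 1

    return messages, bad_steps / n_items
```

With `k=16`, `eps=0.1` and 20000 items, the number of messages is a small fraction of the stream length, and the estimate stays within `eps * n` for most time steps.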
The communication cost is $O(\frac{\sqrt{k}}{\varepsilon} \log N)$, where $N$ is the final count
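The choice of probability and the resulting cost can be reconstructed with a short Chebyshev argument. This is a sketch under assumptions the transcript does not state explicitly: $k$ sites, relative error $\varepsilon$, current total $n$, final total $N$, per-arrival reporting probability $p$, independent local estimators $\hat{n}_i$ each with variance at most $1/p^2$, and a constant $c > 1$ of our choosing.

```latex
\begin{align*}
\operatorname{Var}[\hat{n}]
  &= \sum_{i=1}^{k} \operatorname{Var}[\hat{n}_i] \le \frac{k}{p^2}, \\
\Pr\bigl[\,|\hat{n} - n| \ge \varepsilon n\,\bigr]
  &\le \frac{k}{p^2 \varepsilon^2 n^2} = \frac{1}{c^2}
  \quad \text{for } p = \frac{c\sqrt{k}}{\varepsilon n}, \\
\mathbb{E}\bigl[\text{messages while the count grows from } \bar{n} \text{ to } 2\bar{n}\bigr]
  &\le \bar{n} \cdot \frac{c\sqrt{k}}{\varepsilon \bar{n}} = \frac{c\sqrt{k}}{\varepsilon}, \\
\mathbb{E}\bigl[\text{total messages over } \log_2 N \text{ doublings}\bigr]
  &= O\!\left(\frac{\sqrt{k}}{\varepsilon} \log N\right).
\end{align*}
```

This accounting leaves out the cost of maintaining the constant-factor approximation of $n$ and of announcing each new $p$. Under the simplest scheme that is $O(k)$ per doubling, which is dominated by the sampling cost whenever $k = O(1/\varepsilon^2)$.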
References
[1] Zengfeng Huang, Ke Yi and Qin Zhang. Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks, PODS 2011.
[2] Ke Yi and Qin Zhang. Optimal Tracking of Distributed Heavy Hitters and Quantiles, PODS 2009.