The Continuous Distributed Monitoring Model
Farzad Nozarian
Chalmers University of Technology
18/04/2016
Outline
Introduction
Countdown Problem
Monitoring Entropy
Geometric Approach
Sampling
What Is the Problem?
Several processing nodes receive streams of data items
The goal is to monitor a function over the union of all items with minimum communication cost
Examples of monitoring functions: simple countdown, tracking the entropy, distinct elements, sampling, top-k items
Motivation and Applications
Monitoring the global health of the network in a large ISP
Tracking the usage of resources in distributed data centers by social networks
Tracking global changes by collecting information from sensors
What Are the Challenges?
Continuous monitoring: real-time tracking, rather than a one-shot query
Streaming: data is received at a very high speed
Distributed processing: each node only sees part of the global stream; communication cost is important
Trivial Solutions
Centralizing all the items: high communication cost!
Periodic polling:
Infrequent polling: delay in identifying events
Frequent polling: high communication
Summarizing information in complex functions
Parameter tuning for the polling frequency
The Countdown Problem
The Countdown Problem
A threshold monitoring problem with many applications
Identifying when the total number of observations reaches a given threshold
Trivial solution: observers notify the coordinator by sending a bit whenever an event is observed, so communication grows linearly with the threshold
But we can improve it!
A First Approach
Idea: each site sees many events before the threshold is reached
By the pigeonhole principle, at least one site must see an equal share of the remaining items before the threshold is reached, so every site waits to see at least that many items before reporting to the coordinator
After receiving a report from an observer, the coordinator updates the remaining slack and informs all nodes
The total communication is
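The slack-based idea (detailed on the long-version support slide) can be simulated in a few lines. This is a minimal sketch, not the protocol from the original work: the names `k` (number of sites) and `tau` (threshold), the per-site bound of ceil(slack / k) items, and the message accounting (one report, a poll/answer pair per other site, one broadcast per site) are our own assumptions.

```python
import math
import random


def slack_countdown(events, k, tau):
    """Simulate a slack-based countdown (sketch; bound rule and message costs are assumptions).

    events: iterable of site ids in range(k), one per observed item.
    Returns (index of the item at which the count reaches tau, messages),
    or (None, messages) if the stream ends first.
    """
    known = 0                # count collected by the coordinator at the last poll
    local = [0] * k          # items seen by each site since the last poll
    messages = 0
    bound = max(1, math.ceil(tau / k))   # assumed per-site bound: ceil(slack / k)

    for t, site in enumerate(events):
        local[site] += 1
        if local[site] >= bound:
            # Report from this site, then a poll and an answer for each other site.
            messages += 1 + 2 * (k - 1)
            known += sum(local)
            local = [0] * k
            if known >= tau:
                return t, messages
            slack = tau - known
            bound = max(1, math.ceil(slack / k))
            messages += k    # broadcast the new bound to every site
    return None, messages


rng = random.Random(1)
events = [rng.randrange(4) for _ in range(10_000)]
idx, msgs = slack_countdown(events, k=4, tau=5_000)
print(idx, msgs)   # idx is the 0-based position of the 5000th item
```

With this bound, the unreported items can never add up to more than the current slack, so the coordinator's poll happens exactly at the item that reaches the threshold, while the number of messages stays far below one per item.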
A Quadratic Improvement
Idea: wait for more updates before reporting to the coordinator
The protocol runs in rounds
In each round, all nodes wait to receive a round-specific number of items before reporting to the coordinator
The coordinator starts the next round after receiving a given number of messages
The total communication is
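A sketch of the round-based variant, under rules of our own choosing: in each round a site sends a one-bit signal every `b = slack // (2k)` items (at least 1), and the coordinator closes the round after `min(k, slack)` signals, polls all sites for exact counts, and announces the new `b`. Each completed round then consumes roughly half of the remaining slack, so the number of rounds grows logarithmically with `tau` while each round costs O(k) messages. Names and message costs are assumptions.

```python
import random


def round_countdown(events, k, tau):
    """Round-based countdown (sketch; the per-round rule is an assumption).

    Returns (index at which the count reaches tau, messages, rounds),
    or (None, messages, rounds) if the stream ends first.
    """
    events = list(events)
    known, messages, rounds, pos = 0, 0, 0, 0
    while pos < len(events):
        slack = tau - known
        b = max(1, slack // (2 * k))   # a site signals once every b items
        need = min(k, slack)           # signals that close the round
        rounds += 1
        messages += k                  # coordinator announces b to every site
        local = [0] * k
        signals = 0
        while pos < len(events) and signals < need:
            site = events[pos]
            pos += 1
            local[site] += 1
            if local[site] % b == 0:
                signals += 1
                messages += 1          # one-bit signal to the coordinator
        if signals < need:
            break                      # stream ended in the middle of a round
        messages += 2 * k              # poll every site and collect exact counts
        known += sum(local)
        if known >= tau:
            return pos - 1, messages, rounds
    return None, messages, rounds


rng = random.Random(1)
events = [rng.randrange(4) for _ in range(10_000)]
idx, msgs, rounds = round_countdown(events, k=4, tau=5_000)
print(idx, msgs, rounds)
```

The round cannot overshoot: fewer than `need` signals means fewer than about `2k * b <= slack` unreported items, so the exact poll at the end of a round never passes the threshold.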
Monitoring Entropy
Monitoring Entropy
Monitoring non-monotone functions
Let $m_i$ denote the number of occurrences of item $i$
Let $m = \sum_i m_i$ denote the total number of items
The union of the input streams implicitly defines a probability distribution given by $p_i = m_i / m$
The goal is to monitor the entropy of this distribution, $H = -\sum_i p_i \log p_i$
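To see why entropy is a non-monotone function of the stream, the short sketch below computes it from item counts (base-2 logarithms, our choice) and shows it both dropping and rising as new items arrive. The two-site streams are made-up toy data.

```python
import math
from collections import Counter


def empirical_entropy(counts):
    """Entropy in bits of p_i = m_i / m, where counts maps item -> m_i."""
    m = sum(counts.values())
    return -sum((c / m) * math.log2(c / m) for c in counts.values() if c > 0)


# Toy streams seen at two sites; the monitored distribution is their union.
counts = Counter(["x", "y", "x"]) + Counter(["y"])   # x: 2, y: 2
h_start = empirical_entropy(counts)                  # 1.0 (uniform over two items)

counts.update(["x"] * 4)                             # repeats of an existing item
h_skewed = empirical_entropy(counts)                 # x: 6, y: 2, entropy decreases

counts.update(["z", "w"])                            # two previously unseen items
h_spread = empirical_entropy(counts)                 # entropy increases again

print(h_start, h_skewed, h_spread)
```

Unlike a count, which only grows, the entropy can move in either direction, which is why a plain countdown is not enough on its own.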
Entropy Protocol
The protocol proceeds in multiple rounds
In the first round, the coordinator collects a constant number of items from the sites
In each subsequent round, the coordinator does the following:
Computes a threshold parameter
Runs the approximate countdown protocol with this parameter as the threshold
Collects the frequency distribution from all sites and computes the current entropy
The Geometric Approach
The Geometric Approach (1/2)
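Goal: monitoring arbitrary non-linear threshold functions
A geometric fact: the convex hull of a point and a set of vectors is covered by the union of the spheres whose diameters join the point to each of the vectors
Idea: break down the test of whether the monitored function is above or below its threshold into local conditions that each site can check on its own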
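The geometric fact can be checked numerically. The sketch below (standard library only; all names are ours) draws random convex combinations of a point `e` and five vectors `u_i` in the plane and verifies that each one lands in at least one ball whose diameter is the segment from `e` to some `u_i`.

```python
import math
import random


def in_some_ball(x, e, drifts, tol=1e-9):
    """True if x lies in a ball whose diameter is the segment from e to some u_i."""
    for u in drifts:
        center = [(a + b) / 2 for a, b in zip(e, u)]
        radius = math.dist(e, u) / 2
        if math.dist(x, center) <= radius + tol:
            return True
    return False


def random_convex_combination(points, rng):
    """A random point of the convex hull of `points` (not uniform, but always inside)."""
    weights = [rng.random() for _ in points]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]


rng = random.Random(0)
e = [0.0, 0.0]                                     # reference point
drifts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(5)]
hull_points = [e] + drifts
covered = all(
    in_some_ball(random_convex_combination(hull_points, rng), e, drifts)
    for _ in range(10_000)
)
print(covered)
```

Why it holds: z lies in the ball with diameter from e to u_i exactly when (z - e)·(z - u_i) <= 0. Writing z = λ_0 e + Σ λ_i u_i, the λ-weighted sum of these dot products over i equals -λ_0 ||z - e||² <= 0, so at least one of them is non-positive (or z = e, which lies in every ball).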
The Geometric Approach (2/2)
Each site checks whether its sphere is monochromatic, i.e., whether the monitored function stays on the same side of the threshold everywhere inside the sphere
When all the constraints are upheld:
Query result remains unchanged
No communication is required
When a constraint is violated:
New data is gathered from the streams
New constraints are set on the streams
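The local test depends on the monitored function. As an illustrative assumption, take f(v) = ||v|| (Euclidean norm) with threshold `t`: over a ball with center c and radius r this function takes exactly the values in [max(0, ||c|| - r), ||c|| + r], so monochromaticity can be checked in closed form. For a general function this check is harder; the sketch only shows the shape of the local test.

```python
import math


def ball_is_monochromatic(e, u, t):
    """Local check for the example f(v) = ||v||: is the ball with diameter [e, u]
    entirely on one side of the threshold t?"""
    center = [(a + b) / 2 for a, b in zip(e, u)]
    radius = math.dist(e, u) / 2
    c_norm = math.hypot(*center)
    lowest, highest = max(0.0, c_norm - radius), c_norm + radius   # range of ||v|| on the ball
    return highest <= t or lowest > t


def communication_needed(e, drifts, t):
    """Sites stay silent only while every local ball is monochromatic."""
    return not all(ball_is_monochromatic(e, u, t) for u in drifts)


e = [1.0, 0.0]                       # estimate vector, f(e) = 1.0 <= t
t = 2.0
quiet = communication_needed(e, [[1.2, 0.1], [0.9, -0.2]], t)
alarm = communication_needed(e, [[1.2, 0.1], [2.5, 0.0]], t)
print(quiet, alarm)
```

A non-monochromatic ball does not prove that the global threshold was crossed; it only means the site can no longer guarantee that it was not, which is when the "constraint violated" branch applies.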
Sampling
Sampling
Given inputs of total size $n$, draw a sample of size $s$, uniform over all subsets of size $s$
Sampling applications: approximate query answering, query planning, number of distinct elements, heavy hitters
Sampling cases: infinite windows, sliding windows
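One standard way to draw such a uniform sample, and the idea behind the random weights used on the infinite-window slides, is to tag each item with an independent uniform random weight and keep the s items with the smallest weights. A minimal sketch (the function name is ours), followed by an empirical check that every 2-subset of four items is about equally likely:

```python
import heapq
import random
from collections import Counter


def bottom_s_sample(items, s, rng):
    """Uniform sample of size s: tag items with random weights, keep the s smallest."""
    tagged = [(rng.random(), x) for x in items]
    return [x for _, x in heapq.nsmallest(s, tagged, key=lambda pair: pair[0])]


rng = random.Random(42)
trials = 60_000
freq = Counter(frozenset(bottom_s_sample("abcd", 2, rng)) for _ in range(trials))
print(len(freq), min(freq.values()), max(freq.values()))   # 6 possible subsets, each expected about 10_000 times
```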
Infinite Windows (1/2)
Each site associates a random weight with each observation
The coordinator maintains the following variables:
$S$: the set of sampled items, all with weight no more than $u$
$u$: the $s$-th smallest weight so far in the system, where $s$ is the sample size
Each site only maintains its local copy of $u$, as last received from the coordinator
Infinite Windows (2/2)
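Protocol outline:
Each site sends an element whose weight is smaller than its local copy of the threshold weight $u$ to the coordinator
If the weight of the received item is smaller than $u$, the coordinator updates the sample $S$ and the threshold weight $u$
The coordinator replies to the site with the current value of $u$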
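A single-process simulation of this protocol. Assumptions of ours: the site streams are interleaved round-robin, weights are uniform in [0, 1), each send/reply pair counts as two messages, and the class and function names are hypothetical. Because a site's copy of `u` is never smaller than the coordinator's current value, no item that belongs in the final sample is withheld, so the coordinator ends with exactly the s smallest weights.

```python
import heapq
import math
import random


class Coordinator:
    """Keeps the sample S (the s smallest weights seen) and the threshold weight u."""

    def __init__(self, s):
        self.s = s
        self.sample = []                  # max-heap of (-weight, item), i.e. S

    @property
    def u(self):
        # s-th smallest weight so far; infinity until s items have been kept
        return -self.sample[0][0] if len(self.sample) == self.s else math.inf

    def receive(self, weight, item):
        """Handle one element sent by a site; return the current u as the reply."""
        if weight < self.u:
            heapq.heappush(self.sample, (-weight, item))
            if len(self.sample) > self.s:
                heapq.heappop(self.sample)   # drop the largest weight
        return self.u


def simulate(streams, s, seed=0):
    """Interleave the site streams round-robin (an assumption) and count messages."""
    rng = random.Random(seed)
    coord = Coordinator(s)
    local_u = [math.inf] * len(streams)    # each site's copy of u
    messages, all_tagged = 0, []
    for step in range(max(len(st) for st in streams)):
        for i, stream in enumerate(streams):
            if step < len(stream):
                item, w = stream[step], rng.random()
                all_tagged.append((w, item))
                if w < local_u[i]:
                    messages += 2          # element to coordinator + reply with u
                    local_u[i] = coord.receive(w, item)
    return coord, messages, all_tagged


streams = [[(site, j) for j in range(2_000)] for site in range(3)]
coord, messages, all_tagged = simulate(streams, s=10)
print(messages, "messages for", len(all_tagged), "items")
```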
Thank You :)
Support Slides
A First Approach (Long Version)
Idea: each site sees many events before the threshold is reached; at least one site must see an equal share of the remaining items before the threshold is reached
Algorithm steps:
Initially, each site reports to the coordinator whenever its number of observed items exceeds its share of the threshold
The coordinator computes the current slack, i.e., the threshold minus the current count (the sum of all local counts)
Each site sets a new upper bound on its local count, derived from the slack
The total communication is
Approximate Countdown
Improve the cost by approximating the answer
Let $\varepsilon$ be the approximation parameter and $\tau$ the threshold
Report 0 if count $\le (1-\varepsilon)\tau$; report 1 if count $\ge \tau$
Similar to the previous approach, but now terminate when the bound on the unreported count reaches $\varepsilon\tau$
The number of rounds is reduced to
The total communication is
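A sketch of the approximate variant built on a slack-based countdown, with our own choices for the per-site bound (ceil(slack / k)), the message costs, and the names `k`, `tau`, `eps`: the coordinator stops and reports 1 as soon as the slack is 0 or falls below `eps * tau`, so at that moment the true count is above (1 - eps) * tau and at most tau.

```python
import math
import random


def approx_countdown(events, k, tau, eps):
    """Slack-based countdown that stops once the slack is 0 or below eps * tau.

    Returns (index, count, messages, rounds) at termination, or
    (None, count, messages, rounds) if the stream ends first.
    Bound rule and message costs are assumptions.
    """
    known, messages, rounds = 0, 0, 0
    local = [0] * k
    bound = max(1, math.ceil(tau / k))
    for t, site in enumerate(events):
        local[site] += 1
        if local[site] >= bound:
            rounds += 1
            messages += 1 + 2 * (k - 1)      # report, then poll the other sites
            known += sum(local)
            local = [0] * k
            slack = tau - known
            if slack == 0 or slack < eps * tau:
                return t, known, messages, rounds    # coordinator reports 1
            messages += k                    # broadcast the new per-site bound
            bound = max(1, math.ceil(slack / k))
    return None, known, messages, rounds


rng = random.Random(1)
events = [rng.randrange(4) for _ in range(10_000)]
_, exact_count, _, exact_rounds = approx_countdown(events, 4, 5_000, eps=0.0)
_, approx_count, _, approx_rounds = approx_countdown(events, 4, 5_000, eps=0.1)
print(exact_count, exact_rounds, approx_count, approx_rounds)
```

On the same stream both runs follow the same trajectory until the approximate one stops early, which is where the savings in rounds come from.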
Randomized Countdown Protocol (1/2)
If grows very large the cost will be high
Allow the algorithm to give a wrong answer with small probability
Randomization reduces the dependency to by parameter
Randomized Countdown Protocol (2/2)
With randomization parameter determined by analysis:
Each site collect of observations
With probability it sends a message; otherwise it remains silent
The coordinator waits until it receives messages, then terminates
The total communication cost is
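A sketch of the randomized idea, with rules that are our assumptions rather than the slide's exact parameters: every observation independently triggers a message with probability `p`, and the coordinator terminates after `round(p * tau)` messages. The count at termination is then only approximately `tau` (its relative error shrinks as `p * tau` grows), while communication drops to about `p * tau` messages.

```python
import random


def randomized_countdown(events, tau, p, rng):
    """Each observation sends a message with probability p (assumed rule);
    the coordinator stops after round(p * tau) messages.

    Returns (true count when the coordinator stops, messages), or (None, messages).
    """
    target = max(1, round(p * tau))
    messages = 0
    for t, _site in enumerate(events):
        if rng.random() < p:              # otherwise the site stays silent
            messages += 1
            if messages == target:
                return t + 1, messages
    return None, messages


rng = random.Random(7)
events = [rng.randrange(4) for _ in range(30_000)]
count, msgs = randomized_countdown(events, tau=5_000, p=0.02, rng=rng)
print(count, msgs)    # msgs is 100; count is random, around 5000
```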
Geometric Computational Model (1/2)
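Each site $i$ has a $d$-dimensional vector $\vec{v}_i$, called its local statistics vector
Let $w_1, \dots, w_n$ be weights assigned to the streams
Define the global statistics vector $\vec{v} = \sum_i w_i \vec{v}_i / \sum_i w_i$ as the weighted average of the $\vec{v}_i$
Let $f$ be an arbitrary monitoring function
Goal: determining whether $f(\vec{v}) > t$ at any given time, for a given threshold $t$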
Geometric Computational Model (2/2)
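$\vec{v}_i'$ is the last statistics vector collected from node $i$
The coordinator constructs the estimate vector $\vec{e} = \sum_i w_i \vec{v}_i' / \sum_i w_i$, the weighted average of the $\vec{v}_i'$
Each node also maintains the following parameters:
Delta vector: $\Delta\vec{v}_i = \vec{v}_i - \vec{v}_i'$ (change in the local vector since it was last collected)
Drift vector: $\vec{u}_i = \vec{e} + \Delta\vec{v}_i$
Decomposing the global condition relies on the following fact: $\vec{v} = \sum_i w_i \vec{u}_i / \sum_i w_i$, i.e., the global vector is a convex combination of the drift vectors and therefore lies in their convex hull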
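The decomposition can be checked on toy numbers. In the sketch below (made-up vectors and weights; helper name is ours), the weighted average of the drift vectors equals the global statistics vector, which is what lets each site reason about the global vector through its own drift vector alone.

```python
def weighted_average(vectors, weights):
    """Weighted average of equal-length vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) / total for d in range(dim)]


weights = [1.0, 2.0, 1.0]                          # w_i (toy values)
last_sent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # v_i': vectors last collected
current = [[1.5, 0.0], [0.0, 0.5], [2.0, 1.0]]     # v_i: current local vectors

e = weighted_average(last_sent, weights)           # estimate vector known to all sites
deltas = [[c - l for c, l in zip(cur, old)] for cur, old in zip(current, last_sent)]
drifts = [[a + d for a, d in zip(e, dv)] for dv in deltas]   # u_i = e + delta_i

v = weighted_average(current, weights)             # global vector (never computed centrally in practice)
v_via_drifts = weighted_average(drifts, weights)
print(e, v, v_via_drifts)   # [0.5, 0.75] [0.875, 0.5] [0.875, 0.5]
```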
Geometric Interpretation
Geometric interpretation:
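The convex hull of $\vec{e}$ and the drift vectors $\vec{u}_i$ can be fully covered by the spheres with radius $\|\vec{u}_i - \vec{e}\|/2$ centered at $(\vec{e} + \vec{u}_i)/2$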
[Figure: estimate vector $\vec{e}$ and drift vectors $\vec{u}_1, \dots, \vec{u}_5$]