Post on 15-Jan-2016
Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems
Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian
2
Outline
• Problem definition and formulation
• Related work
• Exact subscription subsumption checking
• Approximate subscription subsumption checking
• Experimental evaluation
• Conclusions
3
Event-based pub/sub systems
[Figure: publish/subscribe systems; labels: Publish/Subscribe Service, Event]
4
Types of pub/sub systems
• Topic-based vs. content-based
• Centralized vs. distributed
5
Information dissemination in pub/sub systems
• Publication/subscription routing in distributed pub/sub
[Figure: routing example with Publisher, Subscriber 1, and Subscriber 2]
6
Reducing dissemination traffic
Goal: preventing dissemination of redundant subscriptions
[Figure: routing example with Publisher, Subscriber 1, Subscriber 2, and Subscriber 3]
Preventing redundant subscription dissemination:
• Reduces subscription forwarding traffic
• Reduces subscription table size in the broker
• Speeds up publication matching
7
Detection of redundant subscriptions: Covering and Subsumption
• Subscription covering is a pair-wise relationship between subscriptions
  • Subscription s2 covers subscription s1 iff all publications matching s1 also match s2
• Subscription subsumption is a generalization of covering
  • Subscription s is subsumed by subscription set T = {s1, s2, ..., sn} iff every publication matching s also matches at least one of the subscriptions in T
[Figure: left, subscriptions s1 and s2; right, s3 is subsumed by s1 ∪ s2 but not covered by either of them]
8
Problem formulation
• Content space: a d-dimensional space where each dimension represents a numeric attribute
• Subscriptions are d-dimensional rectangles
• Publications are d-dimensional points
• Given a set of d-dimensional rectangles T = {s1, s2, ..., sn}, is a new rectangle s contained in the disjunction (union) of the rectangles in T?
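As a minimal sketch of this formulation, the snippet below models a subscription as a list of (low, high) intervals and a publication as a point. The names `matches` and `covers`, the closed-interval convention, and the sample coordinates are assumptions made for illustration; they are not taken from the slides.

```python
from typing import Sequence, Tuple

# Assumed representation: one (low, high) interval per dimension.
Rect = Sequence[Tuple[float, float]]
Point = Sequence[float]

def matches(sub: Rect, pub: Point) -> bool:
    """True if the publication point lies inside the subscription rectangle."""
    return all(lo <= x <= hi for (lo, hi), x in zip(sub, pub))

def covers(outer: Rect, inner: Rect) -> bool:
    """Pair-wise covering: outer contains inner in every dimension."""
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer, inner))

# Illustrative 2-dimensional subscriptions (hypothetical values).
s1 = [(0, 6), (0, 10)]
s2 = [(4, 10), (0, 10)]
s3 = [(2, 8), (2, 8)]

# s3 lies inside the union of s1 and s2, yet neither covers it alone.
print(covers(s1, s3), covers(s2, s3))  # False False
print(matches(s3, (5, 5)))             # True
```

Pair-wise covering alone would forward s3 here, which is the gap that subsumption checking is meant to close.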
10
Related work
Pair-wise covering
• For a new subscription s, check whether any previous subscription covers it
• If not, forward the subscription to all other brokers in the network
Probabilistic subsumption checking
• For a new subscription s, randomly select d points in s
• If all of these points are covered by previous subscriptions, assume s is subsumed
• Complexity: O(k.m.d), where k = number of subscriptions, m = number of dimensions, and d = number of test points
• False negatives may be generated, i.e., subscriptions that are not subsumed may be falsely assumed to be subsumed
• This may result in incorrect content routing
[Figure: example with subscriptions s1, s2, and s3]
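The probabilistic check described on this slide can be sketched roughly as follows. This is an illustration under assumptions, not the cited algorithm's actual implementation: the function names, the uniform sampling, and the explicit random generator argument are all hypothetical.

```python
import random

def contains(rect, point):
    """True if the point lies inside the rectangle (closed intervals assumed)."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rect, point))

def probably_subsumed(s, existing, num_points, rng):
    """Sample num_points points in s and test each against every existing
    subscription. Work is roughly num_points * len(existing) * dimensions
    comparisons, matching the O(k.m.d) bound on the slide."""
    for _ in range(num_points):
        p = [rng.uniform(lo, hi) for lo, hi in s]
        if not any(contains(t, p) for t in existing):
            return False  # an uncovered witness point: s is definitely not subsumed
    return True  # every sample was covered; this answer can still be wrong

existing = [[(0, 10), (0, 10)]]
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
print(probably_subsumed([(2, 8), (2, 8)], existing, 20, rng))    # True
print(probably_subsumed([(20, 30), (0, 10)], existing, 20, rng)) # False
```

A `False` answer is always safe, while a `True` answer only means that no sampled point fell in an uncovered region, which is the source of the erroneous "subsumed" decisions the slide warns about.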
12
Exact Subscription Subsumption Checking – Key Observation
• Checking whether a new subscription is covered by the union of previous subscriptions is equivalent to checking whether it intersects the uncovered region: it is covered exactly when it does not.
• We partition the content space into positive and negative spaces
  • Positive space: the parts of the space that are covered by at least one existing subscription
  • Negative space: the parts of the space that are not covered by any of the existing subscriptions
  • Both can be represented by a set of non-overlapping rectangles
• Subscription s is subsumed iff it does not intersect the negative space
13
Representation of Negative Space & Subsumption Evaluation
We represent the negative space as a set of non-overlapping d-dimensional rectangles
If a new subscription intersects with any of these rectangles, it is not subsumed
[Figure: negative space represented by non-overlapping rectangles r1 to r8]
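The intersection test above might look like the following sketch. It uses a plain linear scan over the negative rectangles rather than a spatial index, and it treats rectangles that only touch along a boundary as non-intersecting; both choices, and the names `intersects` and `is_subsumed`, are assumptions for illustration.

```python
def intersects(a, b):
    """True if rectangles a and b share a region of positive volume."""
    return all(max(alo, blo) < min(ahi, bhi)
               for (alo, ahi), (blo, bhi) in zip(a, b))

def is_subsumed(s, negative_rects):
    """s is subsumed iff it intersects no negative (uncovered) rectangle."""
    return not any(intersects(s, r) for r in negative_rects)

# Hypothetical content space [0, 10] x [0, 10] in which existing
# subscriptions cover x in [0, 6]; the rest is negative space.
negative = [[(6, 10), (0, 10)]]

print(is_subsumed([(1, 5), (1, 5)], negative))  # True
print(is_subsumed([(5, 7), (1, 5)], negative))  # False
```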
14
Data structures & Complexity
• The algorithm is exact: it always detects whether a new subscription is subsumed or not
• For efficient subsumption checking, the set of negative rectangles is indexed using an R-tree or k-d tree for fast retrieval
• For n subscriptions in d-dimensional space, the algorithm generates O(n^d) negative rectangles
• For a high-dimensional content space, the number of negative rectangles can grow fast
• To control the growth of the number of negative rectangles, we propose an approximate subsumption checking algorithm
16
Approximation algorithm
[Figure: negative rectangles r1 to r9 in the running example]
• In the example, k = 3
• On adding a new subscription, at most k new negative rectangles are added
• At most O(k.n) negative rectangles after n active subscriptions
• No false negatives: a subscription that is not subsumed is never reported as subsumed, so correctness is not compromised
• Some false positives are possible: a subsumed subscription may be reported as not subsumed and forwarded unnecessarily
17
Top-k rectangle selection criteria
Top-k selection
• We propose a benefit/cost model for selecting these rectangles.
• Benefit: the benefit of partitioning a negative rectangle with respect to a subscription is the volume of the intersecting region.
• Cost: the number of new negative rectangles created.
• We choose the top-k negative rectangles with the highest benefit-to-cost ratio for splitting and add the resulting rectangles to the representation of the negative space.
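One way to picture the benefit/cost ranking is the sketch below. Several details are assumptions rather than facts from the slides: the cost is counted as the number of pieces produced by a simple axis-by-axis cut (at most two per dimension), a rectangle that lies entirely inside the subscription (cost 0) is ranked first, and the names `overlap_volume`, `split_cost`, and `top_k` are hypothetical.

```python
import math

def overlap_volume(r, s):
    """Benefit: volume of the region shared by r and s (0.0 if disjoint)."""
    vol = 1.0
    for (rlo, rhi), (slo, shi) in zip(r, s):
        side = min(rhi, shi) - max(rlo, slo)
        if side <= 0:
            return 0.0
        vol *= side
    return vol

def split_cost(r, s):
    """Cost: pieces left over when s is cut out of an intersecting r."""
    return sum((rlo < slo) + (shi < rhi)
               for (rlo, rhi), (slo, shi) in zip(r, s))

def top_k(negative, s, k):
    """Return up to k intersecting rectangles, best benefit/cost ratio first."""
    scored = []
    for r in negative:
        benefit = overlap_volume(r, s)
        if benefit == 0.0:
            continue  # does not intersect s, so it is not a candidate
        cost = split_cost(r, s)
        scored.append((benefit / cost if cost else math.inf, r))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:k]]

negative = [[(0, 10), (0, 10)], [(10, 20), (0, 10)], [(30, 40), (0, 10)]]
s = [(8, 15), (2, 8)]
print(top_k(negative, s, 1))  # [[(10, 20), (0, 10)]]
```

In this example the first rectangle has benefit 12 and cost 3 (ratio 4), the second has benefit 30 and cost 3 (ratio 10), and the third does not intersect the subscription at all.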
18
Subscription Forwarding in Approximate Algorithm
• If the new subscription does not intersect any negative rectangle, it is subsumed
• Otherwise:
  • Find all negative rectangles that intersect the subscription and sort them by benefit/cost
  • Select the first k negative rectangles and subtract the subscribed region from them
  • Update the representation of the negative space by replacing the k original rectangles with the new ones
• (Algorithms for unsubscribing can be found in the paper)
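The steps on this slide can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: it splits at most k intersecting rectangles, uses a simple axis-by-axis subtraction, ranks a fully covered rectangle (cost 0) first, and all function names are hypothetical.

```python
import math

def overlap_volume(r, s):
    """Volume of the region shared by r and s (0.0 if they do not intersect)."""
    vol = 1.0
    for (rlo, rhi), (slo, shi) in zip(r, s):
        side = min(rhi, shi) - max(rlo, slo)
        if side <= 0:
            return 0.0
        vol *= side
    return vol

def subtract(r, s):
    """r minus s as non-overlapping rectangles (assumes r and s intersect)."""
    pieces, rest = [], list(r)
    for i, ((rlo, rhi), (slo, shi)) in enumerate(zip(r, s)):
        if rlo < slo:  # slab of r below s in dimension i
            pieces.append(rest[:i] + [(rlo, slo)] + rest[i + 1:])
        if shi < rhi:  # slab of r above s in dimension i
            pieces.append(rest[:i] + [(shi, rhi)] + rest[i + 1:])
        rest[i] = (max(rlo, slo), min(rhi, shi))  # narrow to the overlap
    return pieces

def add_subscription_approx(negative, s, k):
    """Return (subsumed, updated_negative), splitting at most k rectangles."""
    hits = [r for r in negative if overlap_volume(r, s) > 0]
    if not hits:
        return True, negative

    def ratio(r):
        cost = len(subtract(r, s))
        return overlap_volume(r, s) / cost if cost else math.inf

    chosen = sorted(hits, key=ratio, reverse=True)[:k]
    updated = [r for r in negative if r not in chosen]
    for r in chosen:
        updated.extend(subtract(r, s))
    return False, updated

negative = [[(0, 10), (0, 10)], [(10, 20), (0, 10)]]
subsumed, negative = add_subscription_approx(negative, [(8, 15), (2, 8)], k=1)
print(subsumed, len(negative))  # False 4
```

Intersecting rectangles that are not selected stay whole, so the stored negative space can only overestimate the true uncovered region. A later subscription may therefore be forwarded although it is subsumed, but it is never suppressed by mistake.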
20
Experimental evaluation
Simulation setup
• 10K subscriptions
• 2-, 3-, 4-, and 5-dimensional spaces
• Each dimension in range [0, 1000]
• Zipfian distribution
21
Experimental evaluation
Measuring the advantage of subsumption checking
• Subscription subsumption vs. covering
• More than 50% improvement in redundant subscription detection
[Figure: two panels, Exact algorithm and Approximate algorithm (k = 50)]
22
Storage overhead comparison (Exact vs Approximate)
Negative rectangle creation rate
23
Experimental evaluation
Effect of k in the approximate algorithm
A larger value of k yields a greater reduction in redundant subscriptions
24
Experimental evaluation
Other Selection Metric Value Function
Considering both Benefit and Cost results in better subsumption checking
25
Conclusions
Efficient query subsumption checking can greatly improve the performance of pub/sub systems by reducing subscription routing traffic between brokers.
Maintaining the negative space as a set of disjoint rectangles leads to efficient subsumption checking by converting it to an intersection detection problem
We proposed exact and approximate subsumption checking algorithms and compared their relative performance.
26
Thank You!
Questions?
27
Related work
• Ouksel et al. present a Monte Carlo type probabilistic algorithm for subsumption checking
  • For a new subscription s, randomly select d points in s
  • If all of these points are covered by previous subscriptions, assume s is subsumed
• Complexity is O(k.m.d), where k is the number of subscriptions, m is the number of dimensions, and d is the number of tests
• False negatives: subscriptions that are not subsumed may be assumed to be subsumed
  • May result in incorrect content routing
  • In the example, the algorithm may mistakenly detect that s3 is subsumed
• Our proposed approach prevents false negatives
[Figure: example with subscriptions s1, s2, and s3]
28
Exact Subscription Subsumption Checking
Subsumption checking algorithm
Input:
  Set of negative rectangles: R = {r1, r2, …, rm}
  Subscription s
Find Rintersect: the set of negative rectangles that intersect s
If Rintersect = ∅, s is subsumed
Otherwise, for every ri ∈ Rintersect:
  R = R − {ri}
  Ri = ri − s, represented as a set of non-overlapping rectangles
  R = R ∪ Ri
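The pseudocode above could be realized along the following lines. The slide does not say how ri − s is decomposed into non-overlapping rectangles; the axis-by-axis slicing used here is one common choice and is an assumption, as are the function names and the sample coordinates.

```python
def intersects(a, b):
    """True if rectangles a and b share a region of positive volume."""
    return all(max(alo, blo) < min(ahi, bhi)
               for (alo, ahi), (blo, bhi) in zip(a, b))

def subtract(r, s):
    """r minus s as non-overlapping rectangles (assumes r and s intersect).
    Produces at most two pieces per dimension."""
    pieces, rest = [], list(r)
    for i, ((rlo, rhi), (slo, shi)) in enumerate(zip(r, s)):
        if rlo < slo:  # slab of r below s in dimension i
            pieces.append(rest[:i] + [(rlo, slo)] + rest[i + 1:])
        if shi < rhi:  # slab of r above s in dimension i
            pieces.append(rest[:i] + [(shi, rhi)] + rest[i + 1:])
        rest[i] = (max(rlo, slo), min(rhi, shi))  # narrow to the overlap
    return pieces

def add_subscription(negative, s):
    """Return (subsumed, updated_negative) for a new subscription s."""
    hit = [r for r in negative if intersects(r, s)]
    if not hit:
        return True, negative  # Rintersect is empty: s is subsumed
    updated = [r for r in negative if not intersects(r, s)]
    for r in hit:
        updated.extend(subtract(r, s))  # R = (R - {ri}) U (ri - s)
    return False, updated

# Start with the whole (hypothetical) content space as negative space.
negative = [[(0, 10), (0, 10)]]
for sub in ([(0, 6), (0, 10)], [(4, 10), (0, 10)], [(2, 8), (2, 8)]):
    subsumed, negative = add_subscription(negative, sub)
    print(subsumed)  # False, False, True
```

The third subscription is reported as subsumed by the union of the first two even though neither covers it on its own, mirroring the s1, s2, s3 example used earlier in the talk.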
29
Approximate Subscription Subsumption Checking
On adding a new subscription, the number of new negative rectangles added ≤ k
At most O(k.n) negative rectangles after n active subscriptions
In the following example we have k=3
[Figure: negative rectangles r1 to r12 in the example]
30
Experimental evaluation
Simulation setup
• 10K subscriptions
• 2-, 3-, 4-, and 5-dimensional spaces
• Each dimension in range [0, 1000]
• Zipfian distribution
• For the approximate algorithm, the default value of k is 50
31
Problem definition and formulation
• Content space: a d-dimensional space where each dimension represents a numeric attribute
• Subscriptions are d-dimensional rectangles
• Publications are d-dimensional points
Example: Covering & Subsumption in 2-dimensional space
[Figure: left, s2 is covered by s1; right, s3 is subsumed by s1 ∪ s2 but not covered by either of them]