Content-Based Publish-Subscribe Over Structured P2P Networks
Peter Triantafillou and Ioannis Aekaterinidis
Research Academic Computer Technology Institute and Department of Computer Engineering and Informatics, University of Patras, Greece
Agenda
• Introduction/Goal
• Publish-Subscribe Systems
• Publish-Subscribe over Chord
  – Processing Subscriptions
  – Processing Events
• Improving Performance
• Conclusion
Introduction
• Publish-subscribe systems are becoming very popular for building large-scale distributed systems and applications
  – Anonymity between publisher and subscriber
• Centralized:
  – Adv: Global image of the system, making the matching algorithm easy to implement
  – Dis: Scalability
• Decentralized:
  – Adv: Scalability
  – Challenge: development of an efficient distributed matching algorithm
Goal
• Chord was chosen because it is:
  – Simple
  – Popular
  – Scalable
  – Self-organizing
  – Well performing
• Challenge: to develop a strategy for using DHTs to provide good support for range predicates
  – Range predicates are popular when specifying subscription attributes
Publish-Subscribe Systems
• Asynchronous messaging paradigm
• Senders (publishers) of messages are not programmed to send their messages to specific receivers (subscribers)
• Published messages are characterized into classes (without knowledge of what subscribers there may be)
• Subscribers express interest in one or more classes and only receive messages that are of interest (without knowledge of what publishers there are)
• This decoupling of publishers and subscribers allows for greater scalability and a more dynamic network topology
Pub-Sub Message Filtering
• Subscribers receive only a subset of the total messages published
• 2 Main Types
  – Topic Based
  – Content Based
• Hybrid
  – Coupling of topic and content based systems
Topic Based Pub/Sub Systems
• Much like newsgroups
• Users join a group (topic)
• All messages related to that topic are broadcast to all users participating in the specific group
Content Based Pub/Sub Systems
• Preferable
• Give users the ability to express their interest by specifying predicates over the values of a number of well-defined attributes
• Matching of publications (events) to subscriptions (interests) is done based on the content (values of attributes)
Hybrid System
• Publishers post messages to a topic while subscribers register content-based subscriptions to one or more topics
• Publications and subscriptions are automatically classified in topics (using an application-specific schema)
• Drawbacks:
  – Design of the domain schema plays a fundamental role in the system's performance
  – Many false positives are likely to occur
Event Schema
• Set of typed attributes
• Each attribute ai consists of:
  – Type: belongs to a predefined set of primitive data types
  – Name: a string
  – Value v(ai)
• A value range defined by the minimum and maximum values (vmin(ai), vmax(ai)), along with the attribute's precision vpr(ai)
Subscription Schema
• Contains all interesting subscription-attribute data types (integers, strings, etc.) and all common operators (=, ≠, <, >, etc.)
• An event matches a subscription iff all of the subscription's attribute predicates/constraints are satisfied
• Can have two or more constraints for the same attribute
Subscription Identifier
• Concatenation of 3 parts:
  – C1: id of the node receiving the subscription
    • Size: m bits in a Chord ring with an m-bit address space
  – C2: id of the subscription itself
    • Size: number of bits equal to the rounded-up base-2 logarithm of the maximum number of outstanding subscriptions a node can have
  – C3: number of attributes on which constraints are declared
    • Size: max value = total number of attributes supported by the system
Subscription ID Example
• Assume Chord ring with a 3-bit identifier address space
• Each node can support 8 outstanding subscriptions with an attribute schema including 7 attributes
• The example is subscription 3 (C2=3), belonging to node 4 (C1=4), comprising constraints on 5 attributes (C3=5)
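The three-part identifier can be sketched as simple bit packing. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the field order (C1 in the most significant bits, C3 in the least) and the C3 width (just enough bits to hold the total attribute count) are assumptions.

```python
def field_widths(max_subs, num_attrs):
    """Bit widths of C2 and C3 (assumed layout, see lead-in)."""
    c2_bits = (max_subs - 1).bit_length()  # equals ceil(log2(max_subs))
    c3_bits = num_attrs.bit_length()       # enough bits to hold the value num_attrs
    return c2_bits, c3_bits

def pack_sub_id(c1, c2, c3, m, max_subs, num_attrs):
    """Concatenate C1 | C2 | C3 into a single integer."""
    assert 0 <= c1 < 2 ** m, "C1 must fit in the m-bit node id space"
    c2_bits, c3_bits = field_widths(max_subs, num_attrs)
    return (c1 << (c2_bits + c3_bits)) | (c2 << c3_bits) | c3

def unpack_sub_id(sub_id, max_subs, num_attrs):
    """Recover (C1, C2, C3) from a packed identifier."""
    c2_bits, c3_bits = field_widths(max_subs, num_attrs)
    c3 = sub_id & ((1 << c3_bits) - 1)
    c2 = (sub_id >> c3_bits) & ((1 << c2_bits) - 1)
    c1 = sub_id >> (c2_bits + c3_bits)
    return c1, c2, c3

# The slide's example: 3-bit ring, 8 subscriptions per node, 7 attributes.
sub_id = pack_sub_id(c1=4, c2=3, c3=5, m=3, max_subs=8, num_attrs=7)
print(format(sub_id, "09b"))  # 100 011 101 written without spaces
```

Under these assumptions each field takes 3 bits, so the identifier is 9 bits long.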
Storing Subscriptions
• Done using the hash function provided by Chord (SHA-1)
• Returns an identifier uniformly distributed in the address space
• k = h(v(ai))
• Following the Chord API, the subID is placed at node: successor(k)
Storing Example
• Attributes are processed one at a time
• The subscription ID is stored at:
  – successor(h(“NYSE”)), in the list dedicated to attribute Exchange
  – successor(h(“OTE”)), in the list dedicated to attribute Symbol
• Since the Price attribute is constrained to the range 8.30<Price<8.70 with a precision of .01, the subscription ID is stored at:
  – successor(h(Price)) for each of the values 8.31, 8.32, …, 8.69
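The storing steps above can be sketched on a toy ring. This is an illustration under stated assumptions rather than the presented system: the 16-bit id space, the node names, the in-memory `ring` dictionary and the per-(attribute, value) list layout are all hypothetical, and `successor` is a local stand-in for a Chord lookup.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict
from decimal import Decimal

M = 16  # hypothetical identifier length in bits

def h(value):
    """SHA-1 of the value's string form, reduced to the M-bit id space."""
    digest = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(nodes, k):
    """First node id >= k on the ring, wrapping around; nodes must be sorted."""
    return nodes[bisect_left(nodes, k) % len(nodes)]

def range_values(low, high, precision):
    """Every value strictly between low and high, in steps of precision."""
    low, high, step = Decimal(low), Decimal(high), Decimal(precision)
    values, v = [], low + step
    while v < high:
        values.append(v)
        v += step
    return values

def store_subscription(ring, nodes, sub_id, attr, values):
    """Append sub_id to the attribute's list at successor(h(v)) for each v."""
    for v in values:
        ring[successor(nodes, h(v))][(attr, str(v))].append(sub_id)

nodes = sorted(h(f"node-{i}") for i in range(8))  # hypothetical node ids
ring = defaultdict(lambda: defaultdict(list))
store_subscription(ring, nodes, "subID1", "Exchange", ["NYSE"])
store_subscription(ring, nodes, "subID1", "Symbol", ["OTE"])
store_subscription(ring, nodes, "subID1", "Price",
                   range_values("8.30", "8.70", "0.01"))
```

Decimal arithmetic is used so that the enumerated prices are exact; values must be normalized to the same string form (here "8.40", not 8.4) on both the subscription and the event side, or they hash to different nodes.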
Updating Subscriptions
• When updating an attribute that has an equality constraint, only 2 nodes are affected:
  – Delete the subscription ID from:
    • nodeID = successor(h(vstale_value(ai)))
  – Add the subscription ID to the appropriate list at:
    • nodeID = successor(h(vupdated_value(ai)))
Updating Subscriptions
• The procedure for updating a range value depends on the new values of the range bounds (vlow_NEW(ai) and vhigh_NEW(ai)) compared to the old values
• If vlow_NEW(ai) < vlow(ai), store the subID at the nodes that cover the [vlow_NEW(ai), vlow(ai)) range
• If vhigh_NEW(ai) > vhigh(ai), store the subID at the nodes that cover the (vhigh(ai), vhigh_NEW(ai)] range
Updating Subscriptions
• If vlow_NEW(ai) > vlow(ai), delete the subID from the nodes that cover the [vlow(ai), vlow_NEW(ai)) range
• If vhigh_NEW(ai) < vhigh(ai), delete the subID from the nodes that cover the (vhigh_NEW(ai), vhigh(ai)] range
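The four range-update rules can be sketched as a function that returns which values gain and which lose the subID. This is a hypothetical helper, not code from the presentation. It assumes bounds are inclusive and expressed as integer multiples of the attribute's precision (so 8.31 with precision .01 is 831), and that the old and new ranges overlap.

```python
def range_update_plan(old_low, old_high, new_low, new_high):
    """Return (values_to_add, values_to_delete) for a range update.

    Bounds are inclusive integers in units of the attribute's precision.
    Assumes the new range overlaps the old one.
    """
    add, delete = [], []
    if new_low < old_low:
        add += range(new_low, old_low)               # [new_low, old_low)
    if new_high > old_high:
        add += range(old_high + 1, new_high + 1)     # (old_high, new_high]
    if new_low > old_low:
        delete += range(old_low, new_low)            # [old_low, new_low)
    if new_high < old_high:
        delete += range(new_high + 1, old_high + 1)  # (new_high, old_high]
    return add, delete

# Old range 8.31..8.69 becomes 8.25..8.60 (in units of .01).
add, delete = range_update_plan(831, 869, 825, 860)
```

Each value in `add` is then stored at successor(h(value)) and each value in `delete` is removed from it; values in the overlap of the two ranges are left untouched.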
Processing Events: Matching Algorithm
Matching Events with Subscriptions Example
• Suppose we have Subscriptions 1 and 2 generated by two clients connected to a Chord node and Event 1
• First, the algorithm collects all the subID lists in which the values of the event attributes satisfy the corresponding constraints of the subscriptions
Matching Events with Subscriptions Example (continued)
• The algorithm starts with attribute Exchange = “NYSE” and retrieves the subID list (LExchange) from node successor(h(“NYSE”))
• This list contains only the subID1
– LExchange -> subID1
Matching Events with Subscriptions Example (continued)
• Next attribute Symbol = “OTE”; the subID list (LSymbol) is retrieved from node successor(h(“OTE”))
  – LSymbol -> subID1, subID2
• Both subscriptions are satisfied by the event on this attribute
Matching Events with Subscriptions Example (continued)
• Next attribute Price = 8.40; the subID list (LPrice) is retrieved from node successor(h(8.40))
  – LPrice -> subID1
• Only subscription 1 has a Price range that contains this value
Matching Events with Subscriptions Example (continued)
• Lastly attribute Low = 8.22; the subID list (LLow) is retrieved from node successor(h(8.22))
  – LLow -> subID2
• Only subscription 2 has a constraint on attribute Low
Matching Events with Subscriptions Example (continued)
• After this phase of the matching process the collected subscription ID lists are:
  – LExchange -> subID1
  – LSymbol -> subID1, subID2
  – LPrice -> subID1
  – LLow -> subID2
• Subscription 1 was found in 3 lists while subscription 2 was found in 2
• By processing the subIDs of the subscriptions (c3 part) we find that both subscriptions have constraints over 3 attributes, so subscription 2, found in only 2 lists, is not fully satisfied
Matching Events with Subscriptions Example (continued)
• Since subscription 1 was found in 3 lists and declares constraints on 3 attributes, a match is implied; its subID is kept in order to inform the node which generated the subscription about the matched event
• That node holds metadata for subID1, which is used to locate the IP address of the client that generated the subscription
• The node storing the subscription is contacted (using the nodeID equal to the c1 field of subID1) and the event is delivered to the interested client
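The counting step of the example can be sketched as follows. This is a minimal illustration with hypothetical names: the `collected` dictionary stands in for the lists fetched from the ring, and `c3_of` stands in for reading the C3 field out of each subID.

```python
from collections import Counter

def match(collected_lists, c3_of):
    """Return the subIDs found in as many lists as they have constraints."""
    counts = Counter()
    for sub_ids in collected_lists.values():
        counts.update(set(sub_ids))  # count each subID once per attribute list
    return sorted(s for s, n in counts.items() if n == c3_of[s])

# Lists collected for the event in the example above.
collected = {
    "Exchange": ["subID1"],
    "Symbol": ["subID1", "subID2"],
    "Price": ["subID1"],
    "Low": ["subID2"],
}
c3_of = {"subID1": 3, "subID2": 3}  # both subscriptions constrain 3 attributes
matched = match(collected, c3_of)
```

Only subID1 appears in as many lists as its C3 count, so it is the only match; subID2 appears in 2 lists but needs 3.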
Expected Performance
• Subscription Storage Procedure:
  – The average number of hops needed to store a subID depends on the type of constraints over the attributes
  – Equality: ½ log(N)
    • subID is stored in a single node
  – Range constraint: r nodes are affected, which leads to r * ½ log(N) hops on average to store the subID, where
$$ r = \frac{v_{high}(a_i) - v_{low}(a_i)}{v_{pr}(a_i)} $$
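As a worked instance, take the Price constraint from the storing example. The ring size N = 1024 is a hypothetical value, and the logarithm is assumed to be base 2, as is usual for Chord; the slides write only log(N).

```latex
r = \frac{v_{high}(a_i) - v_{low}(a_i)}{v_{pr}(a_i)}
  = \frac{8.70 - 8.30}{0.01} = 40
\qquad
r \cdot \tfrac{1}{2}\log_2 N = 40 \cdot \tfrac{1}{2} \cdot \log_2 1024
  = 40 \cdot 5 = 200 \text{ hops}
```

The storing example enumerates 39 values rather than 40 because both bounds are strict; the formula is an estimate of the number of affected nodes.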
Expected Performance
• Update/Deletion of Subscription
  – Again, depends on the type of constraints over the attributes
    • Equality: the update contacts log(N) nodes on average (two lookups of ½ log(N) hops each)
    • Ranges: the number of nodes contacted is k * log(N) on average
  – k depends on whether the new range is smaller or wider than the old range
Expected Performance
• Event-Processing (matching)
  – Involves contacting N_{a-event} nodes, those responsible for the event's attribute values, to collect the subscription id lists
• Reminder: in a Chord network with N nodes and a 2^m address space, the average number of nodes that must be contacted to find a successor is:
  – ½ log(N) hops
• Total cost of collecting the lists:
$$ N_{a\text{-}event} \cdot \tfrac{1}{2} \log(N) $$
• By design, this proposal leads to fast and scalable event matching.
Improving Performance
• Currently, storing a subscription over the Chord ring takes r * ½ * log(N) hops on average for every attribute
  – r depends on the precision, high, and low values
• Using an order-preserving hash function we can reduce this to r + ½ * log(N) hops
Order Preserving Chord
• Using an order-preserving hash function onto the 2^m identifier space
• Expected performance:
  – ½ log(N) hops to locate the node storing the minimum value of the range (vlow(ai))
  – Then r further hops to store the remaining values in the range, for r + ½ log(N) hops in total
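The two hop estimates can be compared numerically. This is a small sketch of the formulas only, with hypothetical function names, assuming base-2 logarithms; it does not simulate routing.

```python
import math

def hops_plain_chord(r, n_nodes):
    """r independent lookups of 1/2 * log2(N) hops each."""
    return r * 0.5 * math.log2(n_nodes)

def hops_order_preserving(r, n_nodes):
    """One lookup of 1/2 * log2(N) hops, then r successor hops."""
    return r + 0.5 * math.log2(n_nodes)

# A range covering r = 40 values on a hypothetical ring of 1024 nodes.
print(hops_plain_chord(40, 1024), hops_order_preserving(40, 1024))
```

For these illustrative numbers the estimate drops from 200 hops to 45, because the lookup cost is paid once instead of once per value.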
Order Preserving Hash Function
• Suppose every attribute is characterized by
  – vmin(ai): minimum value ai can take
  – vmax(ai): maximum value ai can take
  – vpr(ai): precision of ai
• vj(ai) is any value in [vlow(ai), vhigh(ai)]
• OPHF is:
$$ h(v_j(a_i)) = \left( s_o(a_i) + \frac{v_j(a_i) - v_{min}(a_i)}{v_{max}(a_i) - v_{min}(a_i)} \cdot 2^m \right) \bmod 2^m $$
$$ s_o(a_i) = hash(attribute\_name(a_i)) $$
Subscription and Event Processing with OPHF
• Example: Storing Subscription
  – Consider a Chord ring with 3-bit ids and 8 nodes
  – A subscription on a single integer attribute a arrives at node 3 with constraint 0<v(a)<4
  – Using plain Chord requires O(r*log(N)) hops to store the subID at three nodes
Subscription and Event Processing with OPHF
• Using the OPHF with Chord:
  – Perform O(log(N)) hops only once to reach the first node (node 6)
  – Storing the subID at nodes 7 and 0 requires 2 more hops
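The order-preserving hash can be sketched directly from its definition: an attribute-specific offset plus the value's relative position in its domain, scaled to the 2^m id space. The slides do not give the attribute's domain or its offset s_o(a), so the values below (domain [0, 8], offset 5) are assumptions chosen only so that the result lands on ids 6, 7 and 0 as in the example; truncating the scaled position to an integer is also an assumption.

```python
import hashlib

def s_o(attr_name, m):
    """Attribute offset: SHA-1 of the attribute name, reduced to m bits."""
    digest = hashlib.sha1(attr_name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def ophf(value, v_min, v_max, m, offset):
    """Map value to an m-bit id that preserves order modulo the ring size."""
    scaled = int((value - v_min) / (v_max - v_min) * 2 ** m)
    return (offset + scaled) % (2 ** m)

# 3-bit ring, constraint 0 < v(a) < 4, so the values to store are 1, 2, 3.
ids = [ophf(v, v_min=0, v_max=8, m=3, offset=5) for v in (1, 2, 3)]
print(ids)
```

Consecutive values map to consecutive ids, so after one lookup for the first id the remaining ones are reached by following successor pointers. In a real deployment the offset would come from `s_o` rather than being fixed by hand.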
Conclusion
• Not Addressed
  – Load Balancing
  – Small Domain Problem
• Able to support equality and range attributes while leveraging Chord to build a scalable, self-organizing, well-performing content-based publish-subscribe system.