ICPS 2006, Lyon, France, June 26, 2006

A Decentralized Content-based Aggregation Service for Pervasive Environments
Nanyan Jiang, Cristina Schmidt, Manish Parashar
The Applied Software Systems Laboratory
Rutgers, The State University of New Jersey, Piscataway, NJ, USA
Outline
- Motivation, requirements, challenges
- Meteor: content-based interaction/messaging middleware
- Content-based decentralized aggregation service
- Deployment and evaluation
- Related work
- Conclusion
Motivation
- Ubiquity of devices with embedded computing and communications capabilities
- Emergence of pervasive Grid environments and applications
- Realize information-driven, context-aware, computationally intensive end-to-end distributed applications
- System, application, and information uncertainty!
- Required programming abstractions and middleware services:
  - Scalable, self-managing, adaptive
  - Based on content
  - Asynchronous and decoupled (opportunistic) interactions
  - Efficient and flexible information access, aggregation, and assimilation
Data Aggregation and Assimilation in Pervasive Grid Environments
- Large volumes of heterogeneous and continuous data streams
- Centralized, batch-processing solutions are not feasible
- Dynamic system behavior: availability and quality of data
- Dynamic application requirements
Project Meteor: Middleware Stack
- A self-organizing tiered overlay network
  - Bottom tier is a structured overlay, e.g., Chord, CAN, etc.
- Content-based routing engine (Squid)
  - Flexible content-based routing and querying with guarantees and bounded costs
  - Decentralized information discovery
- Associative Rendezvous Messaging
  - Symmetric post programming primitive
  - Content-based decoupled interaction with programmable reactive behavior
- Decentralized content-based in-network aggregation service

[Figure: Meteor middleware stack — Pervasive Application on top of Aggregation and AR Messaging, over Content-based Routing and a Self-organizing Overlay, running in the Pervasive Grid Environment]
Associative Rendezvous (AR)
- Content-based decoupled interactions:
  - All interactions are based on content rather than names or addresses
  - The participants (e.g., senders and receivers) communicate through an intermediary, the rendezvous point
  - The communication is asynchronous; the participants can be decoupled in both space and time
- Programmable reactive behaviors:
  - The reactive behaviors at the rendezvous points are encapsulated within messages => flexibility, expressivity, and multiple interaction semantics
The Semantics of Associative Rendezvous Interactions
- Messages: (header, action, data)
- Symmetric post primitive: does not differentiate between interests and data
- Associative selection: match between interest and data profiles
- Reactive behavior: execute the action field upon a match
- Header: profile, credentials, message context, TTL (time to live)
- Actions: store, retrieve, notify, delete
- Profile = list of (attribute, value) pairs. Example: <(sensor_type, temperature), (latitude, 10), (longitude, 20)>

[Figure: AR interaction example — C1 posts (<p1, p2>, store, data); C2 posts (<p1, *>, notify_data(C2)); the profiles <p1, p2> and <p1, *> match at the rendezvous point, which executes notify(C2)]
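As a concrete illustration of associative selection, here is a minimal sketch of matching an interest profile against a data profile. The function name and value representations (a `"*"` wildcard string, `(lo, hi)` tuples for ranges) are assumptions for illustration, not Meteor's actual API.

```python
# Sketch of Associative Rendezvous profile matching (illustrative only).

WILDCARD = "*"

def matches(interest, data):
    """An interest profile matches a data profile if every
    (attribute, value) constraint of the interest is satisfied;
    a value may be a wildcard '*' or a (lo, hi) range."""
    data_map = dict(data)
    for attr, val in interest:
        if attr not in data_map:
            return False
        if val == WILDCARD:
            continue                      # wildcard matches anything
        if isinstance(val, tuple):        # range constraint
            lo, hi = val
            if not (lo <= data_map[attr] <= hi):
                return False
        elif val != data_map[attr]:
            return False
    return True

# Example from the slide: data <p1, p2> matches interest <p1, *>
data = [("sensor_type", "temperature"), ("latitude", 10)]
interest = [("sensor_type", "temperature"), ("latitude", WILDCARD)]
print(matches(interest, data))            # True
```

Because post is symmetric, the same matching step fires whether the data or the interest arrives at the rendezvous point first.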
Content-based Aggregation Services
- Extends the AR programming abstraction to enable content-based aggregation services
- Example: AR post(<L, speed, 3600>, retrieve(AVERAGE, 300)), with region L: (35-60, 40-80), finds the average speed in the stretch of road specified by region L every 5 minutes for the next 1 hour
- Compute the traffic bottleneck along Route 95 from East Brunswick to NYC in the next hour
- Estimate the time needed to enter NYC
- Further queries: (1) has anyone exceeded the speed limit; (2) where is the maximum speed on the road; (3) what is the minimum speed on the road, and where
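An aggregation behavior such as AVERAGE must be decomposable so that partial results can be merged inside the network instead of shipping every reading to the requester. A minimal sketch of that idea; the `(sum, count)` representation and the function names are assumptions, not Meteor's implementation:

```python
# Decomposable AVERAGE aggregate: partials merge pairwise in any order,
# so intermediate nodes can combine their children's results before
# forwarding upstream.

def make_partial(value):
    return (value, 1)                      # (running sum, count)

def merge(a, b):
    return (a[0] + b[0], a[1] + b[1])      # combine two partials

def finalize(partial):
    return partial[0] / partial[1]         # final average at the source

# Hypothetical speed readings matched in region L:
readings = [55, 65, 60]
partial = make_partial(readings[0])
for v in readings[1:]:
    partial = merge(partial, make_partial(v))
print(finalize(partial))                   # 60.0
```

MIN, MAX, and COUNT decompose the same way, which is what makes the in-network back-propagation described later possible.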
Self-organizing Overlay (Chord)
- A self-organizing P2P ring overlay
- Nodes and data have unique identifiers (keys) from a circular key space (0 to 2^m - 1)
- Each node maintains a routing table, called a "finger table"
- A key is stored at the first node whose identifier is greater than or equal to the key
- A request is routed to the neighbor node closest to the destination
- Routes in O(log n) hops
[Figure: Chord ring over the identifier space 0-7 (m = 3). Finger i = the successor of (this node id + 2^(i-1)) mod 2^m, 1 <= i <= m. Finger tables: node 0 -> {1, 3, 5}; node 1 -> {3, 3, 5}; node 3 -> {5, 5, 0}. The example routes from node 1 to node 6.]
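The successor rule and the finger tables in the figure can be reproduced directly; a sketch assuming the ring contains nodes {0, 1, 3, 5, 6} (the node set implied by the example):

```python
# Chord successor and finger-table computation for the slide's 3-bit
# example ring (node ids reconstructed from the figure).

m = 3                              # 3-bit identifier space
SPACE = 2 ** m                     # keys 0..7
nodes = sorted([0, 1, 3, 5, 6])    # node ids on the ring

def successor(k):
    """First node whose identifier is >= k, wrapping around the ring."""
    k %= SPACE
    for n in nodes:
        if n >= k:
            return n
    return nodes[0]                # wrap to the smallest id

def fingers(n):
    """Finger i points to successor((n + 2^(i-1)) mod 2^m), 1 <= i <= m."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, m + 1)]

print(fingers(0))   # [1, 3, 5] -- matches the figure's table for node 0
print(fingers(1))   # [3, 3, 5]
print(fingers(3))   # [5, 5, 0]
```

Routing from node 1 to node 6 follows the fingers: node 1 forwards to 5 (its closest finger preceding 6), and 5's successor of 6 is 6 itself.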
SquidTON: Reliability and Fault Tolerance
- Pervasive Grid systems are dynamic, with nodes joining, leaving, and failing relatively often => data loss and a temporarily inconsistent overlay structure => the system cannot offer guarantees
- Build redundancy into the overlay network; replicate the data
- SquidTON = Squid Two-tier Overlay Network
  - Consecutive nodes form unstructured groups and at the same time are connected by a global structured overlay (e.g., Chord)
  - Data is replicated within the group
[Figure: SquidTON two-tier overlay — consecutive node identifiers form unstructured groups of nodes; each group is represented by a group identifier on the global structured ring]
Content-based Routing (Squid)
- Routes based on semantic headers
- Uses a dimension-reducing mapping to map the multidimensional information space to a 1-dimensional index space
- Can route based on keywords, partial keywords, wildcards, and ranges
- Builds on an underlying DHT lookup mechanism
- Offers guarantees: all destinations matched by the content will be identified, with bounded costs (messages, intermediate nodes, etc.)

[Figure: a semantic header (kw1, kw2, ..., kwD) defines a point in a D-dimensional space, which the SFC maps to a point in a 1-dimensional index space, which identifies the rendezvous peers (P1, P2, ..., Pk, ...)]
Squid – Definitions
- Keyword tuple (used to specify the destination): a list of d keywords, wildcards, and/or ranges
  - Examples: (temperature, celsius), (temp*, *), (*, 10, 20-25)
- Simple keyword tuple: contains only complete keywords
- Complex keyword tuple: contains wildcards and/or ranges

[Figure: the simple keyword tuple (temperature, celsius) is a point in the (sensor type, units) space; the complex keyword tuple (*, 10, 20-25) is a region in the (sensor type, latitude, longitude) space]
Content Indexing: Hilbert SFC
- f: N^d -> N, generated recursively
- Properties: digital causality, locality preserving, clustering
- Cluster: a group of cells connected by a segment of the curve

[Figure: recursive construction of the Hilbert SFC — the order-1 curve visits the four cells 00, 01, 10, 11; the order-2 curve visits the sixteen cells 0000 through 1111]
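For d = 2 the mapping can be sketched with the standard quadrant-rotation construction of the Hilbert index (a generic textbook algorithm, not Squid's code):

```python
# Map a 2-D cell (x, y) to its Hilbert curve index; `order` is the
# number of bits per axis (curve refinement level).

def xy2d(order, x, y):
    n = 1 << order                 # cells per side
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0   # quadrant bit of x at this scale
        ry = 1 if (y & s) else 0   # quadrant bit of y at this scale
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                # rotate/reflect into base orientation
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

# The order-1 curve visits (0,0), (0,1), (1,1), (1,0) -- cells 00..11:
print([xy2d(1, x, y) for (x, y) in [(0, 0), (0, 1), (1, 1), (1, 0)]])
# [0, 1, 2, 3]
```

Locality preservation means consecutive indices are adjacent cells, which is what lets a contiguous curve segment (a cluster) cover a contiguous region of the information space.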
Content Indexing and Routing
- Generate the clusters associated with the content profile
- Route to the nodes that store the clusters
- Send the matching messages (results) to the requesting node

[Figure: three content profiles in a (longitude, altitude) space — profile 1 is the point (2, 1); profile 2 is the range (4-7, 0-3); profile 3 is the wildcard query (*, 4). Their curve clusters map to overlay nodes at positions 0, 13, 33, 40, 47, and 51.]
Squid: Routing Optimization
- More than one cluster is typically stored at a node
- Not all clusters that are generated for a query exist in the network => optimize!
- SFC generation is recursive => cluster generation is recursive => the process of cluster generation can be viewed as a prefix tree (trie)
- Optimization: embed the tree into the overlay and prune nodes during the construction phase
Decentralized In-network Aggregation
- Post AR messages with aggregation behaviors in the system
- Content indexing: from n dimensions to 1 dimension using the Hilbert SFC
- Content-based routing: prefix trie construction
- Decentralized in-network aggregation: use aggregation tries for back-propagating and aggregating matching data items

[Figure: post(<p1, p2, ..., pN>, action(aggrBehavior)) defines an area in the N-dimensional semantic space; indexing using the SFC maps it to segments in the 1-dimensional index space; trie-based content routing constructs the trie; trie-based in-network aggregation back-propagates the matched data items]
Trie Construction
- Recursive resolution of the message profile to construct the prefix trie
- Example: post(<011, 010-110>, retrieve(AVG, 300)) over a (longitude, speed) space
- The query region resolves to the index keys 001010, 001011, 011011, 011100, and 011111; recursive refinement yields the shared prefixes 0*, 0010*, 011*, and 0111*

[Figure: the query region shaded on the Hilbert curve over the 2-D cell space, and the resulting prefix trie with root 0*, internal nodes 0010*, 011*, and 0111*, and the matched index keys as leaves]
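Generically, once a query has been resolved to its matching 1-D index keys, the shared prefixes fall out of a plain binary trie. A sketch using the leaf keys from this example; the dictionary-based trie is illustrative, not Meteor's data structure:

```python
# Build a binary prefix trie over the index keys matched by a query;
# internal branching nodes correspond to shared prefixes.

def build_trie(keys):
    root = {}
    for key in keys:
        node = root
        for bit in key:
            node = node.setdefault(bit, {})
        node["$"] = True               # marks a complete index key (leaf)
    return root

def branch_points(node, path=""):
    """Yield the paths at which the trie branches -- the shared prefixes."""
    kids = [b for b in node if b != "$"]
    if len(kids) > 1:
        yield path
    for b in kids:
        yield from branch_points(node[b], path + b)

keys = ["001010", "001011", "011011", "011100", "011111"]
trie = build_trie(keys)
print(sorted(branch_points(trie)))     # ['0', '00101', '011', '0111']
```

Note the plain trie branches at 00101 where the slide shows the coarser prefix 0010*; the SFC's recursive refinement stops at cluster boundaries rather than at the last common bit, so its prefixes can be shorter.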
Trie-based Content Routing
- Embed the trie into the overlay network
- Example: node 100001 is responsible for storing the subtree rooted at 011*

[Figure: the prefix trie embedded on the ring — overlay nodes 000000, 000100, 001001, 001111, 100001, and 111000; the 0010* subtree and the 011* subtree (011011, 0111*) are stored at the nodes responsible for those prefixes]
Trie-based In-Network Aggregation (I)
- The AR message is issued at node 111000
- The first recursion level results in clusters with the prefix 0
- At node 000000, a further level of recursion identifies the clusters 011 and 0010, which are padded with zeros, for example
- Nodes receiving the sub-queries generate the next recursion level, and so on

The profile entry carried by the message at node 000000:

<profile_entry>
  <query>(011, 010-110)</query>
  <TTL>20</TTL>
  <source>111000</source>
  <prefix_list>
    <prefix>0</prefix>
    <prefix>0010</prefix>
  </prefix_list>
</profile_entry>

After the next recursion level:

<profile_entry>
  ......
  <prefix_list>
    <prefix>0</prefix>
    <prefix>0010</prefix>
    <prefix>001010</prefix>
  </prefix_list>
</profile_entry>
Trie-based In-Network Aggregation (II)
- Caching to improve latencies
- Managing system dynamics: joins, leaves, failures
- Load balancing
- Building geographical awareness into the overlay

The aggregated value is back-propagated toward the source in a message such as:

<message>
  <profile>(011, 010-110)</profile>
  ......
  <aggr_val>...</aggr_val>
  <source>111000</source>
  <prefix_list>
    <prefix>0</prefix>
    <prefix>0010</prefix>
  </prefix_list>
</message>

<profile_entry>
  ......
  <source>111000</source>
</profile_entry>
Miscellaneous Issues and Optimizations
- Varied link latencies: caching to reduce routing latencies
- Load balancing
- Addressing churn: joins, leaves, and failures (of leaf peers, intermediate peers, and the source peer)
- Geographical locality (to an extent)
Meteor: Implementation Overview
- The current implementation builds on JXTA; Chord, Squid, and the AR messaging layer are implemented as event-driven JXTA services
- Deployments:
  - Local-area network: 64 1.6 GHz Pentium IV nodes with a 100 Mbps Ethernet interconnect
  - Orbit sensor network testbed: 400 radio nodes with 802.11 wireless connections
  - PlanetLab wide-area testbed
Experimental Evaluation
- Aggregation services: complex queries issued from each of the three deployment environments, e.g., post(<34-444, temp*>, retrieve(count))
- Effectiveness of in-network aggregation: aggregate range queries; number of messages processed in the network
- Robustness in the case of single peer failures
- Note that the performance of Squid and AR has been evaluated separately
Scalability of the Aggregation Services

[Figure: scalability of Meteor aggregation queries — response time (ms, 0-3000) vs. number of peers (4, 8, 16, 32, 64) for the wired LAN, the Orbit wireless testbed, and wide-area PlanetLab]

- Aggregate query: post(<(100-300, 75-125), temperature>, retrieve(avg, 300))
Effectiveness of In-Network Aggregation
- Simulation results with 500 nodes
- AR aggregate message: post(<100-230, 35-120>, retrieve(avg))
- Fig. (a): number of messages with and without in-network aggregation
- Fig. (b): actual number of messages per peer
Robustness of Aggregation Service

[Figure: number of included results (0-70) vs. time (ms), marking the time to aggregate all results, a single node failure, the aggregation trie reconstruction, and the reconstructed trie]

- Assume a single node failure: an intermediate node fails at about 205 time ticks
- The ongoing aggregation operations at this node are lost
- About 100 ms is required to repair the aggregation trie in a LAN environment
Related Work
- Aggregation in P2P environments: Astrolabe, Cone, PHT
- Aggregation in sensor networks: Cougar, TAG
- Meteor:
  - Content-based aggregation abstraction
  - Asynchronous and decoupled
  - No persistent state maintained in the network; the trie is reused across aggregations
  - Guarantees
Summary
- Content-based, decoupled interactions and aggregations are essential for pervasive Grid applications
- Meteor provides unified programming abstractions for content-based decoupled interactions and decentralized in-network aggregation
  - Recurrent and spatially constrained queries
  - Scalable and efficient trie-based design and implementation
- Deployments on a LAN, Orbit, and PlanetLab demonstrate scalability, performance, and robustness