imagine-p2p:a scalable p2p platform for the knowledge grid
DESCRIPTION
IMAGINE-P2P:A Scalable P2P Platform for the Knowledge Grid. Hai Zhuge, Xiaoping Sun et al. China Knowledge Grid Research Group Institute of Computing Technology Chinese Academy of Sciences. Main work. - PowerPoint PPT PresentationTRANSCRIPT
IMAGINE-P2P:A Scalable P2P Platform for the Knowledge Grid
Hai Zhuge, Xiaoping Sun et al.China Knowledge Grid Research Group
Institute of Computing TechnologyChinese Academy of Sciences
Main work IMAGINE-P2P : Integrated Multi-disciplinary Autonomous
Global Innovation Networking Environment on P2P network
A platform to efficiently support index-based path queries by incorporating a semantic overlay on a structured P2P network
The deployment of a scalable distributed trie index for broadcast queries on key strings
A decentralized load balancing method for improving the system utilization
A replication method is used to improve the availability of distributed index
Outline
Background Design Rationale Architecture of IMAGINE-P2P Deployment of Distributed Trie Index Performance Improvements Experiment Results Conclusion
Background
Motivation: Sharing: Expend services of resource
sharing and cooperation from local distributed systems to large-scale and geographically distributed systems.
Background
A Challenge: Scalability: “A SIMPLE GOAL ”(Jim
Gray, 2003) to scale up and scale out systems in large-scale and dynamic distributed environments.
P2P
whatever@home
Distributed DB
File sharing(Gnutella…)
Background
Function Level
Data
Computation
Information
Knowledge
Scalability Level
Local Distributed Global Distributed Global Decentralized
Computational Grid
Data Grid
Information Grid
Knowledge Grid
GRIDTo be
explored
Web
Current Situation:
Knowledge base
Database
Cluster
MIS
C/S systems
Background
Our Goal : To build a scalable P2P platform of the Knowledge Grid — IMAGINE-P2P
Provide architectural extensibility for different types of complex queries
Achieve scalable performance of queries
Improve the utilization and the availability
Design Rationale Make reasonable trade-offs to
achieve an acceptable scalability of the whole system.
Topology — Complexity vs. Efficiency/Robustness
Query routing — Complexity vs. Store/Query Efficiency
Utilization — Load balancing vs. Query Efficiency
Availability — Fault-tolerance vs. Store/Query Efficiency
Distributed index — Topology dependent vs. Topology independent
Knowledge Grid ApplicationsKnowledge Grid Applications
Distributed Trie indexDistributed Trie index
Semantic OverlaySemantic Overlay
Architecture of IMAGINE-P2P
Layered Architecture
Object OverlayObject OverlayA P2P overlay network providing scalable management of resources
Distributed indexes supporting scalable semantics-rich path queries on objects
A distributed trie index supporting scalable wild-card and broadcasting queries on objects
Future Knowledge Grid applications built on various distributed indexes
Architecture of IMAGINE-P2P Object Overlay — Topology Consideration
Theorem 1: Comparison-based structured overlays have to build a linear-order relation on their ID spaces to allow a deterministic routing.
Theorem 2: Constructing a comparison-based structured overlay is the same as sorting IDs of nodes and objects by a linear-order relation, which features a lower bound of O(N log N) comparisons. N is the number of nodes.Decision: Ring topology is the most direct and simple way to build comparison-based structured overlay network. Chord is such a case.
Architecture of IMAGINE-P2P Object Overlay — Topology
Chord has O(log N) hops and proved correctness of stabilization in dynamic environments
Oi ≤ 2k and there is no node Nj,Nj < Ni and Nj>Oi.
Oi Object
Physical node
1
21
2K+1
n
Ni
N2
N1
2K
SO1(O1, R, O3) SO1
Architecture of IMAGINE-P2P
Semantic Overlay — Basic structure
N1 Physical node
1
21
2K+1
n
Ni
N2
2K
O1
Indexing Node Object
O2 O3
O4O5 O6
O7
Key
Distributed Indexing Structure Object Overlay
Semantic Object: SO = (a, R, b)
SO2
SO2(O3, R, O6)
SO3
SO3(O6, R, O7)
Semantic path: a sp(a1R1a2R2…an-1Rn-1an)
Semantic Overlay
Query for a sp(O1O2O6O7)
Architecture of IMAGINE-P2P Semantic Overlay — Querying
Semantic Object: SO = (a, R, b) , either a or b, or both can be used as the keys by the DHT function.
Semantic path: — a query q = a1R1a2R2…an-1Rn-1an is decomposed into n − 1 subqueries, q1 = a1 R1 a2, q2 = a1a2 R2 a3, …, and qn-1 = a1a2…an-1 Rn-1 an .
O (log N) for a semantic object.O (log N + L) for a semantic path of length L in the best cases.O (log N * L) for a semantic path of length L in the worst cases.
Architecture of IMAGINE-P2P Semantic Overlay — Basic query operations
Query q Published Keys
Match patterns Return results if find Search hops
a R b (a), (b), (a, b) By a SO(a,R,b) A SO(a,R,b) 1
a R* b (a), (a, b) By a sp(aRx1R…xnRb) sp(aRx1R…xnRb) n hops/TTL
a * b (a), (b), (a, b) By a SO(a,R,b) with any R All the existing SO(a,R,b)
1
aR* (a), (a, b) By any sp(aRx1…Rxn) Each sp(aRx1…Rxn) Depth/TTL
a1R1a2… Rn-1an (a), (b), (a, b) By sp(a1R1a2… Rn-1an) sp(a1R1a2… Rn-1an) n hops
a1R1a2… Rn-1anR* (a), (a, b) By any sp(a1R1a2… Rn-1an R
*) any sp(a1R1a2… Rn-1an R*) Depth/ TTL
Rb (b), (a, b) By a SO(x,R,b) any SO(x,R,b) 1
R*b (b), (a, b) By any sp(xR*b) Each sp(xR*b) Depth/TTL
SO2(da, S, r)
SO3(dar, S, k)
SO4(dark, S, e)
Deployment of Distributed Trie Index
Distributed Trie Index — Basic StructureA full trie index
//
cb
a
c
k
i
g
d
a
r
k
/
back
big
dark
o
m
r
e
a
t
create
/
/
/
i
n
g
computing
/
e
p
u
t
r
computer
/
computer
computing
SO4
Physical node
1
21
2K+1
n
Ni
N2
N1
2K
SO1(d, S, a)SO2
SO3
SO1
A trie path tp(dark)
Query = dark
L=O(logmN), m = the size of attribute set, N the key number
Deployment of Distributed Trie Index
Trie Index — Two basic typesA full trie index A pruned trie index
//
c
o
m
p
u
t
e
r
i
n
g
b
a
c
k
i
g
d
a
r
k
/
back
big
computer
computing
dark
//
c
o
m
p
u
t
b d
/ /
/ /
/
back
big
computer computing
dark
r
e
a
t
create
/
create
/
/
/
/
/
A newly added key string is published in one message with its new indexing node o.
Deployment of Distributed Trie Index
Trie Index — Compressed pruned trie indexTo avoid splitting and moving existing indexing nodes
cb d
/ /
back
big
computer
computing
darkcreate
/ / /
A key object is defined as KO (a1a2…aj, S, K), where key K = a1a2…aj…an and aj is the leaf trie node of the trie path of K
o
/
A pruned trie index
//
c
o
m
p
u
t
b d
/ /
/ /
/
back
big
computer computing
dark
/
create
A compressed pruned trie index
Deployment of Distributed Trie Index
If there is no SO(a1, S, e) or SO(a1, S, a2), SO(a1, S, e) is published and the key K is published by KO(a1, S, K).
If there is SO(a1, S, e) but no KO(a1, S, K1) where K1 = a1b2b3…bn (b2 ≠ a2), the key K is published by KO(a1, S, K).
If there are already SO(a1, S, e) and a KO(a1, S, K1) that shares some prefixes with K, where K1 = a1a2…ajbj+1…bm, j ≥ 2, and bj+1 ≠ aj+1, SO(a1, S, e) is changed to SO(a1, S, a2) and two objects are published. One is SO(a1a2, S, e), the other is KO(a1a2, S, K).
If there is already a SO(a1, S, a2), forward the key K along the trie path tp(a1a2…ame) until to SO(a1 a2…am, S, e) (m ≤ n). If there is no such a KO(a1a2a3
…am, S, K2) that K2 = a1a2…amam+1bm+2…bp, just publish a KO(a1a2a3…am, S, K). Else change SO(a1 a2…am, S, e) to SO(a1 a2…am, S, am+1) and publish objects SO(a1a2a3…amam+1, S, e) and KO(a1a2a3…amam+1, S, K).
Trie Index — Publish compressed pruned trie index
Same colored objects share the same prefix and thus can be published in one message.
Deployment of Distributed Trie Index
Trie Index — Multi-access on physical nodes
(a, b)
(abc, d)
(ab, c)(abcde, e)
(abcd, e)Node A
Node B
Node C
q1 = aSb
q2 = abSc
q3 = abcSd
q4 = abcdSe
q5 = abcdeSe
abcde
Query = abcde
On a full trie index and a pruned trie index
Deployment of Distributed Trie Index
Trie Index — Avoiding multi-access
(a, b)
(abc, d)
(ab, c)(abcde, e)
(abcd, e)Node A
Node B
Node C
q1 = aSbq2 = abcdSe
q3 = abcdeSe
abcde
Query = abcde
On a full trie index and a pruned trie index
Deployment of Distributed Trie Index
Trie Index —Multi-access
(a, b)
(abc, d)
(ab, c)(abcde, e)
(abcd, e)Node A
Node B
Node C
q1 = aSb
q2 = abSc
q3 = abcSd
q4 = abcdSe
abcdef
Query = abcdef
On a compressed pruned trie index
Deployment of Distributed Trie Index
Trie Index —Avoid multi-access
(a, b)
(abc, d)
(ab, c)(abcde, e)
(abcd, e)Node A
Node B
Node C
q1 = aSb
q2 = abS~cdefabcdef
Query = abcdef
On a compressed pruned trie index
q2’ = abcdSef
Performance Improvements Utilization Improvement — Decentralized load balancing
Which object should be moved: }|min{ tikks dwww
Target: for each node ni (i = 1, 2, , N), C
L
C
L
i
ti
When should the object be moved: j
tj
i
ti
C
L
C
L
j
stj
i
sti
C
wL
C
wL
and
Where should the object be moved: jn }|min{ ikk
tk
j
tj Sn
C
L
C
Lwith
Action: ni moves loads to neighbors nodes nj selected from its neighbor node set },,,{
222 10 riiii nnnS
according to:
Performance Improvements Availability Improvement — Using path key replication to improv
e availability of semantic paths and distributed trie paths.
Duplicate a semantic object SO (a, R, b) by using key a and key b to publish it.
A path key of a semantic object contains the path information of the objects published before it on the same path. And A semantic object can be recovered from any latterly published semantic object on the same semantic path.
Experiment Results
An event-driven simulation environment
Simulation on a ring network with 200 and 2000 nodes.
Different distributions of object loads and node capacities are tested.
Experiment Results
Index Key Number
Key Type Average Key String Length
Internal Nodes Average Hops Moved Keys Splits
F Trie 13931 *.* 22.84 155158 22.84 0 0
P Trie 13913 *.* 22.84 43686 15.14 5980 43608
CP Trie 13913 *.* 22.84 7513 6.04 0 0
B-tree 13913 *.* 22.84 596 4 7696 592
B+-tree 13913 *.* 22.84 594 4 8850 590
F Trie 2349 *.pdf 55.06 100865 55.06 0 0
P Trie 2349 *.pdf 55.06 12338 17.41 900 12278
CP Trie 2349 *.pdf 55.06 1203 4.85 0 0
B-tree (26) 2349 *.pdf 55.06 137 3 1742 134
B-tree (3) 2349 *.pdf 55.06 1813 9 3608 1804
B+-tree (26) 2349 *.pdf 55.06 142 3 2085 139
B+-tree (3) 2349 *.pdf 55.06 1838 9 5487 1829
Trie index properties compared with B-tree and B+-tree. Compressed trie index has very short average depth.
Experiment ResultsThe size of a trie index is sensitive to only key string distribution. The independence to the network size and the number of keys make it scalable in large-scale and dynamic environment.
Experiment ResultsAverage search hops of a broadcast query for all the keys on the network using distributed trie indexes in network with different size and key number.
Experiment Results
An optimized search on trie indexes with 2349 PDF file names as keys
Index Physical nodes Messages without optimization
Messages in optimized search
Optimized avg hops
Average hops
F Trie 200 103076 23211 12.67 55.23
F Trie 2000 103076 98179 32.17 55.23
P Trie 200 12482 7986 8.09 17.36
P Trie 2000 12482 12340 13.047 17.36
CP Trie 200 1237 1092 4.89 4.89
CP Trie 2000 1237 1176 4.89 4.89
Experiment Results
Load balancing process show the variance of the system load decreasing with the load balancing iterations in different load distributions.
Experiment Results
Chord uses virtual servers to improve the load balance, where each physical node holds more than one virtual server and data objects are mapped by DHT function to virtual servers instead of physical nodes. They proposed that log N virtual servers per physical node can be optimal with high probability when considering only the number of keys.
Experiment ResultsLoad balancing process works effectively for distributed trie indexes that cause heavily imbalanced load distributions
Experiment Results
Experiment Results
If each extra hop incurred by the load balancing does not significantly delay a query, the average query latency under load balancing can be reduced when only considering storage consumption of objects.
Experiment Results
The availability of the full trie with the replication is better than that of the pruned trie because the pruned trie has much shorter path length and there are fewer copies in path key replication. The pruned trie however has better availability without replication, because it has much shorter search paths, i.e., it is less probably broken under the same failure distribution.
Conclusion
Publishing distributed indexes using semantic overlay methods can be a solution to support complex queries with high level semantics.
There are many conflicting factors that should be compromised when designing P2P system to achieve a scalable solution.
The distributed trie index can be scalable in large-scale and dynamic environments where keys string distribution is relatively stable.
Decentralized load balancing in large-scale and dynamic distributed systems can work effectively.
Future work still faces challenging in building more efficient distributed indexes, relieving hot spots on distributed indexes, improving availability while keeping system decentralized and scalable.
Future theoretic work should show that to what scale the trade-off can be made to achieve an acceptable scalability.
This work has been published in IEEE Transaction on Knowledge and Data Engineering
Questions and Comments
Thanks!
Full paper is available at IEEE Transactions on Knowledge and Data Engineering