
IMAGINE-P2P: A Scalable P2P Platform for the Knowledge Grid

Hai Zhuge, Xiaoping Sun et al.

China Knowledge Grid Research Group

Institute of Computing Technology, Chinese Academy of Sciences

Main work

IMAGINE-P2P: Integrated Multi-disciplinary Autonomous Global Innovation Networking Environment on P2P network

A platform to efficiently support index-based path queries by incorporating a semantic overlay on a structured P2P network

The deployment of a scalable distributed trie index for broadcast queries on key strings

A decentralized load balancing method for improving the system utilization

A replication method for improving the availability of the distributed index

Outline

Background
Design Rationale
Architecture of IMAGINE-P2P
Deployment of Distributed Trie Index
Performance Improvements
Experiment Results
Conclusion

Background

Motivation: Sharing. Extend services for resource sharing and cooperation from local distributed systems to large-scale, geographically distributed systems.

Background

A Challenge: Scalability. "A SIMPLE GOAL" (Jim Gray, 2003): to scale up and scale out systems in large-scale and dynamic distributed environments.

P2P, whatever@home, Distributed DB, File sharing (Gnutella…)

Background

[Figure: function level (Data, Computation, Information, Knowledge) against scalability level (Local Distributed, Global Distributed, Global Decentralized). Plotted items: Computational Grid, Data Grid, Information Grid and Knowledge Grid (GRID); Web; a region marked "To be explored"; and, under "Current Situation", Knowledge base, Database, Cluster, MIS and C/S systems.]

Background

Our Goal : To build a scalable P2P platform of the Knowledge Grid — IMAGINE-P2P

Provide architectural extensibility for different types of complex queries

Achieve scalable performance of queries

Improve the utilization and the availability

Design Rationale

Make reasonable trade-offs to achieve an acceptable scalability of the whole system.

Topology — Complexity vs. Efficiency/Robustness

Query routing — Complexity vs. Store/Query Efficiency

Utilization — Load balancing vs. Query Efficiency

Availability — Fault-tolerance vs. Store/Query Efficiency

Distributed index — Topology dependent vs. Topology independent

Architecture of IMAGINE-P2P

Layered Architecture

Knowledge Grid Applications: future Knowledge Grid applications built on various distributed indexes

Distributed Trie Index: a distributed trie index supporting scalable wild-card and broadcasting queries on objects

Semantic Overlay: distributed indexes supporting scalable semantics-rich path queries on objects

Object Overlay: a P2P overlay network providing scalable management of resources

Architecture of IMAGINE-P2P Object Overlay — Topology Consideration

Theorem 1: Comparison-based structured overlays have to build a linear-order relation on their ID spaces to allow a deterministic routing.

Theorem 2: Constructing a comparison-based structured overlay is the same as sorting the IDs of nodes and objects by a linear-order relation, which has a lower bound of O(N log N) comparisons, where N is the number of nodes.

Decision: A ring topology is the most direct and simple way to build a comparison-based structured overlay network. Chord is such a case.

Architecture of IMAGINE-P2P Object Overlay — Topology

Chord routes a lookup in O(log N) hops, and the correctness of its stabilization in dynamic environments has been proved.

Oi ≤ 2^k, and there is no node Nj such that Nj < Ni and Nj > Oi.

[Figure: Chord identifier ring showing physical nodes N1, N2, …, Ni and the objects Oi assigned to them.]
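The placement of objects on a Chord-style ring can be illustrated with a minimal Python sketch. It assumes integer identifiers and assigns each object to the first node whose identifier is not smaller than the object's, wrapping around the ring. The function name `successor_node` and the sample identifiers are hypothetical, and finger tables and stabilization are left out.

```python
import bisect

def successor_node(node_ids, object_id):
    """Return the ID of the node responsible for object_id on a ring.

    node_ids must be sorted in increasing order. The responsible node is the
    first node whose ID is >= object_id; if no such node exists, the lookup
    wraps around to the smallest node ID.
    """
    pos = bisect.bisect_left(node_ids, object_id)
    return node_ids[pos % len(node_ids)]

# Hypothetical node identifiers, for illustration only.
nodes = sorted([3, 17, 42, 58])
print(successor_node(nodes, 20))  # 42
print(successor_node(nodes, 60))  # 3 (wraps around the ring)
```

A real deployment would resolve the successor through routing hops rather than a local sorted list; the sketch only shows which node ends up responsible.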

Architecture of IMAGINE-P2P

Semantic Overlay — Basic structure

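[Figure: the semantic overlay drawn as a distributed indexing structure above the object overlay. Legend: physical node, indexing node, object, key. Objects O1 to O7 are placed on the ring of physical nodes N1, N2, …, Ni.]

Semantic object: SO = (a, R, b)

Semantic objects in the figure: SO1(O1, R, O3), SO2(O3, R, O6), SO3(O6, R, O7)

Semantic path: sp(a1R1a2R2…an-1Rn-1an)

Example: query for the semantic path sp(O1O2O6O7)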

Architecture of IMAGINE-P2P Semantic Overlay — Querying

Semantic object: SO = (a, R, b). Either a or b, or both, can be used as keys by the DHT function.

Semantic path: a query q = a1R1a2R2…an-1Rn-1an is decomposed into n − 1 subqueries, q1 = a1 R1 a2, q2 = a1a2 R2 a3, …, and qn-1 = a1a2…an-1 Rn-1 an.

Search cost:

O(log N) for a semantic object.

O(log N + L) for a semantic path of length L in the best case.

O(log N * L) for a semantic path of length L in the worst case.
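The decomposition of a semantic path query into n − 1 subqueries can be sketched in a few lines of Python. The function name `decompose_path_query` and the tuple representation of the accumulated prefix are assumptions made for illustration; the slides only fix the shape of the subqueries.

```python
def decompose_path_query(items, relations):
    """Split a1 R1 a2 ... R(n-1) an into n-1 subqueries.

    Subquery i is (a1...ai, Ri, a(i+1)): the accumulated prefix is the
    lookup key, followed by the relation and the next item on the path.
    """
    if len(items) != len(relations) + 1:
        raise ValueError("need exactly one relation between consecutive items")
    subqueries = []
    for i, relation in enumerate(relations):
        prefix = tuple(items[: i + 1])
        subqueries.append((prefix, relation, items[i + 1]))
    return subqueries

for sub in decompose_path_query(["a1", "a2", "a3", "a4"], ["R1", "R2", "R3"]):
    print(sub)
# (('a1',), 'R1', 'a2')
# (('a1', 'a2'), 'R2', 'a3')
# (('a1', 'a2', 'a3'), 'R3', 'a4')
```

Each subquery costs one lookup, which is why the total cost grows with the path length L.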

Architecture of IMAGINE-P2P Semantic Overlay — Basic query operations

Query q | Published keys | Match patterns | Returned results if found | Search hops
a R b | (a), (b), (a, b) | By a SO(a,R,b) | A SO(a,R,b) | 1
a R* b | (a), (a, b) | By a sp(aRx1R…xnRb) | sp(aRx1R…xnRb) | n hops/TTL
a * b | (a), (b), (a, b) | By a SO(a,R,b) with any R | All the existing SO(a,R,b) | 1
aR* | (a), (a, b) | By any sp(aRx1…Rxn) | Each sp(aRx1…Rxn) | Depth/TTL
a1R1a2… Rn-1an | (a), (b), (a, b) | By sp(a1R1a2… Rn-1an) | sp(a1R1a2… Rn-1an) | n hops
a1R1a2… Rn-1anR* | (a), (a, b) | By any sp(a1R1a2… Rn-1an R*) | Any sp(a1R1a2… Rn-1an R*) | Depth/TTL
Rb | (b), (a, b) | By a SO(x,R,b) | Any SO(x,R,b) | 1
R*b | (b), (a, b) | By any sp(xR*b) | Each sp(xR*b) | Depth/TTL
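A few of the query patterns above (a R b, a * b, Rb and a R* b with a TTL) can be tried out on a small in-memory stand-in for the semantic overlay. This is only a sketch: the class `SemanticIndex`, its method names and the two dictionaries that play the role of DHT keys a and b are hypothetical, and no network routing is modelled.

```python
from collections import defaultdict

class SemanticIndex:
    """In-memory stand-in: semantic objects are indexed under key a and key b."""

    def __init__(self):
        self.by_source = defaultdict(set)  # key a -> {(a, R, b)}
        self.by_target = defaultdict(set)  # key b -> {(a, R, b)}

    def publish(self, a, r, b):
        self.by_source[a].add((a, r, b))
        self.by_target[b].add((a, r, b))

    def match(self, a=None, r=None, b=None):
        """Patterns a R b, a * b (r=None), a R x (b=None) and x R b (a=None)."""
        if a is not None:
            candidates = self.by_source.get(a, set())
        elif b is not None:
            candidates = self.by_target.get(b, set())
        else:
            raise ValueError("at least one of a or b is needed as the lookup key")
        return sorted(so for so in candidates
                      if (r is None or so[1] == r) and (b is None or so[2] == b))

    def paths(self, a, r, b, ttl):
        """Pattern a R* b: follow relation r from a for at most ttl hops."""
        results, frontier = [], [[a]]
        for _ in range(ttl):
            next_frontier = []
            for path in frontier:
                for _, _, nxt in self.match(a=path[-1], r=r):
                    if nxt in path:          # skip cycles
                        continue
                    if nxt == b:
                        results.append(path + [nxt])
                    else:
                        next_frontier.append(path + [nxt])
            frontier = next_frontier
        return results

index = SemanticIndex()
for so in [("O1", "R", "O3"), ("O3", "R", "O6"), ("O6", "R", "O7")]:
    index.publish(*so)
print(index.match(a="O1", r="R", b="O3"))   # [('O1', 'R', 'O3')]
print(index.paths("O1", "R", "O7", ttl=3))  # [['O1', 'O3', 'O6', 'O7']]
```

The TTL bounds the search depth in the same way the "Depth/TTL" entries in the table bound the number of hops.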

Deployment of Distributed Trie Index

Distributed Trie Index: Basic Structure

[Figure: a full trie index over the keys back, big, computer, computing, create and dark, published on the ring of physical nodes N1, N2, …, Ni.]

A trie path tp(dark) is published as the semantic objects SO1(d, S, a), SO2(da, S, r), SO3(dar, S, k) and SO4(dark, S, e).

Query = dark

L = O(log_m N), where m is the size of the attribute set and N is the number of keys.
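How a key string maps to the semantic objects of its trie path can be shown with a short Python sketch. The function name `trie_path_objects` and the constant `END` are hypothetical; `END` stands for the end marker that the slides write as e.

```python
END = "e"  # end-of-key marker, written as e in the slides

def trie_path_objects(key):
    """Return the SO(prefix, S, next) triples that make up the trie path tp(key)."""
    objects = []
    for i in range(1, len(key) + 1):
        # The last object on the path points at the end marker.
        nxt = key[i] if i < len(key) else END
        objects.append((key[:i], "S", nxt))
    return objects

print(trie_path_objects("dark"))
# [('d', 'S', 'a'), ('da', 'S', 'r'), ('dar', 'S', 'k'), ('dark', 'S', 'e')]
```

Each triple would be published under its prefix as the DHT key, so a lookup of "dark" visits one indexing node per character on a full trie.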


Trie Index: Two basic types

[Figure: a full trie index and a pruned trie index, both built over the keys back, big, computer, computing, create and dark.]

A newly added key string is published in one message with its new indexing node o.


Trie Index: Compressed pruned trie index

To avoid splitting and moving existing indexing nodes

A key object is defined as KO(a1a2…aj, S, K), where key K = a1a2…aj…an and aj is the leaf trie node of the trie path of K.

[Figure: a pruned trie index and the corresponding compressed pruned trie index over the keys back, big, computer, computing, create and dark.]


Trie Index: Publish compressed pruned trie index

To publish a key K = a1a2…an:

1. If there is no SO(a1, S, e) or SO(a1, S, a2), SO(a1, S, e) is published and the key K is published by KO(a1, S, K).

2. If there is SO(a1, S, e) but no KO(a1, S, K1) where K1 = a1b2b3…bn (b2 ≠ a2), the key K is published by KO(a1, S, K).

3. If there are already SO(a1, S, e) and a KO(a1, S, K1) that shares a prefix with K, where K1 = a1a2…ajbj+1…bm, j ≥ 2, and bj+1 ≠ aj+1, SO(a1, S, e) is changed to SO(a1, S, a2) and two objects are published: SO(a1a2, S, e) and KO(a1a2, S, K).

4. If there is already a SO(a1, S, a2), the key K is forwarded along the trie path tp(a1a2…ame) until it reaches SO(a1a2…am, S, e) (m ≤ n). If there is no KO(a1a2…am, S, K2) such that K2 = a1a2…amam+1bm+2…bp, KO(a1a2…am, S, K) is published. Otherwise SO(a1a2…am, S, e) is changed to SO(a1a2…am, S, am+1), and the objects SO(a1a2…amam+1, S, e) and KO(a1a2…amam+1, S, K) are published.

Same-colored objects share the same prefix and thus can be published in one message.
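The idea of publishing a key without splitting or moving existing indexing nodes can be illustrated with a local, single-process Python sketch. It is not the distributed protocol: the class `CompressedPrunedTrie`, its two dictionaries and the `lookup` method are hypothetical, and the sketch assumes that an indexing node may link to several child characters. A new key walks down existing links, is stored at the deepest node reached, and opens one new child node only when a key already stored there continues with the same character.

```python
from collections import defaultdict

class CompressedPrunedTrie:
    """Local stand-in for the distributed index (illustration only)."""

    def __init__(self):
        # children[prefix]: next characters linked from prefix (the SO links).
        self.children = defaultdict(set)
        # keys[prefix]: keys stored at prefix (the KO objects).
        self.keys = defaultdict(list)

    def _descend(self, key):
        """Follow existing links along key and return the prefixes visited."""
        visited, prefix = [], ""
        for ch in key:
            if ch not in self.children[prefix]:
                break
            prefix += ch
            visited.append(prefix)
        return visited

    def publish(self, key):
        """Store key without moving any existing key; return its indexing prefix."""
        if not key:
            raise ValueError("key must be non-empty")
        visited = self._descend(key)
        if visited:
            prefix = visited[-1]
        else:
            # No indexing node for the first character yet: create it.
            self.children[""].add(key[0])
            prefix = key[0]
        depth = len(prefix)
        if depth < len(key):
            nxt = key[depth]
            shares_next = any(k != key and len(k) > depth and k[depth] == nxt
                              for k in self.keys[prefix])
            if shares_next:
                # An existing key continues with the same character:
                # open a child indexing node for the new key only.
                self.children[prefix].add(nxt)
                prefix += nxt
        if key not in self.keys[prefix]:
            self.keys[prefix].append(key)
        return prefix

    def lookup(self, key):
        """Check every indexing node on the key's trie path."""
        return any(key in self.keys[p] for p in self._descend(key))

trie = CompressedPrunedTrie()
for k in ["back", "big", "computer", "computing", "create", "dark"]:
    print(k, "->", trie.publish(k))
# back -> b, big -> b, computer -> c, computing -> co, create -> c, dark -> d
```

Because earlier keys stay where they were first stored, a lookup has to check each indexing node along the path rather than only the deepest one.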


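Trie Index: Multi-access on physical nodes

On a full trie index and a pruned trie index

Query = abcde

Subqueries: q1 = aSb, q2 = abSc, q3 = abcSd, q4 = abcdSe, q5 = abcdeSe

[Figure: the index objects (a, b), (ab, c), (abc, d), (abcd, e) and (abcde, e) are spread over the physical nodes Node A, Node B and Node C, so the query may access the same physical node several times.]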


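Trie Index: Avoiding multi-access

On a full trie index and a pruned trie index

Query = abcde

Subqueries: q1 = aSb, q2 = abcdSe, q3 = abcdeSe

[Figure: the index objects (a, b), (ab, c), (abc, d), (abcd, e) and (abcde, e) on the physical nodes Node A, Node B and Node C; the query is resolved with three subqueries instead of five.]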
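One way to picture the saving is to resolve, in a single visit, all consecutive trie prefixes that happen to live on the same physical node. The Python sketch below does that with `itertools.groupby`. The placement of prefixes on Node A, Node B and Node C is a hypothetical assignment chosen for illustration, not read from the figure, and the function names are assumptions as well.

```python
from itertools import groupby

def trie_prefixes(key):
    """Prefixes a, ab, abc, ... visited by the subqueries of a key lookup."""
    return [key[:i] for i in range(1, len(key) + 1)]

def merged_messages(key, node_of):
    """One (node, first_prefix, last_prefix) entry per run of consecutive
    prefixes stored on the same node."""
    messages = []
    for node, run in groupby(trie_prefixes(key), key=node_of):
        run = list(run)
        messages.append((node, run[0], run[-1]))
    return messages

# Hypothetical placement of index objects on physical nodes.
placement = {"a": "A", "ab": "A", "abc": "A", "abcd": "B", "abcde": "C"}

print(len(trie_prefixes("abcde")))              # 5 subqueries without merging
print(merged_messages("abcde", placement.get))
# [('A', 'a', 'abc'), ('B', 'abcd', 'abcd'), ('C', 'abcde', 'abcde')]
```

Under this placement the three merged messages start at the prefixes a, abcd and abcde. Only consecutive prefixes are merged; a node that reappears later on the path is visited again.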


Trie Index: Multi-access

On a compressed pruned trie index

Query = abcdef

Subqueries: q1 = aSb, q2 = abSc, q3 = abcSd, q4 = abcdSe

[Figure: the index objects (a, b), (ab, c), (abc, d), (abcd, e) and (abcde, e) on the physical nodes Node A, Node B and Node C.]


Trie Index: Avoid multi-access

On a compressed pruned trie index

Query = abcdef

Subqueries: q1 = aSb, q2 = abS~cdef, q2’ = abcdSef

[Figure: the index objects (a, b), (ab, c), (abc, d), (abcd, e) and (abcde, e) on the physical nodes Node A, Node B and Node C.]

Performance Improvements

Utilization Improvement: Decentralized load balancing

Target: for each node $n_i$ ($i = 1, 2, \ldots, N$), $\frac{L_i^t}{C_i} = \frac{L}{C}$

Action: $n_i$ moves loads to neighbor nodes $n_j$ selected from its neighbor node set $S_i = \{ n_{i+2^0}, n_{i+2^1}, \ldots, n_{i+2^r} \}$ according to:

When should the object be moved: $\frac{L_i^t}{C_i} > \frac{L_j^t}{C_j}$ and $\frac{L_i^t - w_s}{C_i} \ge \frac{L_j^t + w_s}{C_j}$

Which object should be moved: $w_s = \min\{ w_k \mid w_k \ge d_i^t \}$

Where should the object be moved: $n_j$ with $\frac{L_j^t}{C_j} = \min\{ \frac{L_k^t}{C_k} \mid n_k \in S_i \}$
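A single step of decentralized load balancing on a ring can be sketched in Python under a few stated assumptions. Node i looks only at ring neighbors at power-of-two offsets, picks the neighbor with the lowest load-to-capacity ratio, and moves one object only if node i stays at least as utilized as the target afterwards. The function names, the list-of-weights representation and the choice of the largest movable object are simplifications made for this illustration rather than the selection rule of the slides.

```python
def ring_neighbors(i, n_nodes):
    """Indices of the ring neighbors of node i at offsets 2^0, 2^1, 2^2, ..."""
    neighbors, offset = [], 1
    while offset < n_nodes:
        j = (i + offset) % n_nodes
        if j not in neighbors:
            neighbors.append(j)
        offset *= 2
    return neighbors

def balance_step(i, objects, capacities):
    """Try to move one object from node i to its least utilized neighbor.

    objects[k] is the list of object weights stored on node k.
    Returns (target_node, weight) if an object was moved, else None.
    """
    loads = [sum(weights) for weights in objects]
    neighbors = ring_neighbors(i, len(objects))
    if not neighbors:
        return None
    # Where: the neighbor with the smallest load/capacity ratio.
    j = min(neighbors, key=lambda k: loads[k] / capacities[k])
    # When: only if node i is more utilized than that neighbor.
    if loads[i] / capacities[i] <= loads[j] / capacities[j]:
        return None
    # Which: objects whose move does not make the target busier than node i.
    movable = [w for w in objects[i]
               if (loads[i] - w) / capacities[i] >= (loads[j] + w) / capacities[j]]
    if not movable:
        return None
    weight = max(movable)  # assumption of this sketch: move the largest such object
    objects[i].remove(weight)
    objects[j].append(weight)
    return j, weight

objects = [[5, 3, 2], [1], [4], [2]]
capacities = [1, 1, 1, 1]
print(balance_step(0, objects, capacities))  # (1, 3)
print(objects)                               # [[5, 2], [1, 3], [4], [2]]
```

Repeating the step on every node moves the load-to-capacity ratios toward each other without any central coordinator.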

Performance Improvements

Availability Improvement: Using path key replication to improve the availability of semantic paths and distributed trie paths.

Duplicate a semantic object SO(a, R, b) by using both key a and key b to publish it.

A path key of a semantic object contains the path information of the objects published before it on the same path, so a semantic object can be recovered from any semantic object published later on the same semantic path.
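For a trie path, recovery from a later path key can be illustrated with a short Python sketch. It assumes the path key is the concatenated prefix, as in SO4(dark, S, e), so every earlier object on the path can be rebuilt from it. The function name `recover_earlier_objects` is hypothetical.

```python
def recover_earlier_objects(path_key, relation="S"):
    """Rebuild the objects published before the one whose path key is given.

    For the path key "dark" the earlier objects on the same trie path are
    (d, S, a), (da, S, r) and (dar, S, k).
    """
    return [(path_key[:i], relation, path_key[i])
            for i in range(1, len(path_key))]

print(recover_earlier_objects("dark"))
# [('d', 'S', 'a'), ('da', 'S', 'r'), ('dar', 'S', 'k')]
```

The longer the path, the more later objects carry the information needed to rebuild a lost one.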

Experiment Results

An event-driven simulation environment

Simulation on a ring network with 200 and 2000 nodes.

Different distributions of object loads and node capacities are tested.

Experiment Results

Index | Key number | Key type | Average key string length | Internal nodes | Average hops | Moved keys | Splits
F Trie | 13931 | *.* | 22.84 | 155158 | 22.84 | 0 | 0
P Trie | 13913 | *.* | 22.84 | 43686 | 15.14 | 5980 | 43608
CP Trie | 13913 | *.* | 22.84 | 7513 | 6.04 | 0 | 0
B-tree | 13913 | *.* | 22.84 | 596 | 4 | 7696 | 592
B+-tree | 13913 | *.* | 22.84 | 594 | 4 | 8850 | 590
F Trie | 2349 | *.pdf | 55.06 | 100865 | 55.06 | 0 | 0
P Trie | 2349 | *.pdf | 55.06 | 12338 | 17.41 | 900 | 12278
CP Trie | 2349 | *.pdf | 55.06 | 1203 | 4.85 | 0 | 0
B-tree (26) | 2349 | *.pdf | 55.06 | 137 | 3 | 1742 | 134
B-tree (3) | 2349 | *.pdf | 55.06 | 1813 | 9 | 3608 | 1804
B+-tree (26) | 2349 | *.pdf | 55.06 | 142 | 3 | 2085 | 139
B+-tree (3) | 2349 | *.pdf | 55.06 | 1838 | 9 | 5487 | 1829

Trie index properties compared with B-tree and B+-tree. The compressed trie index has a very short average depth.

Experiment Results

The size of a trie index is sensitive only to the key string distribution. Its independence of the network size and of the number of keys makes it scalable in large-scale and dynamic environments.

Experiment Results

[Figure: average search hops of a broadcast query for all the keys on the network, using distributed trie indexes in networks with different sizes and key numbers.]

Experiment Results

An optimized search on trie indexes with 2349 PDF file names as keys

Index | Physical nodes | Messages without optimization | Messages in optimized search | Optimized avg hops | Average hops
F Trie | 200 | 103076 | 23211 | 12.67 | 55.23
F Trie | 2000 | 103076 | 98179 | 32.17 | 55.23
P Trie | 200 | 12482 | 7986 | 8.09 | 17.36
P Trie | 2000 | 12482 | 12340 | 13.047 | 17.36
CP Trie | 200 | 1237 | 1092 | 4.89 | 4.89
CP Trie | 2000 | 1237 | 1176 | 4.89 | 4.89

Experiment Results

The load balancing process: the variance of the system load decreases with the number of load balancing iterations under different load distributions.

Experiment Results

Chord uses virtual servers to improve the load balance: each physical node holds more than one virtual server, and data objects are mapped by the DHT function to virtual servers instead of physical nodes. The Chord authors proposed that log N virtual servers per physical node can be optimal with high probability when only the number of keys is considered.

Experiment Results

The load balancing process works effectively for distributed trie indexes that cause heavily imbalanced load distributions.


Experiment Results

If each extra hop incurred by the load balancing does not significantly delay a query, the average query latency under load balancing can be reduced when only considering storage consumption of objects.

Experiment Results

With replication, the full trie has better availability than the pruned trie, because the pruned trie has much shorter paths and therefore fewer copies under path key replication. Without replication, however, the pruned trie has better availability, because its much shorter search paths are less likely to be broken under the same failure distribution.

Conclusion

Publishing distributed indexes using semantic overlay methods can be a solution to support complex queries with high level semantics.

Many conflicting factors have to be traded off against each other when designing a P2P system to achieve a scalable solution.

The distributed trie index can be scalable in large-scale and dynamic environments where the key string distribution is relatively stable.

Decentralized load balancing in large-scale and dynamic distributed systems can work effectively.

Future work still faces challenges in building more efficient distributed indexes, relieving hot spots on distributed indexes, and improving availability while keeping the system decentralized and scalable.

Future theoretical work should show to what extent the trade-offs can be made to achieve an acceptable scalability.

This work has been published in IEEE Transactions on Knowledge and Data Engineering.

Questions and Comments

Thanks!

Full paper is available at IEEE Transactions on Knowledge and Data Engineering
