Decentralized and Dynamic Community Formation in P2P Networks & Performance of Community Based Caching

Chepchumba S. Limo

May 6, 2015

Committee Members:

Anura Jayasumana (Advisor)

Liuqing Yang

Christos Papadopoulos

MSc. Thesis Defense

Overview

• Introduction

• Motivation and Problem Statement

• Dynamic Group Discovery Algorithm

• Simulation Implementation

• Case Studies and Results

• Conclusion and Future Work

Introduction: Peer-to-Peer Networks

• Example of overlay network

• File transfer P2P networks

– Lookup/resource discovery

• P2P messaging

– PUT <key, value>

– GET(key) → value

• GET(key) → node ID

• GET(key) → data

• Mitigating resource discovery overhead

– Distributed Hash Tables (DHTs)

– Caching Schemes

Key    Value
123    Jack & Jill
A34    Avengers
BC5    Spiderman
24F    Beyoncé

GET(123) → Jack & Jill
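A minimal sketch of the PUT/GET interface on this slide, assuming a Chord-like ring in which each key is stored on the first node whose ID follows the hash of the key. The class name TinyRing, the 16-bit ID space, and the node names are hypothetical and are not taken from the thesis.

```python
import bisect
import hashlib

RING_BITS = 16  # assumed ID space size, chosen only to keep the example small


def ring_id(text):
    """Map a string onto the ring by hashing it (assumed hashing scheme)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class TinyRing:
    """All nodes of a toy DHT held in one object, for illustration only."""

    def __init__(self, node_names):
        self.node_ids = sorted(ring_id(name) for name in node_names)
        self.store = {node_id: {} for node_id in self.node_ids}

    def successor(self, key):
        # First node ID at or after the key's position, wrapping around the ring.
        index = bisect.bisect_left(self.node_ids, ring_id(key))
        return self.node_ids[index % len(self.node_ids)]

    def put(self, key, value):
        self.store[self.successor(key)][key] = value

    def get(self, key):
        return self.store[self.successor(key)].get(key)


ring = TinyRing(["node-1", "node-2", "node-3", "node-4"])
for key, value in [("123", "Jack & Jill"), ("A34", "Avengers"),
                   ("BC5", "Spiderman"), ("24F", "Beyoncé")]:
    ring.put(key, value)
print(ring.get("123"))
```

In a real DHT each node holds only its own part of the store and the successor is found by routing, which is the lookup cost that caching tries to reduce.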

Introduction: Caching

• Distributed Hash Tables (DHT)

– Efficient

– Highly scalable

– Self organizing

• Caching

– Favor popular resources relative to entire network

– But traffic is modeled by Zipf's distribution

Introduction: Caching

• A subset of nodes that share similar interests is said to form a community

• Communities exist naturally

• Community Based Caching (CBC) algorithm proposed

– Exploits existence of communities when caching

– More nodes benefit from caching

Introduction: Communities

[Figure: Structured P2P Network (Chord) vs. Unstructured P2P Networks]

Introduction: Communities

• In file sharing P2P networks:

– Node_A interested in music and software

– Node_B interested in music and movies

– Node_A and Node_B form music community (community appears)

– Node_C with music and software interest joins network

– Node_C should join music community with Node_A and Node_B (community grows)

– Node_A, Node_B, Node_C leave network (community disappears)

• Properties of communities:

1. Naturally occurring

2. Dynamic

3. Nodes/users can belong to multiple communities

4. Nodes/users can join/leave communities at-will
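The Node_A/Node_B/Node_C example can be traced with a small sketch. It assumes a community exists once at least two nodes share an interest; the class CommunityDirectory is hypothetical and is a single shared directory used only for illustration, not the decentralized mechanism proposed later in the talk.

```python
class CommunityDirectory:
    """Tracks which nodes share which interests (illustrative only)."""

    def __init__(self):
        self.members = {}  # interest -> set of node names

    def join(self, node, interests):
        for interest in interests:
            self.members.setdefault(interest, set()).add(node)

    def leave(self, node):
        for interest in list(self.members):
            self.members[interest].discard(node)
            if not self.members[interest]:
                del self.members[interest]

    def communities(self):
        # Assumption: a community needs at least two members.
        return {interest: set(nodes)
                for interest, nodes in self.members.items()
                if len(nodes) >= 2}


directory = CommunityDirectory()
directory.join("Node_A", ["music", "software"])
directory.join("Node_B", ["music", "movies"])
after_two_nodes = directory.communities()   # music community appears
directory.join("Node_C", ["music", "software"])
after_three_nodes = directory.communities()  # music community grows
for node in ("Node_A", "Node_B", "Node_C"):
    directory.leave(node)
after_departure = directory.communities()    # communities disappear
```

Under the same assumption, Node_A and Node_C also form a software community, which illustrates property 3: a node can belong to several communities at once.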

Motivation

1. Community Based Caching (CBC) algorithm tested under limiting conditions, i.e.:

i. Static community assignment

ii. Nodes/users couldn’t change membership

iii. Nodes limited to being members of only 1 community

iv. Community membership based on websites queried (arguably weak similarity)

Motivation

2. Limitations of existing dynamic community formation algorithms:

i. Centralized node for maintenance

ii. Complicated computations

iii. Additional messaging to establish community membership

iv. Nodes limited to membership of one community at a time

3. Basis for establishing similarity for community formation:

i. Websites queried (weak)

ii. Personal interests

iii. Acquired interests

Problem Statement

• Motivation summary:

i. CBC tested under stringent conditions

ii. Existing algorithms have limitations

iii. Consider other bases for community formation

• Contribution

– Decentralized community discovery algorithm

• Considers community properties, i.e., naturally occurring, dynamic, membership of multiple communities, and joining/leaving communities at will

• Overcomes limitations of existing algorithms

• Utilizes already existing PUT and GET messages (no additional messaging needed)

– Special key generation technique

• Disseminates group information

– Test CBC under more realistic conditions

• Network with churn

Dynamic Group Discovery (DGD)

• Key has embedded meta-data on the type of resource it represents

• No additional security risk added if algorithm is publicly known

Dynamic Group Discovery (DGD)

• Key generation

– Last 12 bits of the key needed for the three levels of group identification:

• Level 1 (mandatory): general classification, e.g., music, movies

• Level 2 (optional): specify geographical location, e.g., U.S.A., Canada

• Level 3 (optional): specify genre, e.g., comedy, jazz

Key               Group ID Level 1  Group ID Level 2  Group ID Level 3  Final Key
0123456789abcdef  music => 1        Canada => 2       hip-hop => 3      0123456789abcdef 123
a123456789bcdef0  music => 1        N/A => 0          blues => 9        a123456789bcdef0 109
b123456789bcdef0  movies => 3       USA => 3          comedy => 6       b123456789bcdef0 236
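The key layout can be sketched as follows, assuming each of the three levels occupies 4 of the last 12 bits and is written as one hexadecimal digit, with 0 meaning "not specified". The function names are hypothetical, and the final key is returned without the space that the table uses for readability.

```python
def make_final_key(base_key_hex, level1, level2=0, level3=0):
    """Append three 4-bit group ID levels to a hexadecimal key string."""
    if level1 == 0:
        raise ValueError("level 1 is mandatory and must be non-zero")
    for level in (level1, level2, level3):
        if not 0 <= level <= 0xF:
            raise ValueError("each level must fit in 4 bits")
    return f"{base_key_hex}{level1:x}{level2:x}{level3:x}"


def extract_group_id(final_key_hex):
    """Return the group ID, i.e. the last three hex digits (12 bits)."""
    return final_key_hex[-3:]


def split_group_id(group_id):
    """Split a group ID into its (level 1, level 2, level 3) values."""
    return tuple(int(digit, 16) for digit in group_id)


# First two rows of the table: music/Canada/hip-hop and music/N/A/blues.
key_a = make_final_key("0123456789abcdef", 1, 2, 3)
key_b = make_final_key("a123456789bcdef0", 1, 0, 9)
print(key_a, extract_group_id(key_a), split_group_id(extract_group_id(key_b)))
```

Because the group ID is part of the key itself, any node that handles a PUT or GET can read it without extra messages.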

Dynamic Group Discovery (DGD)

• Built on top of structured P2P

– Guaranteed lookup performance, unlike unstructured networks

– Chord used

Dynamic Group Discovery (DGD)

• Goal of DGD is to allow community formation in structured P2P networks

Dynamic Group Discovery (DGD)

• Establish group interest

– Personal interests

• Two thresholds used:

– λ = threshold for personal interests

– μ = threshold for acquired interests

• λ << μ

void forward(key, msg, nextHop*)
{
    if msg type == GET
    {
        extract group ID from key;
        // is there interest?
        if group ID is in group ID finger table and finger is not pointing to me
        {
            if hops < (log2 N)/2
            {
                set nextHop using entries in group ID finger table;
            }
            else
            {
                use Chord to set nextHop;
            }
        }
        else
        {
            keep track of specific GET message;
            use Chord to set nextHop;
            if specific GET messages received λ or μ times
                send FIND GROUP request;
        }
    }
}

Group ID  Finger to Group Member  Frequency of Use
120       node A                  N/A
239       node A                  N/A
912       node A                  N/A
122       node X                  0
122       node Y                  7
122       node Z                  9
912       node X                  20
235       node E                  4
420       node G                  1
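The forwarding rule and the table above can be combined in a rough sketch. Several points are assumptions rather than details from the slides: GET messages are counted per group ID, λ applies to the node's own queries and μ to queries it forwards for others, and the most frequently used finger is preferred. The class DgdNode and the argument chord_next_hop (the hop Chord would have chosen) are hypothetical.

```python
import math


class DgdNode:
    """Holds only the state needed for the forwarding decision."""

    def __init__(self, name, n_nodes, lam, mu):
        self.name = name
        self.n_nodes = n_nodes
        self.lam = lam                 # threshold for personal interests
        self.mu = mu                   # threshold for acquired interests
        self.group_fingers = {}        # group ID -> {finger: frequency of use}
        self.own_gets = {}             # group ID -> own GETs without a finger
        self.forwarded_gets = {}       # group ID -> forwarded GETs without a finger
        self.find_group_requests = []  # group IDs for which FIND GROUP was sent

    def forward_get(self, key, hops, chord_next_hop, own_query=False):
        group_id = key[-3:]
        # Fingers pointing back to this node do not count as group members.
        fingers = {finger: uses
                   for finger, uses in self.group_fingers.get(group_id, {}).items()
                   if finger != self.name}
        if fingers:
            if hops < math.log2(self.n_nodes) / 2:
                chosen = max(fingers, key=fingers.get)  # assumed selection rule
                self.group_fingers[group_id][chosen] += 1
                return chosen
            return chord_next_hop
        # No usable finger: count the GET and fall back to Chord routing.
        if own_query:
            counts, threshold = self.own_gets, self.lam
        else:
            counts, threshold = self.forwarded_gets, self.mu
        counts[group_id] = counts.get(group_id, 0) + 1
        if counts[group_id] == threshold:
            self.find_group_requests.append(group_id)
        return chord_next_hop


node = DgdNode("node A", n_nodes=1024, lam=2, mu=20)
node.group_fingers = {"122": {"node X": 0, "node Y": 7, "node Z": 9},
                      "120": {"node A": 0}}
early_hop = node.forward_get("0123456789abcdef122", 1, "chord hop")
late_hop = node.forward_get("0123456789abcdef122", 5, "chord hop")
```

With 1,024 nodes the bound (log2 N)/2 equals 5, so the GET at hop 1 goes to a group member while the GET at hop 5 falls back to Chord.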

Dynamic Group Discovery (DGD)

• Maintaining group ID finger table

– Limited resources

• σ = max number of fingers

Group ID  Finger to Group Member  Frequency of Use
120       node A                  N/A
239       node A                  N/A
912       node A                  N/A
122       node X                  0
122       node Y                  7
122       node Z                  9
912       node X                  20
235       node E                  4
420       node G                  1

void handle_FINDGROUP_response(groupID, finger)
{
    if <group ID, finger> pair already exists // done to avoid duplicates
        return;
    if group ID finger table is at capacity
    {
        if one least used finger can be identified
            delete it;
        else // i.e., multiple fingers share the same lowest frequency of use
            pick one of them at random and delete it;
    }
    add newly found finger;
}

σ = 3
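The replacement policy can be sketched as below, assuming σ limits the number of fingers kept for a single group ID (the slide does not say whether the limit is per group or for the whole table) and that a new finger starts with a use count of 0. The class GroupFingerTable is hypothetical.

```python
import random


class GroupFingerTable:
    """Group ID finger table with least-used replacement."""

    def __init__(self, sigma, rng=None):
        self.sigma = sigma                 # assumed: max fingers per group ID
        self.rng = rng or random.Random()
        self.fingers = {}                  # group ID -> {finger: frequency of use}

    def handle_find_group_response(self, group_id, finger):
        entries = self.fingers.setdefault(group_id, {})
        if finger in entries:              # avoid duplicates
            return
        if len(entries) >= self.sigma:     # at capacity: evict a least used finger
            lowest = min(entries.values())
            candidates = [f for f, uses in entries.items() if uses == lowest]
            # A single candidate is chosen deterministically; ties are broken at random.
            victim = self.rng.choice(candidates)
            del entries[victim]
        entries[finger] = 0


table = GroupFingerTable(sigma=3, rng=random.Random(0))
for name in ("node X", "node Y", "node Z"):
    table.handle_find_group_response("122", name)
table.fingers["122"].update({"node X": 0, "node Y": 7, "node Z": 9})
table.handle_find_group_response("122", "node W")  # evicts node X (used 0 times)
table.handle_find_group_response("122", "node Y")  # duplicate, ignored
```

Using the group 122 rows of the table with σ = 3, the newly found finger replaces node X, the only finger with the lowest frequency of use.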

Simulation Implementation

• OverSim

– Flexible network simulation framework

– Popular event driven simulator for P2P networks

• Keys and queries generated external to the simulator

– Able to indirectly control community size and symmetry

• Keys and queries generation:

1. Determine desired symmetry and size

2. Generate random keys with desired symmetry

3. Sort keys based on level 1 identification

4. Assign Zipf’s α parameter per community

5. Generate queries
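The five generation steps can be sketched as follows. This is an illustrative reconstruction, not the thesis tooling: the function names, the 64-bit random base key, the group ID strings, and the rule that the rank of a key within its community sets its Zipf popularity are all assumptions.

```python
import random


def generate_keys(n_keys, group_shares, rng):
    """Steps 1-3: random keys with the desired group shares, sorted by level 1."""
    keys = []
    for group_id, share in group_shares.items():
        for _ in range(round(n_keys * share)):
            base = f"{rng.getrandbits(64):016x}"
            keys.append(base + group_id)      # group ID is the last 3 hex digits
    keys.sort(key=lambda key: key[-3])        # level 1 digit
    return keys


def zipf_queries(keys, alpha, n_queries, rng):
    """Draw queries so that the key at rank r has weight 1 / r**alpha."""
    weights = [1.0 / rank ** alpha for rank in range(1, len(keys) + 1)]
    return rng.choices(keys, weights=weights, k=n_queries)


def generate_queries(keys, alpha_per_group, n_queries, rng):
    """Steps 4-5: one Zipf parameter per community, then the queries."""
    by_group = {}
    for key in keys:
        by_group.setdefault(key[-3:], []).append(key)
    return {group_id: zipf_queries(members, alpha_per_group[group_id], n_queries, rng)
            for group_id, members in by_group.items()}


rng = random.Random(1)
shares = {"100": 0.4, "200": 0.4, "300": 0.2}  # hypothetical asymmetric communities
keys = generate_keys(100, shares, rng)
queries = generate_queries(keys, {"100": 0.8, "200": 1.0, "300": 1.2}, 50, rng)
```

Changing the shares changes community size and symmetry, which is the indirect control mentioned on the slide.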

Simulation Implementation

void handle_PUT_event()
{
    key = read from key file;
    if key is unspecified // i.e., all keys have been read from key file
    {
        schedule GET event;
    }
    else
    {
        extract group ID information from key;
        add entry to group ID finger table;
        create PUT message and send it out to network;
        schedule next PUT event;
    }
}

Simulation Implementation

void handle_GET_event()
{
    select group ID file to read from; // based on personal interest
    query = read key from query file;
    if query is unspecified // i.e., all keys have been read from query file
    {
        return; // do nothing
    }
    else
    {
        create GET message with query;
        send out GET message;
        schedule next GET event;
    }
}
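The way the PUT phase hands over to the GET phase can be sketched with a plain event queue. This is not OverSim code: the heapq-based scheduler, the fixed interval, and the function run_node are hypothetical stand-ins, and selecting a query file by personal interest is reduced to a single list of queries.

```python
import heapq


def run_node(keys, queries, interval=1.0):
    """Replay the PUT events, then the GET events, of one simulated node."""
    key_iter, query_iter = iter(keys), iter(queries)
    own_groups = set()            # group IDs entered into the finger table
    sent = []                     # (time, message type, key) in sending order
    events = [(0.0, "PUT")]       # event queue ordered by time
    while events:
        now, event = heapq.heappop(events)
        if event == "PUT":
            key = next(key_iter, None)
            if key is None:       # all keys read: switch to the GET phase
                heapq.heappush(events, (now + interval, "GET"))
            else:
                own_groups.add(key[-3:])
                sent.append((now, "PUT", key))
                heapq.heappush(events, (now + interval, "PUT"))
        else:
            query = next(query_iter, None)
            if query is None:     # all queries read: do nothing
                continue
            sent.append((now, "GET", query))
            heapq.heappush(events, (now + interval, "GET"))
    return sent, own_groups


sent, own_groups = run_node(["0123456789abcdef123", "a123456789bcdef0109"],
                            ["b123456789bcdef0123"])
```

The node issues GET messages only after its key file is exhausted, so every key it owns has been published and entered into its own table before it starts querying.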

Simulation Implementation

void handle_REMOVE_request(groupID, finger, maxHops, curHops)
{
    if <group ID, finger> pair exists
        delete entry;
    if curHops < maxHops
    {
        curHops++;
        forward message to all nodes in group ID finger table;
    }
}
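The hop-limited propagation of a REMOVE request can be sketched as below. The in-memory network dictionary, the Node class, and the direct recursive call that stands in for sending a message are hypothetical; forwarding to every finger in the table, whatever its group ID, is one reading of the pseudocode.

```python
class Node:
    def __init__(self, name, fingers):
        self.name = name
        self.fingers = fingers  # group ID -> set of finger names


def handle_remove_request(network, node_name, group_id, finger, max_hops, cur_hops=0):
    """Delete <group ID, finger> locally, then forward while hops remain."""
    node = network[node_name]
    node.fingers.get(group_id, set()).discard(finger)
    if cur_hops < max_hops:
        targets = {f for members in node.fingers.values() for f in members
                   if f != node_name}
        for target in sorted(targets):
            if target in network:  # nodes that already left cannot receive it
                handle_remove_request(network, target, group_id, finger,
                                      max_hops, cur_hops + 1)


def build_network():
    # node D has left the network but is still referenced by A, B and C.
    return {
        "A": Node("A", {"122": {"B", "D"}}),
        "B": Node("B", {"122": {"C", "D"}}),
        "C": Node("C", {"122": {"A", "D"}}),
    }


one_hop = build_network()
handle_remove_request(one_hop, "A", "122", "D", max_hops=1)
two_hops = build_network()
handle_remove_request(two_hops, "A", "122", "D", max_hops=2)
```

With maxHops = 1 the stale finger survives at node C, two hops from A; with maxHops = 2 it is removed everywhere. A larger limit cleans more tables at the cost of more messages.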

Case 1: Varying Number of Nodes

• Between 500 and 10,000 nodes

• 40,000 keys used with following distribution:

– 40% group 1

– 40% group 2

– 20% shared equally between groups 3 to 9

• Maximum group ID finger table size per node = 160

• λ = 2

• μ = 20

• σ = 3

Asymmetrical communities
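As a worked example of this setup, the sketch below computes the number of keys per group and the hop bound (log2 N)/2 at both ends of the node range. The function names are hypothetical, and because 8,000 keys do not divide evenly by 7, handing the remainder to the lowest-numbered small groups is an assumption.

```python
import math


def case1_key_counts(total_keys=40_000):
    """Keys per group: 40% each for groups 1 and 2, the rest split over 3 to 9."""
    counts = {1: total_keys * 40 // 100, 2: total_keys * 40 // 100}
    remaining = total_keys - counts[1] - counts[2]
    small_groups = list(range(3, 10))
    base, extra = divmod(remaining, len(small_groups))
    for index, group in enumerate(small_groups):
        counts[group] = base + (1 if index < extra else 0)
    return counts


def group_hop_limit(n_nodes):
    """Hops below which a GET is routed through group ID fingers."""
    return math.log2(n_nodes) / 2


counts = case1_key_counts()
limit_small = group_hop_limit(500)
limit_large = group_hop_limit(10_000)
print(counts, round(limit_small, 2), round(limit_large, 2))
```

Groups 1 and 2 each hold 16,000 keys, roughly fourteen times as many as any of groups 3 to 9, which is what makes the communities asymmetrical.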

Case 1: Varying Number of Nodes

Case 2: Varying Community Size

• 2,000 nodes

• 40,000 keys divided into 2 sections, i.e., 80% in section 1 and 20% in section 2

– Run 1: one community in section 1 and eight communities in section 2

– Run 2: two communities in section 1 and seven communities in section 2

– Run 3: …

• Maximum group ID finger table size per node = 160

• λ = 2

• μ = 20

• σ = 3

Case 2: Varying Community Size

Case 3: Introducing Churn

• Between 500 and 10,000 nodes

• 40,000 keys used with following distribution:

– 40% group 1

– 40% group 2

– 20% shared equally between groups 3 to 9

• Maximum group ID finger table size per node = 160

• λ = 2

• μ = 20

• σ = 3

Asymmetrical communities

Case 3: Introducing Churn

Conclusion

1. Decentralized dynamic group discovery algorithm

2. Key generation with embedded group ID information

3. Improve lookup performance for queries resolved using cache data

• Stronger community basis i.e. personal and acquired interests

• Without churn

4. Easy implementation of dynamic group discovery

• Utilize already existing messages

• Additional computation – extracting group ID information

5. Great potential for robust caching solution

Future Work

1. Optimize entries in group ID finger table

• Consider distance of new finger relative to other fingers in other tables

2. Consider location of next hop of group member

• Applicable for structured P2P networks

3. DGD needs to know exactly how many nodes are in the network

• if finger not pointing to me and hops < (log2 N)/2

• Find solution to determine number of nodes in network with churn

4. Introduce churn in measurable manner

• Better characterize DGD’s performance

5. Test DGD with other types of P2P networks

Thank you!

• Dr. Jayasumana

• Liuqing Yang

• Christos Papadopoulos

• Friends and co-workers at CSU and Dot Hill

• Family

QUESTIONS?
