msc. thesis defense and dynamic community formation in p2p networks & performance of community...

Decentralized and Dynamic Community Formation in P2P Networks & Performance of Community Based Caching

Chepchumba S. Limo

May 6, 2015

Committee Members:

Anura Jayasumana (Advisor)

Liuqing Yang

Christos Papadopoulos

MSc. Thesis Defense

Overview

• Introduction

• Motivation and Problem Statement

• Dynamic Group Discovery Algorithm

• Simulation Implementation

• Case Studies and Results

• Conclusion and Future Work

Introduction: Peer-to-Peer Networks

• Example of overlay network

• File transfer P2P networks

– Lookup/resource discovery

• P2P messaging

– PUT <key, value>

– GET(key) → value

• GET(key) → node ID

• GET(key) → data

• Mitigate resource discovery

– Distributed Hash Tables (DHTs)

– Caching Schemes

Key   Value
123   Jack & Jill
A34   Avengers
BC5   Spiderman
24F   Beyoncé

GET(123) → Jack & Jill
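A minimal sketch of the PUT/GET interface on this slide, assuming a single in-memory dictionary stands in for the distributed table. The class name `ToyDHT` is hypothetical; a real DHT would spread the entries across many nodes and route each request to the responsible node.

```python
class ToyDHT:
    """Single-process stand-in for a DHT (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        # PUT <key, value>
        self._store[key] = value

    def get(self, key):
        # GET(key); returns None when the key was never stored.
        return self._store.get(key)


dht = ToyDHT()
for key, value in [("123", "Jack & Jill"), ("A34", "Avengers"),
                   ("BC5", "Spiderman"), ("24F", "Beyoncé")]:
    dht.put(key, value)

print(dht.get("123"))  # Jack & Jill
```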

Introduction: Caching

• Distributed Hash Tables (DHT)

– Efficient

– Highly scalable

– Self organizing

• Caching

– Favor popular resources relative to entire network

– But traffic is modeled by Zipf’s distribution
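To make the Zipf remark concrete: under a Zipf distribution the resource of popularity rank r is requested with probability proportional to 1/r^α. The sketch below is only an illustration; the helper name `zipf_weights`, the five-item catalogue and α = 1.0 are assumptions, not values from the thesis.

```python
import random


def zipf_weights(num_items, alpha):
    """Normalized Zipf probabilities for popularity ranks 1..num_items."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, num_items + 1)]
    total = sum(raw)
    return [w / total for w in raw]


weights = zipf_weights(num_items=5, alpha=1.0)

# Draw 1000 query ranks; low ranks (popular items) dominate the sample.
rng = random.Random(0)
queries = rng.choices(range(1, 6), weights=weights, k=1000)
print([round(w, 3) for w in weights])
```

With α = 1.0 the top-ranked item is requested twice as often as the second, which is why a cache tuned to network-wide popularity serves only a few resources well.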

Introduction: Caching

• A subset of nodes that share similar interests is said to form a community

• Communities exist naturally

• Community Based Caching (CBC) algorithm proposed

– Exploits existence of communities when caching

– More nodes benefit from caching

Introduction: Communities

Structured P2P Network (Chord) Unstructured P2P Networks

Introduction: Communities

• In file sharing P2P networks:

– Node_A interested in music and software

– Node_B interested in music and movies

– Node_A and Node_B form music community (community appears)

– Node_C with music and software interest joins network

– Node_C should join music community with Node_A and Node_B (community grows)

– Node_A, Node_B, Node_C leave network (community disappears)

• Properties of communities:

1. Naturally occurring

2. Dynamic

3. Nodes/users can belong to multiple communities

4. Nodes/users can join/leave communities at-will


Motivation

1. Community Based Caching (CBC) algorithm tested under limiting conditions, i.e.:

i. Static community assignment

ii. Nodes/users couldn’t change membership

iii. Nodes limited to being members of only 1 community

iv. Community membership based on websites queried (arguably weak similarity)

Motivation

2. Limitations of existing dynamic community formation algorithms:

i. Centralized node for maintenance

ii. Complicated computations

iii. Additional messaging to establish community membership

iv. Limited to being members of one community at a time

3. Basis for establishing similarity for community formation:

i. Websites queried – weak

ii. Personal interests

iii. Acquired interests

Problem Statement

• Motivation summary:

i. CBC tested under stringent conditions

ii. Existing algorithms have limitations

iii. Consider other bases for community formation

• Contribution

– Decentralized community discovery algorithm

• Considers community properties, i.e., naturally occurring, dynamic, membership in multiple communities, and joining/leaving communities at will

• Overcomes limitations of existing algorithms

• Utilizes existing PUT and GET messages, so no additional messaging is needed

– Special key generation technique

• Disseminates group information

– Test CBC under more realistic conditions

• Network with churn


Dynamic Group Discovery (DGD)

• Key has embedded meta-data on the type of resource it represents

• No additional security risk added if algorithm is publicly known

• Key generation

– Last 12 bits of the key are used for the three levels of group identification:

• Level 1 (mandatory): general classification, e.g., music, movies

• Level 2 (optional): geographical location, e.g., U.S.A., Canada

• Level 3 (optional): genre, e.g., comedy, jazz

Key               Group ID Level 1  Group ID Level 2  Group ID Level 3  Final Key
0123456789abcdef  music => 1        Canada => 2       hip-hop => 3      0123456789abcdef 123
a123456789bcdef0  music => 1        N/A => 0          blues => 9        a123456789bcdef0 109
b123456789bcdef0  movies => 3       USA => 3          comedy => 6       b123456789bcdef0 236
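The key layout in the table can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: it assumes each of the three levels occupies 4 of the 12 bits (one hex digit), that 0 means "not specified", and that the group ID is simply appended to the base key. The function names are hypothetical.

```python
def make_final_key(base_key_hex, level1, level2=0, level3=0):
    """Append a 12-bit group ID (three hex digits) to a hex key.

    Assumption: each level occupies 4 bits, and 0 means "not specified".
    """
    if not 1 <= level1 <= 0xF:
        raise ValueError("level 1 is mandatory and must be 1..15")
    for level in (level2, level3):
        if not 0 <= level <= 0xF:
            raise ValueError("optional levels must be 0..15")
    return base_key_hex + "".join(format(v, "x") for v in (level1, level2, level3))


def extract_group_id(final_key_hex):
    """Read the three group ID levels back from the last 12 bits."""
    group_bits = int(final_key_hex, 16) & 0xFFF
    return (group_bits >> 8) & 0xF, (group_bits >> 4) & 0xF, group_bits & 0xF


final_key = make_final_key("0123456789abcdef", level1=1, level2=2, level3=3)
print(final_key)                    # 0123456789abcdef123
print(extract_group_id(final_key))  # (1, 2, 3)
```

Because the group ID sits in fixed bit positions, any node that sees the key can recover the group with a mask and two shifts, which is the only extra computation DGD adds to message handling.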

• Built on top of structured P2P

– Guaranteed performance compared to unstructured

– Chord used

• Goal of DGD is to allow community formation in structured P2P networks

• Establish group interest

– Personal interests

• Two thresholds used:

– λ = threshold for personal interests

– μ = threshold for acquired interests

• λ << μ

void forward(key, msg, nextHop*)
{
    if msg type = GET {
        extract group ID from key;
        if group ID in finger table and not pointing to me { // is there interest
            if hops < (log₂ N)/2
                set nextHop using entries in group ID finger table;
            else
                use chord to set nextHop;
        } else {
            keep track of specific GET message;
            use chord to set nextHop;
            if specific GET messages received λ OR μ times
                send FIND GROUP request;
        }
    }
}

Group ID  Finger To Group Member  Frequency of Use
120       node A                  N/A
239       node A                  N/A
912       node A                  N/A
122       node X                  0
122       node Y                  7
122       node Z                  9
912       node X                  20
235       node E                  4
420       node G                  1
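The forwarding rule and the group ID finger table above can be sketched roughly as below. This is a reading of the slide's pseudocode, not the OverSim code: `choose_next_hop` and `should_send_find_group` are hypothetical names, the Chord next hop is passed in rather than computed, and picking the first listed group member is an assumption because the slides do not say which finger is preferred.

```python
import math


def choose_next_hop(group_id, hops, num_nodes, me, group_fingers, chord_next_hop):
    """Return (next_hop, used_group_finger) for a GET message.

    group_fingers maps a group ID to a list of known group members.
    chord_next_hop is whatever ordinary Chord routing would pick.
    """
    candidates = [n for n in group_fingers.get(group_id, []) if n != me]
    if candidates and hops < math.log2(num_nodes) / 2:
        # Assumption: take the first listed member; the slides do not
        # say which finger is preferred.
        return candidates[0], True
    return chord_next_hop, False


def should_send_find_group(times_seen, lam, mu):
    """Literal reading of 'received λ OR μ times' from the pseudocode."""
    return times_seen in (lam, mu)


fingers = {"120": ["node A"], "122": ["node X", "node Y", "node Z"]}
print(choose_next_hop("122", 2, 2000, "node A", fingers, "chord hop"))
print(choose_next_hop("122", 6, 2000, "node A", fingers, "chord hop"))
print(choose_next_hop("120", 0, 2000, "node A", fingers, "chord hop"))
```

With 2,000 nodes the hop budget is roughly 5.5, so the first call uses a group finger, the second falls back to Chord because the message has already travelled 6 hops, and the third falls back because the only entry for group 120 points to the node itself.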

• Maintaining group ID finger table

– Limited resources

• σ = max number of fingers


void handle_FINDGROUP_response(groupID, finger)
{
    if <group ID, finger> pair already exists // done to avoid duplicates
        return;
    if group ID finger table is at capacity {
        if one least used finger can be identified
            delete it;
        else // i.e. multiple fingers share the same lowest frequency of use
            pick one of them at random and delete it;
    }
    add newly found finger;
}

σ = 3
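A small sketch of the replacement policy in `handle_FINDGROUP_response`, under two assumptions that the slides do not state explicitly: σ is treated as the maximum number of fingers per group ID, and a newly added finger starts with a frequency of use of 0. The function name `add_finger` is hypothetical.

```python
import random


def add_finger(fingers, finger, sigma, rng=random):
    """Add a finger for ONE group ID, evicting the least used if full.

    fingers maps finger name -> frequency of use.
    Returns the evicted finger name, or None if nothing was evicted.
    """
    if finger in fingers:
        return None  # duplicate pair, nothing to do
    evicted = None
    if len(fingers) >= sigma:
        lowest = min(fingers.values())
        least_used = [name for name, uses in fingers.items() if uses == lowest]
        # A unique minimum is evicted directly; ties are broken at random.
        evicted = least_used[0] if len(least_used) == 1 else rng.choice(least_used)
        del fingers[evicted]
    fingers[finger] = 0  # assumption: new fingers start unused
    return evicted


group_122 = {"node X": 0, "node Y": 7, "node Z": 9}
print(add_finger(group_122, "node W", sigma=3))  # node X
print(group_122)
```

Using the group 122 rows from the finger table with σ = 3, the unused finger to node X is the one replaced.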


Simulation Implementation

• OverSim

– Flexible network simulation framework

– Popular event driven simulator for P2P networks

• Keys and queries generated external to the simulator

– Able to indirectly control community size and symmetry

• Key and query generation:

1. Determine desired symmetry and size

2. Generate random keys with desired symmetry

3. Sort keys based on level 1 identification

4. Assign Zipf’s α parameter per community

5. Generate queries
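The five generation steps could look roughly like the sketch below. Everything specific here is an assumption made for illustration: 64-bit random base keys, a three-hex-digit group suffix with levels 2 and 3 left at 0, a simplified 40/40/20 split over three groups, 1,000 keys, and α = 0.9. The function names are hypothetical, and the thesis generated its files outside the simulator by means not shown on the slides.

```python
import random


def generate_keys(num_keys, shares, rng):
    """Steps 1-3: random keys per level-1 group, sorted by level 1.

    shares maps a level-1 group ID (1..15) to its fraction of all keys.
    """
    keys = []
    for group, share in shares.items():
        for _ in range(round(num_keys * share)):
            base = format(rng.getrandbits(64), "016x")
            keys.append(base + format(group, "x") + "00")
    keys.sort(key=lambda k: k[-3])  # third-from-last hex digit is level 1
    return keys


def generate_queries(keys, alpha, num_queries, rng):
    """Steps 4-5: Zipf-distributed queries over one community's keys."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, len(keys) + 1)]
    return rng.choices(keys, weights=weights, k=num_queries)


rng = random.Random(42)
keys = generate_keys(1000, {1: 0.4, 2: 0.4, 3: 0.2}, rng)
community_1 = [k for k in keys if k[-3] == "1"]
queries = generate_queries(community_1, alpha=0.9, num_queries=50, rng=rng)
print(len(keys), len(community_1), len(queries))
```

Changing the `shares` dictionary is what controls community size and symmetry, and choosing α per community controls how skewed each community's query stream is.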

void handle_PUT_event()
{
    key = read from key file;
    if key is unspecified { // i.e. all keys have been read from key file
        schedule GET event;
        return;
    }
    extract group ID information from key;
    add entry to group ID finger table;
    create PUT message and send it out to network;
    schedule next PUT event;
}

void handle_GET_event()
{
    select group ID file to read from; // based on personal interest
    query = read key from query file;
    if query is unspecified // i.e. all keys have been read from query file
        return; // do nothing
    create GET message with query;
    send out GET message;
    schedule next GET event;
}

void handle_REMOVE_request(groupID, finger, maxHops, curHops)
{
    if <group ID, finger> pair exists
        delete entry;
    if curHops < maxHops {
        curHops++;
        forward message to all nodes in group ID finger table;
    }
}
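The hop-limited REMOVE propagation can be illustrated with a toy in-memory network. This is a sketch under assumptions: `tables` is a hypothetical dictionary from node name to that node's set of (group ID, finger) pairs, message delivery is modelled as a direct recursive call, and fingers that point to departed nodes are skipped.

```python
def handle_remove(tables, node, group_id, finger, max_hops, cur_hops=0):
    """Delete a stale finger at `node`, then forward within the hop limit.

    Returns the number of REMOVE requests handled, including this one.
    """
    entries = tables[node]
    entries.discard((group_id, finger))  # no-op if the pair is absent
    handled = 1
    if cur_hops < max_hops:
        # sorted() copies the entries, so recursion cannot disturb the loop.
        for _, neighbour in sorted(entries):
            if neighbour in tables and neighbour != node:
                handled += handle_remove(tables, neighbour, group_id,
                                         finger, max_hops, cur_hops + 1)
    return handled


# Node X has left the network; A and B still hold fingers to it.
tables = {
    "A": {("122", "X"), ("122", "B")},
    "B": {("122", "X"), ("122", "A")},
}
print(handle_remove(tables, "A", "122", "X", max_hops=2))  # 3
print(tables)
```

The hop counter, not duplicate detection, is what stops the flood here: A forwards to B, B forwards back to A, and the third request arrives with `cur_hops` equal to `max_hops` and goes no further.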


Case 1: Varying Number of Nodes

• Between 500 and 10,000 nodes

• 40,000 keys used with following distribution:

– 40% group 1

– 40% group 2

– 20% shared equally among groups 3 to 9

• Maximum group ID finger table size per node = 160

• λ = 2

• μ = 20

• σ = 3

Asymmetrical communities

Case 1: Varying Number of Nodes

Case 2: Varying Community Size

• 2,000 nodes

• 40,000 keys divided into 2 sections, i.e., 80% in section 1 and 20% in section 2

– Run 1: one community in section 1 and eight communities in section 2

– Run 2: two communities in section 1 and seven communities in section 2

– Run 3: …

• λ = 2

• μ = 20

• σ = 3

Case 2: Varying Community Size

Case 3: Introducing Churn

• Between 500 and 10,000 nodes

• 40,000 keys used with following distribution:

– 40% group 1

– 40% group 2

– 20% shared equally among groups 3 to 9

• λ = 2

• μ = 20

• σ = 3

Asymmetrical communities

Case 3: Introducing Churn


Conclusion

1. Decentralized dynamic group discovery algorithm

2. Key generation with embedded group ID information

3. Improve lookup performance for queries resolved using cache data

• Stronger community basis i.e. personal and acquired interests

• Without churn

4. Easy implementation of dynamic group discovery

• Utilizes already existing messages

• Additional computation – extracting group ID information

5. Great potential for robust caching solution

Future Work

1. Optimize entries in group ID finger table

• Consider distance of new finger relative to other fingers in other tables

2. Consider location of next hop of group member

• Applicable for structured P2P networks

3. DGD needs to know exactly how many nodes are in the network

• if finger not pointing to me and hops < (log₂ N)/2

• Find solution to determine number of nodes in network with churn

4. Introduce churn in measurable manner

• Better characterize DGD’s performance

5. Test DGD with other types of P2P networks
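Item 3 above notes that DGD's forwarding test depends on the network size N. As a rough illustration of how sensitive the hop limit (log₂ N)/2 is to an inexact N, the sketch below evaluates it for node counts in the range used in the case studies; `group_hop_limit` is a hypothetical helper name, and the factor-of-two error is an assumed example.

```python
import math


def group_hop_limit(num_nodes):
    """Hop budget (log2 N)/2 used in the DGD forwarding test."""
    return math.log2(num_nodes) / 2


for n in (500, 2000, 10000):
    exact = group_hop_limit(n)
    doubled = group_hop_limit(2 * n)  # estimate that is twice the true size
    print(n, round(exact, 2), round(doubled - exact, 2))
```

Because the limit is logarithmic, an estimate of N that is off by a factor of two shifts it by only half a hop, which suggests an approximate size estimate may be enough under churn.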

Thank you!

• Dr. Jayasumana

• Liuqing Yang

• Christos Papadopoulos

• Friends and co-workers at CSU and Dot Hill

• Family

QUESTIONS?
