peerdb : a p2p-based system for distributed data sharing db lab. m.s. 3 lee min young wee siong ng,...

PeerDB : A P2P-based System for Distributed Data Sharing

DB Lab.M.S. 3

LEE MIN YOUNG

Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou,19th International Conference on Data Engineering, 2003

Overview

Introdution BestPeer PeerDB Implementation Performance Study Conclusion

Introduction

Background: P2P vs DDBS P2P DDBS

CLIENT CLIENT

CLIENT CLIENT

DBMS

CLIENT CLIENT

CLIENT CLIENT

Ad-hoc Membership Stable nodes

No global schema Schema Shared schema

Incomplete Query result set Complete

“Word of mouth” Content location Shared catalog

P2P DDBS

Limitation of existed P2P system Coarse- grained granularity (file level sharing) Limited in extensibility and flexibility

PeerDB Each node is a DBMS No global schema Incomplete answers possible Fine granularity, content searching

BestPeer

BestPeer A general purpose peer-to-peer platform

Key features: LIGLO server (Location-Independent Global Names

Lookup Server) Self-configurable neighborhood maintenance policy Agent-based service mechanism

Share the resource Data and information Storage power Computational power

BestPeer - LIGLO Server

BestPeer - Self-configurable Network

PeerDB Implementation

PeerDB - Architecture

Local Data(schema, keywords)

Database

Sharable Data

Data Management System : MySqlmeta-data stored in Local Dictionary and Export Dictionary

DBAgent : for mobile agentMaster agent that manages the queriesDispatches worker agents to neighboring nodes

PeerDB – no global schema Problems with supporting SQL queries:

No global schema information Different nodes could name the same table/attribute differently (“len”,

“length”) Solution

User supplies metadata for each relation name and attribute

P1

P4P2

P3

SeqIDLength

proteinSeq

Kinases

IDseqLen

ProteinKLen

SeqNoLen

sequence

Protein

Namechar

Protein

IDsequence

ProteinKSeq

Protein, kinases,lengthNumber, identifierLengthProtein, sequenceNumber, identifierSequence

ProteinKLenIDseqLenProteinKSeqIDsequence

Protein, humanKey, identifier, IDLengthSequence

KinasesSeqIDLengthproteinSeq

Local/Export Dictionary

Proteinnumber, identifierLengthSequence

ProteinSeqNoLensequence

Protein, kinasenameFeatures, functons

Proteinnamechar

Local Query Processing

Parse Query to Extract the list of relations and attributes name

Check Local Dictionary for matching relations Promising relations Return to the user

flood to all neighbors :query, IP, TTL Wait for responses

Display results to user as they arrive

User selects the relations he/she wants Create a “Data Retrieval Agent” Rewrite query in terms of new relations

If local, submit SQL to local db Contact remote nodes directly to access the data

Phase 1

Phase 2

Local Query Processing

P1

P4P2

P3

Protein, humanKey, identifier, IDLengthSequence

KinasesSeqIDLengthproteinSeq

Local Dictionary

P5

P6

User 1. query 2. query, IP, TTL

3. schema

4.Show shcemas

P1 : KinasesP2: ProteinP3: ProteinKLenP4. Protein

P1

1. SQL2. result

SeqIDLength

proteinSeq

Remote Query Processing

Find relations Relation Matching Agents flood with TTL Check Export Dictionary for a match

Return matches directly

Get data Data Retrieval Agent submits SQL to DBMS Return data to the requesting node directly Run further data processing before returning

Phase 1

Phase 2

Remote Query Processing

1. queryP1

P4

P2

P3

P5

P6

User 2. query, IP, TTL

3. sch

em

a

4.Show shcemas P1 : KinasesP2: ProteinP3: ProteinKLenP4. Protein

P21. SQL

2. result

ProteinSeqNoLensequence

Proteinnumber, identifierLengthSequence

Export Dictionary SeqNoLen

sequence

Protein

PeerDB

Statistics Reconfiguration based on the policy selected by the

user Relation information, number of answer objects

returned

Caches Cache all query results locally LRU replacement Users choose which copy they want

Performance Study

Performance Study

Rate of Returning Answers

Client/Server :Returns via the search path

PDMR :after each query, PDMR reconfigure itselt :reach out to more promising nodes directly

Performance Study

First search query % answers returned

Performance Study

Completion time vs Data size Communication Overhead

Conclusion

PeerDB Each node is a DBMS No global schema Incomplete answers possible Fine granularity, content searching

peerdb : a p2p-based system for distributed data sharing db lab. m.s. 3 lee min young wee siong ng,...

Documents

remote query processing1

local dictionary

data processing

dbmsreturn data

data engineering

global schemaproblems

local dbcontact remote

distributed data sharingdb