PeerDB : A P2P-based System for Distributed Data Sharing
DB Lab. M.S. 3
LEE MIN YOUNG
Wee Siong Ng, Beng Chin Ooi, Kian-Lee Tan, Aoying Zhou, 19th International Conference on Data Engineering, 2003
Overview
Introduction, BestPeer, PeerDB Implementation, Performance Study, Conclusion
Introduction
Background: P2P vs DDBS
[Figure: P2P and DDBS topologies, with clients and a DBMS]
                    P2P                 DDBS
Membership          Ad-hoc              Stable nodes
Schema              No global schema    Shared schema
Query result set    Incomplete          Complete
Content location    “Word of mouth”     Shared catalog
Limitations of existing P2P systems
Coarse-grained granularity (file-level sharing)
Limited extensibility and flexibility
PeerDB
Each node is a DBMS
No global schema
Incomplete answers possible
Fine granularity, content searching
BestPeer
BestPeer: a general-purpose peer-to-peer platform
Key features:
LIGLO server (Location-Independent Global Names Lookup Server)
Self-configurable neighborhood maintenance policy
Agent-based service mechanism
Shared resources:
Data and information
Storage power
Computational power
BestPeer - LIGLO Server
BestPeer - Self-configurable Network
PeerDB Implementation
PeerDB - Architecture
[Figure: node architecture with Database, Local Data (schema, keywords), and Sharable Data]
Data Management System: MySQL. Metadata is stored in the Local Dictionary and the Export Dictionary.
DBAgent: mobile agent support. A master agent manages the queries and dispatches worker agents to neighboring nodes.
PeerDB – no global schema
Problems with supporting SQL queries:
No global schema information
Different nodes could name the same table/attribute differently (“len”, “length”)
Solution:
User supplies metadata (keywords) for each relation name and attribute
[Figure: peers P1 to P4 and the relations they share, with the Local/Export Dictionary entries below]
Local/Export Dictionary (name: keywords)
P1  Kinases(SeqID, Length, proteinSeq)
    Kinases: Protein, human
    SeqID: Key, identifier, ID
    Length: Length
    proteinSeq: Sequence
P2  Protein(SeqNo, Len, sequence)
    Protein: Protein
    SeqNo: number, identifier
    Len: Length
    sequence: Sequence
P3  ProteinKLen(ID, seqLen)
    ProteinKLen: Protein, kinases, length
    ID: Number, identifier
    seqLen: Length
P3  ProteinKSeq(ID, sequence)
    ProteinKSeq: Protein, sequence
    ID: Number, identifier
    sequence: Sequence
P4  Protein(name, char)
    Protein: Protein, kinase
    name: name
    char: Features, functions
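A minimal sketch of how keyword metadata could be used to match relations that are named differently at different peers. The `Relation` class, the `match_score` function, and the scoring rule (fraction of query names that share at least one term with the candidate) are assumptions for illustration; the slides do not specify the matching function. The sample entries are taken from the dictionary above.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A shared relation plus the keywords its owner attached to it."""
    name: str
    keywords: set[str]
    # attribute name -> keywords supplied for that attribute
    attributes: dict[str, set[str]] = field(default_factory=dict)


def terms(name: str, keywords: set[str]) -> set[str]:
    """Lower-cased name and keywords, treated as one set of terms."""
    return {name.lower()} | {k.lower() for k in keywords}


def match_score(query: Relation, candidate: Relation) -> float:
    """Fraction of query names (relation + attributes) sharing a term
    with the candidate. Hypothetical scoring rule, not from the paper."""
    total = 1 + len(query.attributes)
    hits = 0
    if terms(query.name, query.keywords) & terms(candidate.name, candidate.keywords):
        hits += 1
    for attr, kws in query.attributes.items():
        q_terms = terms(attr, kws)
        if any(q_terms & terms(a, k) for a, k in candidate.attributes.items()):
            hits += 1
    return hits / total


kinases_p1 = Relation(
    "Kinases", {"protein", "human"},
    {"SeqID": {"key", "identifier", "ID"},
     "Length": {"length"},
     "proteinSeq": {"sequence"}},
)
protein_p2 = Relation(
    "Protein", {"protein"},
    {"SeqNo": {"number", "identifier"},
     "Len": {"length"},
     "sequence": {"sequence"}},
)
protein_p4 = Relation(
    "Protein", {"protein", "kinase"},
    {"name": {"name"}, "char": {"features", "functions"}},
)

score_p2 = match_score(kinases_p1, protein_p2)  # every name finds a counterpart
score_p4 = match_score(kinases_p1, protein_p4)  # only the relation name matches
```

Under this rule, P2's Protein ranks above P4's Protein for a query written against Kinases, even though both relations carry the same name.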
Local Query Processing
Phase 1
Parse the query to extract the list of relation and attribute names
Check the Local Dictionary for matching relations; return promising relations to the user
Flood to all neighbors: query, IP, TTL; wait for responses
Display results to the user as they arrive
Phase 2
User selects the relations he/she wants
Create a “Data Retrieval Agent”
Rewrite the query in terms of the new relations
If local, submit SQL to the local DB; otherwise contact remote nodes directly to access the data
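The "rewrite query in terms of new relations" step can be sketched as a name substitution. This is an illustrative assumption, not the system's actual rewriter: `rewrite_query` is a hypothetical helper, the name mappings are assumed to come from the relation the user selected, and the substitution is purely textual (it does not parse SQL and would also touch names inside string literals).

```python
import re


def rewrite_query(sql: str,
                  relation_map: dict[str, str],
                  attribute_map: dict[str, str]) -> str:
    """Replace whole-word relation and attribute names in one pass."""
    mapping = {**relation_map, **attribute_map}
    if not mapping:
        return sql
    # Longest names first so a short name never pre-empts a longer one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], sql)


# Query written against P1's Kinases, rewritten for P2's Protein.
original = "SELECT SeqID, Length FROM Kinases WHERE Length > 300"
rewritten = rewrite_query(
    original,
    relation_map={"Kinases": "Protein"},
    attribute_map={"SeqID": "SeqNo", "Length": "Len"},
)
# rewritten == "SELECT SeqNo, Len FROM Protein WHERE Len > 300"
```

A single regular-expression pass avoids rewriting a name twice when one target name is also a source name.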
Local Query Processing (example)
[Figure: peers P1 to P6, with the user at P1]
Finding relations: 1. query; 2. query, IP, TTL; 3. schema; 4. show schemas (P1: Kinases, P2: Protein, P3: ProteinKLen, P4: Protein)
Retrieving data at P1: 1. SQL; 2. result
Local Dictionary at P1: Kinases(SeqID, Length, proteinSeq); keywords: Kinases = Protein, human; SeqID = Key, identifier, ID; Length = Length; proteinSeq = Sequence
Remote Query Processing
Phase 1: find relations
Relation Matching Agents flood with TTL
Check the Export Dictionary for a match
Return matches directly
Phase 2: get data
Data Retrieval Agent submits SQL to the DBMS
Return data to the requesting node directly
Can run further data processing before returning
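A sketch of the TTL-bounded flood in Phase 1, simulated on an in-memory graph. The topology, the `flood` function, and the rule that a peer forwards only while more than one hop remains are assumptions for illustration; in the real system the matching is done by agents and replies travel over the network directly to the originating node.

```python
from collections import deque


def flood(neighbors: dict[str, list[str]],
          export_dicts: dict[str, dict[str, set[str]]],
          origin: str,
          keywords: set[str],
          ttl: int) -> dict[str, list[str]]:
    """Breadth-first flood from `origin`, at most `ttl` hops deep.

    Returns {peer: [matching relation names]}, modelling each peer
    replying directly to the origin rather than along the search path.
    """
    replies: dict[str, list[str]] = {}
    seen = {origin}
    queue: deque[tuple[str, int]] = deque()
    if ttl > 0:
        for n in neighbors.get(origin, []):
            if n not in seen:
                seen.add(n)
                queue.append((n, ttl))
    while queue:
        peer, remaining = queue.popleft()
        # Check this peer's Export Dictionary for keyword overlap.
        matches = [rel for rel, kws in export_dicts.get(peer, {}).items()
                   if keywords & kws]
        if matches:
            replies[peer] = matches
        if remaining > 1:  # forward only while hops remain
            for n in neighbors.get(peer, []):
                if n not in seen:
                    seen.add(n)
                    queue.append((n, remaining - 1))
    return replies


# Hypothetical topology; dictionary entries follow the slides' example.
neighbors = {"P1": ["P2", "P3"], "P2": ["P1", "P4"],
             "P3": ["P1"], "P4": ["P2"]}
export_dicts = {
    "P2": {"Protein": {"protein"}},
    "P3": {"ProteinKLen": {"protein", "kinases", "length"},
           "ProteinKSeq": {"protein", "sequence"}},
    "P4": {"Protein": {"protein", "kinase"}},
}
one_hop = flood(neighbors, export_dicts, "P1", {"protein"}, ttl=1)
two_hops = flood(neighbors, export_dicts, "P1", {"protein"}, ttl=2)
```

With `ttl=1` only P2 and P3 are reached; raising it to 2 also reaches P4 through P2.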
Remote Query Processing (example)
[Figure: peers P1 to P6; the user queries P1]
Finding relations: 1. query; 2. query, IP, TTL; 3. schema; 4. show schemas (P1: Kinases, P2: Protein, P3: ProteinKLen, P4: Protein)
Retrieving data at P2: 1. SQL; 2. result
Export Dictionary at P2: Protein(SeqNo, Len, sequence); keywords: Protein = Protein; SeqNo = number, identifier; Len = Length; sequence = Sequence
PeerDB
Statistics
Reconfiguration based on the policy selected by the user
Statistics kept: relation information, number of answer objects returned
Caches
Cache all query results locally
LRU replacement
Users choose which copy they want
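A minimal sketch of a local query-result cache with LRU replacement. The class name `QueryResultCache`, the use of the query text as the key, and the fixed capacity are assumptions for illustration; letting users choose between cached copies is not modelled here.

```python
from collections import OrderedDict


class QueryResultCache:
    """Keeps the most recently used query results, evicting the oldest."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries: OrderedDict[str, list] = OrderedDict()

    def get(self, query: str):
        """Return cached rows, or None on a miss; a hit refreshes recency."""
        if query not in self._entries:
            return None
        self._entries.move_to_end(query)
        return self._entries[query]

    def put(self, query: str, rows: list) -> None:
        """Store rows and evict the least recently used entry if over capacity."""
        self._entries[query] = rows
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


cache = QueryResultCache(capacity=2)
cache.put("q_a", [("row", 1)])
cache.put("q_b", [("row", 2)])
cache.get("q_a")                # q_a becomes most recently used
cache.put("q_c", [("row", 3)])  # evicts q_b, the least recently used
```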
Performance Study
Rate of Returning Answers
Client/Server: returns answers via the search path
PDMR: after each query, PDMR reconfigures itself to reach out to more promising nodes directly
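One way such a reconfiguration could work is to keep as direct neighbors the peers that returned the most answers. The sketch below assumes that policy; the function `reconfigure`, the neighbor limit, and the tie-breaking rule are hypothetical, and the slides only state that the policy is selected by the user.

```python
def reconfigure(current_neighbors: list[str],
                answers_by_peer: dict[str, int],
                max_neighbors: int) -> list[str]:
    """Pick the new direct neighbors after a query.

    Peers are ranked by the number of answers they returned; ties keep
    their existing order, so current neighbors win over newcomers.
    """
    # Current neighbors first, then newly discovered peers, without duplicates.
    candidates = list(dict.fromkeys(list(current_neighbors) + list(answers_by_peer)))
    ranked = sorted(candidates,
                    key=lambda peer: answers_by_peer.get(peer, 0),
                    reverse=True)  # sorted() is stable, even with reverse=True
    return ranked[:max_neighbors]


# P4 and P5 were reached indirectly but returned the most answers.
new_neighbors = reconfigure(
    current_neighbors=["P2", "P3"],
    answers_by_peer={"P2": 1, "P4": 12, "P5": 7},
    max_neighbors=2,
)
# new_neighbors == ["P4", "P5"]
```

After the swap, later queries reach P4 and P5 in one hop instead of through intermediate peers.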
First search query: % of answers returned
Completion time vs. data size; communication overhead
Conclusion
PeerDB
Each node is a DBMS
No global schema
Incomplete answers possible
Fine granularity, content searching