Implementing Database Coordination in P2P Networks *
Ilya Zaihrayeu
SemPGRID-04, 18 May 2004, New York, USA
* work with Fausto Giunchiglia
Why P2P Databases
• P2P data sharing: files … relational data?
• File sharing: KaZaa + Morpheus = more than 460 million downloads (download.com, May 2004)
• P2P databases: academic testbeds so far
• Promises: large-scale fault-tolerant multi-database system with low start-up and maintenance costs, and high “output” for an individual party
• Difficulties: data integration solutions are not applicable due to their centralized nature
• Challenges: new methodologies, theories, algorithms, models, mechanisms and tools need to be developed
Why P2P Databases, cont’d
• Application: non-performance-critical domains, where local autonomy of each party is essential
• Medical care scenario
– John goes skiing and suffers an accident
– John is taken to a local clinic for treatment; doctors need to know whether John has contraindications against some drugs
– John does not know these details, but his database layer has a link to his family doctor’s databases
• Cooperating real estate agents example
– Agents coordinate their data to push sales
– When on the site of a customer who wants to sell, an agent updates his database and makes the data available to other agents
– When on the site of a customer who may want to buy, an agent shows details from his database, and may query other agents’ databases
• Other examples: scientific databases (genomic data), tourism, etc.
Data Coordination Model
• Interest Groups – groups of peers able to answer queries about a certain topic
– e.g., group topic: “Tourism in Trentino”, “Real Estate in Scotland”, etc.
– each Interest Group has a group manager (GM), which helps maintain the group
• Acquaintances – “known” nodes that contribute data
– acquaintance query – a query over the relations of an acquaintance whose results satisfy some local relation
• Correspondence Rules – solve the heterogeneity problem at the instance level
– semantic heterogeneity at the structure level is solved by acquaintance queries
• Coordination Rules – coordinate data (queries and updates) with acquaintances
Interest Groups
• Help to cope with large number of nodes by clustering the network
• Nodes self-organize into interest groups
• A node may form a child interest group
• One node may belong to multiple groups
• Use schema matching to monitor group constitution
• The GM supports group constitution, “talks” to other GMs, and provides information about the group to newcomers
[Figure: example tree of interest-group topics rooted at “All topics”, with nodes Arts, Shopping, Movies, Music, Publications, Computers, Lyrics, Books, …]
Acquaintance query
• An acquaintance query is a conjunctive query:
• q(X) :- r1(X1), …, rn(Xn)
– q(X) – the head, refers to a local relation;
– r1(X1), …, rn(Xn) – subgoals of the body, which refer to relations of an acquaintance, and comparison predicates;
– X, X1, …, Xn – variables or constants.
• E.g., P1: films (title, year, genre) :- P2: movie (title, year, director); genres (title, genre); year>1995
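A minimal sketch of how the example acquaintance query could be evaluated, assuming relations are held as in-memory lists of tuples. The sample rows and the function name `films_from_p2` are hypothetical; only the relation schemas and the `year > 1995` predicate come from the slide.

```python
# Hypothetical sample data for acquaintance P2 (not from the slides).
movie = [  # movie(title, year, director)
    ("Film A", 1994, "Director X"),
    ("Film B", 1999, "Director Y"),
    ("Film C", 2001, "Director Z"),
]
genres = [  # genres(title, genre)
    ("Film A", "Drama"),
    ("Film B", "Comedy"),
    ("Film C", "Drama"),
]


def films_from_p2(movie, genres):
    """Evaluate films(title, year, genre) :- movie(title, year, director),
    genres(title, genre), year > 1995 as a join on title."""
    result = set()
    for title, year, _director in movie:
        if year <= 1995:  # comparison predicate: year > 1995
            continue
        for g_title, genre in genres:
            if g_title == title:  # the shared variable `title` joins the subgoals
                result.add((title, year, genre))
    return result


films = films_from_p2(movie, genres)
print(sorted(films))
```

The head keeps `title`, `year` and `genre` and drops `director`, which is how the query reconciles the two schemas at the structure level.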
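[Figure: four nodes (1 to 4) holding relations A to I, linked by the acquaintance queries I :- A,B; B :- C,D; D :- I,G; C :- E,F; F :- G. The dependencies I, B, D, I form a loop.]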
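The loop in this figure can be found mechanically. The sketch below is an illustration, not the authors' algorithm: it treats each rule as dependency edges from its head to its body relations and searches depth-first for a path that returns to the starting relation. The function name `find_loop` is hypothetical.

```python
# Acquaintance queries from the figure, written as head -> body relations.
rules = {
    "I": ["A", "B"],
    "B": ["C", "D"],
    "D": ["I", "G"],
    "C": ["E", "F"],
    "F": ["G"],
}


def find_loop(rules, start):
    """Return a dependency path from `start` back to `start`, or None."""

    def visit(rel, path):
        for dep in rules.get(rel, []):
            if dep == start:
                return path + [dep]  # closed the loop
            if dep not in path:      # do not walk the same relation twice
                found = visit(dep, path + [dep])
                if found:
                    return found
        return None

    return visit(start, [start])


loop = find_loop(rules, "I")
print(loop)
```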
Correspondence Rules and Coordination Rules
• Correspondence rules define how constants from the local domain are translated into constants in the domain of an acquaintance (forward translation) and vice versa (backward translation)
– not necessarily symmetric, e.g. currency translation
• The goal of coordination rules is data coordination with acquaintances and acquainted nodes
– activated by the user (user query) or from the network (network query, results, update)
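A minimal sketch of a correspondence rule as a pair of lookup tables, one per direction. The class name, the genre vocabularies and the pass-through behaviour for unmapped constants are all assumptions made for illustration; the slide's own example of asymmetry is currency translation, and a genre mapping is used here only because it needs no exchange rates.

```python
class CorrespondenceRule:
    """Instance-level translation of constants between a node and one acquaintance."""

    def __init__(self, forward, backward):
        self.forward = forward    # local constant -> acquaintance constant
        self.backward = backward  # acquaintance constant -> local constant

    def to_acquaintance(self, value):
        # Assumption: constants without a rule are passed through unchanged.
        return self.forward.get(value, value)

    def to_local(self, value):
        return self.backward.get(value, value)


# Hypothetical vocabularies: two local genres collapse into one remote genre,
# so the backward translation cannot invert the forward one.
genre_rule = CorrespondenceRule(
    forward={"Sci-Fi": "Fiction", "Fantasy": "Fiction"},
    backward={"Fiction": "Sci-Fi"},
)

remote = genre_rule.to_acquaintance("Fantasy")
round_trip = genre_rule.to_local(remote)
print(remote, round_trip)
```

Translating “Fantasy” forward and then backward does not return the original constant, which is the sense in which the two directions need not be symmetric.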
Algorithmic notes
• Query answering algorithm
– Use acquaintance queries and correspondence rules to translate queries and data
– Propagate to acquaintances if acquaintance queries are relevant
– Compute only new tuples, reconcile results
– Process loops in query propagation, define a termination point (no propagation using acquaintance queries that have already been used)
• “Getting acquainted” protocol
– Retrieve database schemas and then apply a matching operator to them
– Based on the matching results, generate (with the help of the user) acquaintance queries and correspondence rules, and tune coordination rules
• Updates handling (work with E. Franconi, G. Kuper, A. Lopatenko)
– Data may go through a loop more than once; define a termination point
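The termination rule for query answering can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the talk: every acquaintance query has a single body relation, no correspondence rules are applied, and the whole network is simulated in one process. All class, field and node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AcquaintanceQuery:
    qid: str     # unique id, used for the termination check
    head: str    # local relation that the results feed
    target: str  # name of the acquaintance node
    body: str    # relation queried at the acquaintance


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)           # relation -> set of tuples
    acquaintances: list = field(default_factory=list)  # list of AcquaintanceQuery


def answer(network, node_name, relation, used=None):
    """Collect tuples for `relation`, propagating along unused acquaintance queries."""
    used = set() if used is None else used
    node = network[node_name]
    result = set(node.data.get(relation, set()))
    for aq in node.acquaintances:
        if aq.head != relation or aq.qid in used:
            continue  # irrelevant, or already used: this is the termination point
        used.add(aq.qid)
        remote = answer(network, aq.target, aq.body, used)
        result |= remote - result  # keep only tuples that are new
    return result


# Three hypothetical nodes whose acquaintance queries form a loop n0 -> n1 -> n2 -> n0.
network = {
    "n0": Node("n0", {"Movie": {("Film A",)}},
               [AcquaintanceQuery("q01", "Movie", "n1", "Movie")]),
    "n1": Node("n1", {"Movie": {("Film B",)}},
               [AcquaintanceQuery("q12", "Movie", "n2", "Movie")]),
    "n2": Node("n2", {"Movie": {("Film A",), ("Film C",)}},
               [AcquaintanceQuery("q20", "Movie", "n0", "Movie")]),
}

titles = answer(network, "n0", "Movie")
print(sorted(titles))
```

When the query comes back to `n0` through the loop, `q01` is already marked as used, so propagation stops there and the duplicate “Film A” tuple is not added twice.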
Implementing P2P databases on top of JXTA
• Benefits
– system platform and networking protocol independence
– IP-independence (location independence)
– gives basic building blocks for P2P applications
• We implement Interest Groups and Acquaintances in JXTA
• We encode database-related functionalities into a set of custom JXTA services (DB-related services)
DB-related services
• Node-level services: Queries handler, DB operations, …
• Group-level services: Screening service, GM service, …
Architecture
[Figure: two panels. “A node”: User, PDBMS, User Interface (UI), Database Manager (DBM), Wrapper, Source Database (SDB). “A P2P database network”: User-1, User-2, …, User-n, nodes on the network, JXTA Layer, SS.]
Architecture, cont’d
[Figure: detailed architecture. Labels: JXTA Layer, DBM, User Interface (UI), Wrapper, In, Out, Pipes, Discovery, Query Planner, Query Propagation, P2P Management, Results Handler, Updates Handler, SS, Coordination Rules, Acquaintances, Acquaintance queries, Correspondence Rules, Peer Groups, Services, JXTA Core Services, DB-related services, GM in-pipe adv, Advertisements (Peer Adv, Peer Gr. Adv, Gr. topic, Pipe Adv).]
Demo: toy databases and topology
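Relations:
(1) Movie (title, year, genre)
(2) Credits (name, title, role)
(3) Movie2 (title, year, director)
(4) Genre (title, genre)
[Figure: demo topology with nodes 0 to 5 and a query entry point Q. Bracketed labels list relation numbers: [1,2], [1,2], [2], [2,3,4], [3], [4]. Parenthesized labels are acquaintance queries over relation numbers: (1:-1), (2:-2), (3:-3), (4:-4), (1:-3,4), (2:-2), (2:-2), (4:-1). The legend marks a rendezvous peer and a mediator peer.]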
Query example 1
“List titles of movies featuring Tom Hanks”
Q(t) :- Credits (n,t,r); n=“Tom Hanks”
[Figure: the demo topology from the “Demo: toy databases and topology” slide, repeated for this query.]
Query example 2
“Titles of drama movies issued after 1995”
Q(t) :- Movie (t,y,g); g=“Drama”; y>1995;
[Figure: the demo topology from the “Demo: toy databases and topology” slide, repeated for this query.]
Query example 3
“Names of actors playing in action movies in 2003”
Q(n) :- Movie (t,y,g); Credits (n,t,r); r=“Actor”; g=“Action”; y=2003;
[Figure: the demo topology from the “Demo: toy databases and topology” slide, repeated for this query.]
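A minimal sketch of query example 3 evaluated on a single node, assuming the Movie and Credits relations are in-memory lists of tuples. The rows and the function name are hypothetical; the schemas and the selection conditions are taken from the query itself.

```python
# Hypothetical sample rows (not from the slides).
movie = [  # Movie(title, year, genre)
    ("Film A", 2003, "Action"),
    ("Film B", 2003, "Drama"),
    ("Film C", 2001, "Action"),
]
credits = [  # Credits(name, title, role)
    ("Person X", "Film A", "Actor"),
    ("Person Y", "Film A", "Director"),
    ("Person Z", "Film B", "Actor"),
    ("Person W", "Film C", "Actor"),
]


def actors_in_action_movies(movie, credits, year=2003):
    """Q(n) :- Movie(t,y,g); Credits(n,t,r); r="Actor"; g="Action"; y=year."""
    # Select matching titles first, then join Credits on the title variable t.
    titles = {t for t, y, g in movie if g == "Action" and y == year}
    return {n for n, t, r in credits if r == "Actor" and t in titles}


names = actors_in_action_movies(movie, credits)
print(sorted(names))
```

In the network setting, the two subgoals may be answered by different nodes, so the join would run over tuples gathered through acquaintance queries rather than over local lists.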
References
• F. Giunchiglia and I. Zaihrayeu, “Making peer databases interact - a vision for an architecture supporting data coordination,” 6th International Workshop on Cooperative Information Agents (CIA-2002), Madrid, Spain, September 18-20, 2002.
• P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu, “Data management for peer-to-peer computing: A vision,” WebDB, 2002.
• A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov, “Schema mediation in a peer data management system,” ICDE, 2003.
• V. Kantere, I. Kiringa, J. Mylopoulos, A. Kementsietsidis, and M. Arenas, “Coordinating peer databases using ECA rules,” DBISP2P, September 2003.
• E. Franconi, G. Kuper, A. Lopatenko, and I. Zaihrayeu, “The coDB Robust Peer-to-Peer Database System,” Proc. of the 2nd Workshop on Semantics in Peer-to-Peer and Grid Computing (SemPGrid'04), 2004.
• JXTA project, see http://www.jxta.org
Announcement
Submission deadline: 30 June, 2004
www.p2pkm.org
Thank you