Implementing Database Coordination in P2P Networks *

Ilya Zaihrayeu

SemPGRID-04, 18 May 2004, New York, USA

* work with Fausto Giunchiglia

Why P2P Databases

• P2P data sharing: files … relational data?

• File sharing: KaZaa + Morpheus = more than 460 million downloads (download.com, May 2004)

• P2P databases: only academic testbeds so far

• Promises: a large-scale, fault-tolerant multi-database system with low start-up and maintenance costs, and high “output” for an individual party

• Difficulties: data integration solutions are not applicable because of their centralized nature

• Challenges: new methodologies, theories, algorithms, models, mechanisms and tools need to be developed

Why P2P Databases, cont’d

• Application: non-performance-critical domains, where the local autonomy of each party is essential

• Medical care scenario
– John goes skiing and suffers an accident
– John is taken to a local clinic for treatment; the doctors need to know whether John has contraindications against some drugs
– John does not know these details, but his database layer has a link to his family doctor’s database

• Cooperating real estate agents example
– Agents coordinate their data to push sales
– When on the site of a customer who wants to sell, an agent updates his database and makes the data available to other agents
– When on the site of a customer who may want to buy, an agent shows details from his database, and may query other agents’ databases

• Other examples: scientific databases (genomic data), tourism, etc.

Data Coordination Model

• Interest Groups – groups of peers able to answer queries about a certain topic
– e.g., group topic – “Tourism in Trentino”, “Real Estate in Scotland”, etc.
– each Interest Group has a group manager (GM), which helps in the maintenance of the group

• Acquaintances – “known” nodes that contribute data
– acquaintance query – a query over the relations of an acquaintance whose results satisfy some local relation

• Correspondence Rules – solve the heterogeneity problem at the instance level
– semantic heterogeneity at the structure level is solved by acquaintance queries

• Coordination Rules – coordinate data (queries and updates) with acquaintances

Interest Groups

• Help to cope with large number of nodes by clustering the network

• Nodes self-organize into interest groups

• A node may form a child interest group

• One node may belong to multiple groups

• Use schema matching to monitor group constitution

• The GM supports group constitution, “talks” to other GMs, and provides information about the group to newcomers
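The bullets above can be illustrated with a minimal in-memory sketch. It is not the JXTA implementation described later in the slides: the class `InterestGroup`, the helper `groups_of`, and the peer identifiers are hypothetical names chosen for illustration, and placing “Music” under “Arts” is an assumed reading of the topic hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class InterestGroup:
    """A topic-centred group of peers with one group manager (GM)."""
    topic: str
    manager: str                                   # peer id of the GM (hypothetical)
    members: Set[str] = field(default_factory=set)
    children: List["InterestGroup"] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.members.add(self.manager)             # the GM is itself a member

    def join(self, peer: str) -> None:
        self.members.add(peer)

    def form_child(self, topic: str, manager: str) -> "InterestGroup":
        """A node may form a child interest group and become its GM."""
        child = InterestGroup(topic, manager)
        self.children.append(child)
        return child

    def find(self, topic: str) -> Optional["InterestGroup"]:
        """Depth-first lookup of a group by topic."""
        if self.topic == topic:
            return self
        for child in self.children:
            hit = child.find(topic)
            if hit is not None:
                return hit
        return None


def groups_of(root: InterestGroup, peer: str) -> List[str]:
    """Topics of every group the peer belongs to (one node, many groups)."""
    found = [root.topic] if peer in root.members else []
    for child in root.children:
        found.extend(groups_of(child, peer))
    return found


root = InterestGroup("All topics", "gm0")
arts = root.form_child("Arts", "gm1")
music = arts.form_child("Music", "gm2")
arts.join("p7")
music.join("p7")
memberships = groups_of(root, "p7")
```

Here peer `p7` ends up in both “Arts” and “Music”, which mirrors the point that one node may belong to multiple groups.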

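[Figure: topic hierarchy of interest groups. Top level: All topics. Second level: Arts, Shopping. Third level: Movies, Music, …, Publications, Computers, …. Fourth level: Lyrics, Books, ….]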

Acquaintance query

• An acquaintance query is a conjunctive query:

• q(X) :- r1(X1), …, rn(Xn)
– q(X) – the head, refers to a local relation;
– r1(X1), …, rn(Xn) – the subgoals of the body, refer to relations of an acquaintance, and comparison predicates;
– X, X1, …, Xn – variables or constants;

• E.g., P1: films (title, year, genre) :- P2: movie (title, year, director); genres (title, genre); year>1995
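As a rough illustration of what the example rule computes, the sketch below evaluates it over in-memory tuples. The tuples are made up for the example, and the function name `films_from_p2` is hypothetical; a real node would run the body as a query against the acquaintance's database.

```python
# Made-up tuples standing in for the relations stored at acquaintance P2.
movie = [
    ("Film A", 2000, "Director X"),   # movie(title, year, director)
    ("Film B", 1995, "Director Y"),
]
genres = [
    ("Film A", "Drama"),              # genres(title, genre)
    ("Film B", "Crime"),
]


def films_from_p2(movie, genres):
    """films(title, year, genre) :-
           movie(title, year, director), genres(title, genre), year > 1995

    The shared variable `title` joins the two subgoals; `director` does not
    appear in the head, so it is projected away.
    """
    return {
        (title, year, genre)
        for (title, year, _director) in movie
        for (g_title, genre) in genres
        if g_title == title and year > 1995
    }


films = films_from_p2(movie, genres)
```

With this data only “Film A” qualifies, because the comparison predicate `year > 1995` excludes the 1995 tuple.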

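[Figure: four nodes (1–4) holding relations A, B, C, D, E, F, G and I, connected by the acquaintance queries below; the queries form a loop.]

I :- A,B

B :- C,D

D :- I,G

C :- E,F

F :- G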
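The loop in these rules can be found mechanically. The sketch below treats each rule as edges from its head to its body relations and runs a depth-first search for a cycle; this is a generic technique shown for illustration, not the termination mechanism of the presented system, and `find_loop` is a hypothetical name.

```python
# Head relation -> relations in the body, copied from the rules on this slide.
rules = {
    "I": ["A", "B"],
    "B": ["C", "D"],
    "D": ["I", "G"],
    "C": ["E", "F"],
    "F": ["G"],
}


def find_loop(rules):
    """Return one dependency cycle as a list of relations, or None."""
    IN_PROGRESS, DONE = 1, 2
    state = {}          # relation -> IN_PROGRESS or DONE; absent means unvisited
    path = []           # relations on the current DFS path

    def visit(rel):
        state[rel] = IN_PROGRESS
        path.append(rel)
        for dep in rules.get(rel, []):
            if state.get(dep) == IN_PROGRESS:
                # dep is already on the current path: the path closes a loop.
                return path[path.index(dep):] + [dep]
            if dep not in state:
                loop = visit(dep)
                if loop:
                    return loop
        path.pop()
        state[rel] = DONE
        return None

    for head in rules:
        if head not in state:
            loop = visit(head)
            if loop:
                return loop
    return None


loop = find_loop(rules)
```

For the rules above the search reports the cycle I, B, D, I: I depends on B, B on D, and D on I again.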

Correspondence Rules and Coordination Rules

• Correspondence rules define how constants from the local domain are translated into constants in the domain of an acquaintance (forward translation) and vice versa (backward translation)
– not necessarily symmetric, e.g. currency translation

• The goal of coordination rules is data coordination with acquaintances and acquainted nodes
– activated by the user (user query) or from the network (network query, results, update)
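A correspondence rule can be pictured as a pair of translation functions. The following sketch assumes made-up genre codes and made-up exchange rates (amounts are integer cents to keep the arithmetic exact); `CorrespondenceRule`, `genre_rule` and `price_rule` are hypothetical names, not part of the presented system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CorrespondenceRule:
    forward: Callable    # local constant -> acquaintance's constant
    backward: Callable   # acquaintance's constant -> local constant


# Symmetric case: a lookup table and its inverse (codes are invented).
LOCAL_TO_REMOTE = {"Drama": "DRM", "Action": "ACT"}
REMOTE_TO_LOCAL = {remote: local for local, remote in LOCAL_TO_REMOTE.items()}
genre_rule = CorrespondenceRule(
    forward=LOCAL_TO_REMOTE.__getitem__,
    backward=REMOTE_TO_LOCAL.__getitem__,
)

# Non-symmetric case: currency translation with different (invented) rates in
# each direction, so translating forward and back does not return the input.
price_rule = CorrespondenceRule(
    forward=lambda cents: cents * 120 // 100,
    backward=lambda cents: cents * 80 // 100,
)

genre_round_trip = genre_rule.backward(genre_rule.forward("Drama"))
price_round_trip = price_rule.backward(price_rule.forward(10_000))
```

The genre round trip returns “Drama” unchanged, while 10 000 cents become 12 000 on the way out and 9 600 on the way back, which is the sense in which the translation is not symmetric.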

Algorithmic notes

• Query answering algorithm
– Use acquaintance queries and correspondence rules to translate queries and data
– Propagate to acquaintances if acquaintance queries are relevant
– Compute only new tuples, reconcile results
– Process loops in query propagation, define a termination point (no propagation using acquaintance queries that have already been used)

• “Getting acquainted” protocol
– Retrieve database schemas and then apply a matching operator to them
– Based on the matching results, generate (with the help of the user) acquaintance queries and correspondence rules, and tune the coordination rules

• Updates handling (work with E. Franconi, G. Kuper, A. Lopatenko)
– Data may go through a loop more than once, define a termination point
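The termination rule for query answering (never propagate through an acquaintance query that has already been used) can be sketched as a single-process simulation. This is a simplification under stated assumptions: the real algorithm is distributed and also translates queries and data, which is omitted here; the network layout, node names, rule identifiers and the function `answer` are all hypothetical.

```python
def answer(node, network, used=None):
    """Collect tuples reachable from `node`, using each acquaintance query once."""
    if used is None:
        used = set()
    results = set(network[node]["data"])
    for rule_id, acquaintance in network[node]["acquaintances"]:
        if rule_id in used:
            continue                      # termination point: rule already used
        used.add(rule_id)
        incoming = answer(acquaintance, network, used)
        results |= incoming - results     # keep only tuples not seen so far
    return results


# Three nodes whose acquaintance links form a loop: n1 -> n2 -> n3 -> n1.
network = {
    "n1": {"data": {("t1",)}, "acquaintances": [("r12", "n2")]},
    "n2": {"data": {("t2",)}, "acquaintances": [("r23", "n3")]},
    "n3": {"data": {("t3",), ("t1",)}, "acquaintances": [("r31", "n1")]},
}

tuples_at_n1 = answer("n1", network)
```

Despite the loop the call terminates: when the query comes back to `n1` through `r31`, rule `r12` is already marked as used, so `n1` contributes only its local data and the duplicate `("t1",)` is not added twice.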

Implementing P2P databases on top of JXTA

• Benefits
– independence from system platform and networking protocol
– IP-independence (location independence)
– gives basic blocks for building P2P applications

• We implement Interest Groups and Acquaintances in JXTA

• We encode database-related functionalities into a set of custom JXTA services (DB-related services)

DB-related services

• Node-level services: Queries handler, DB operations, …

• Group-level services: Screening service, GM service, …

Architecture

[Figure: two panels. “A node”: User, PDBMS, User Interface (UI), Database Manager (DBM), Wrapper, Source Database (SDB). “A P2P database network”: User-1, User-2, User-n, nodes on the network, JXTA Layer, SS.]

Architecture, cont’d

[Figure: detailed node architecture. Components shown: User Interface (UI), Wrapper, and DBM (Query Planner, Query Propagation, Results Handler, Updates Handler, P2P Management, Coordination Rules, Acquaintances, Acquaintance queries, Correspondence Rules). JXTA Layer: Discovery, Pipes (In, Out), Peer Groups, Services (JXTA Core Services, DB-related services), Advertisements (Peer Adv, Peer Gr. Adv, Gr. topic, GM in-pipe adv, Pipe Adv), SS.]

Demo: toy databases and topology

Relations:

(1) Movie (title, year, genre)

(2) Credits (name, title, role)

(3) Movie2 (title, year, director)

(4) Genre (title, genre)

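[Figure: demo topology. Nodes 0–5; label Q; node labels (relation numbers): [1,2], [1,2], [2], [2,3,4], [3], [4]; link labels (acquaintance queries): (1:-1), (2:-2), (3:-3), (4:-4), (1:-3,4), (2:-2), (2:-2), (4:-1); one rendezvous peer and one mediator peer.]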

Query example 1

“List titles of movies featuring Tom Hanks”

Q(t) :- Credits (n,t,r); n=“Tom Hanks”

[Figure: the demo topology (nodes 0–5, label Q, with the node and link labels of the demo slide).]

Query example 2

“Titles of drama movies issued after 1995”

Q(t) :- Movie (t,y,g); g=“Drama”; y>1995;

[Figure: the demo topology (nodes 0–5, label Q, with the node and link labels of the demo slide).]

Query example 3

“Names of actors playing in action movies in 2003”

Q(n) :- Movie (t,y,g); Credits (n,t,r); r=“Actor”; g=“Action”; y=2003;

[Figure: the demo topology (nodes 0–5, label Q, with the node and link labels of the demo slide).]

References

• F. Giunchiglia and I. Zaihrayeu, “Making peer databases interact - a vision for an architecture supporting data coordination,” 6th International Workshop on Cooperative Information Agents (CIA-2002), Madrid, Spain, September 18-20, 2002.

• P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L. Serafini, and I. Zaihrayeu, “Data management for peer-to-peer computing: A vision,” WebDB, 2002.

• A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov, “Schema mediation in a peer data management system,” ICDE, 2003.

• V. Kantere, I. Kiringa, J. Mylopoulos, A. Kementsietsidis, and M. Arenas, “Coordinating peer databases using ECA rules,” DBISP2P, September 2003.

• E. Franconi, G. Kuper, A. Lopatenko, and I. Zaihrayeu, “The coDB robust peer-to-peer database system,” 2nd Workshop on Semantics in Peer-to-Peer and Grid Computing (SemPGrid'04), 2004.

• JXTA project, see http://www.jxta.org

Announcement

Submission deadline: 30 June, 2004

www.p2pkm.org

Thank you
