Download - Answering Tree Pattern Queries Using Views
Answering Tree Pattern Queries Using Views
Laks V.S. Lakshmanan, Hui (Wendy) Wang, and Zheng (Jessica) Zhao
University of British ColumbiaVancouver, BC
Amazon.com
Outline
Motivation Problems Studied Without schema With schema Recursive schemas Related Work Summary & Future Work
Motivation 1/3
Integration of existing data sources. Local as view (LAV) – one of the well-
known approaches. Each source = a materialized view over
some global database. Answer to query over global DB =
answer to query using (materialized) views.
Motivation 2/3 <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
Source = View “//Trials//Trial” over some DB containing clinical data – trials, their status, patient data, etc.
Consider query Q: //Trials[//Status]//Trial over [unknown] original DB.
How can and should we answer it using above source?
Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>
<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
//Trials//Trial//Trials//Trial?? ??
On
e p
ossib
le o
rig
inal D
B
Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> …
</Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>
<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
//Trials//Trial//Trials//Trial?? ??
On
e p
ossib
le o
rig
inal D
BQ: //Trials[//Status]//Trial
Motivation 3/3<PharmaLab> (1) <Trials @type=“T1”> (2) <Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> </Trials> <Trials @type=“T2”> (13) <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial> </Trials> </PharmaLab>
<Trial> (3) <Patient> (4) John Doe </Patient> … <Status> (10) Complete </Status> </Trial> <Trial> (11) <Patient> (12) Jen Bloe </Patient> … </Trial> <Trial> (14) <Patient> (15) Mary Moore </Patient> … </Trial>
//Trials//Trial//Trials//Trial
On
e p
ossib
le o
rig
inal D
B
◦ “●◦ “●[//Status]”[//Status]” { (3) }
Contained rewriting
Problems Studied 1/3 Equivalent Rewriting: Given Q and
views V, find an equivalent rewriting of Q using V, i.e., an expression E s.t. V◦E Ξ Q, over all possible input DBs. Appropriate for query optimization.
Contained Rewriting: Given Q and V, find an expression E s.t. V◦E Q overall all possible input DBs, and V◦E is maximal among all such rewritings. Most appropriate for information integration
[Halevy, Lenzerini, Pottinger & Halevy].
Problems Studied 2/3No Schema: Given Q and V, find a
maximally contained rewriting (MCR) of Q using V.
With Schema: Given Q and V, and a schema prescribing possible input DBs, find a maximally contained rewriting of Q using V.
Focus: Tree Pattern Queries (XP/,//, [ ]). Schema without cycles, union, and
recursion.
Problems Studied 3/3
Given Q & V: R Ξ V ◦ E Q.
Compensation queryRewriting query
Want MCR in the absence and in the presence of a schema.
//a[//b]/c //a
b c
Without Schema 1/6
Question 1: Does an MCR always exist?
/a/a
bb
cc
VV
/b/b
d
Q1Q1
/a/a
b
dd
Q2Q2
No MCR for Q1 and for Q2.
What went wrong?
distinguished (answer) node
Without Schema 2/6
Trial
//Trials//Trials //Trials//Trials
StatusStatus Patient
(1)(1)
(2)(2) (3)(3)Trial
//Trials//Trials
StatusStatus Patient
(1)(1)
(2)(2) (3)(3)
Unfulfilled obligationsClip Away Tree (CAT)
f
f – useful embedding
VVQQ
V
E
Without Schema 3/6Theorem: Q, V – tree pattern queries.
Then Q is answerable using V iff there is a useful embedding from Q to V.
aa
dd
baa
//a//a
c c
bba a aa
b
cc
2
1
77
33
44
55V ee
bb
ccQ
1,21,2
1:{2}, 2:{}1:{2}, 2:{}
2:{6}2:{6}
6:{7}6:{7}
1:{2,3}, 2:{3}1:{2,3}, 2:{3}
2:{6}, 3:{4} 2:{6}, 3:{4}
4:{5}, 6:{7} 4:{5}, 6:{7}
Testing Existence of MCR:
//a//a
6
Without Schema 4/6
Two embeddings – corresponding irredundant CRs.
aa
baa
cb
cc
//a//a
d e
aa
baa
cb
cc
//a//a
a eb
c d
need for expressing
MCR!
Without Schema 5/6Can test existence of MCR in poly time. However, MCRs can be exponentially
large (closure issue).
eedd
aa aa
c
bb
//a//a
aa
//a//a
c
bb
c
bbVVQQ
How many irredundant CRs
are possible?
Without Schema 5/6
eedd
aa aa
c
bb
//a//a
aa
//a//a
c
bb
c
bbVV
QQc
bb
aa
//a//a
d e
Without Schema 5/6
eedd
aa aa
c
bb
//a//a
aa
//a//a
c
bb
c
bbVV
QQc
bb
aa
//a//a
d
a/b/c/e
Without Schema 5/6
eedd
aa aa
c
bb
//a//a
aa
//a//a
c
bb
c
bbVV
QQc
bb
aa
//a//a
ea/b
e
c
Without Schema 5/6
eedd
aa aa
c
bb
//a//a
aa
//a//a
c
bb
c
bbVV
QQc
bb
aa
//a//a
a/b/c/ea/b
e
cMCR = union of exponential
# CRs in the worst case!
Without Schema 6/6
Summary: Can test existence of MCR in poly
time. Exact characterization.
MCR may be union of exponentially many CRs in the worst case.
Algorithm for generating MCR.
With Schema 1/6
Given Query Q, view V, schema S. Infer all constraints C implied by S. Chase V w.r.t. C. Look for MCR of Q w.r.t. chased view.
With Schema 2/6AuctionsAuctions
AuctionAuction
open_auctionopen_auctionclosed_auctionclosed_auction
bidsbids
personperson itemitem
namename
**
** ??
++ ??
++ ++
E.g. constraints:•c_a has ≤ 1 bids
child•Every Auction
having a person desc also
has an item desc.
•every path from Auction to name
goes via bids.
With Schema 3/6
//Auction//Auction
o_ao_a c_ac_a
bidsbids bidsbids
VV
//Auction//Auction
bidsbids bidsbids
personperson itemitem
namenameQQ
With Schema 3/6AuctionsAuctions
AuctionAuction
open_auctionopen_auction closed_auctionclosed_auction
bidsbids
personperson itemitem
namename
**
** ??
++ ??
++ ++
o_ao_a c_ac_a
bidsbids bidsbids
//Auction//Auction
person item
name
p i
n
With Schema 4/6
o_ao_a c_ac_a
bidsbids bidsbids
//Auction//Auction
person item
name
p i
n
//Auction//Auction
bidsbids bidsbids
personperson itemitem
namenameQQ
MCR = identity query.
With Schema 5/6Another Example: AuctionsAuctions
AuctionAuction
closed_auctionclosed_auction
bidsbids
personperson
itemitem
namename
**
** ??
++
++
open_auctionopen_auction
buyer?
//Auction
nameitem
person
Q
//Auction
VHow to answer Q using V?
With Schema 5/6 Another Example: AuctionsAuctions
AuctionAuction
closed_auctionclosed_auction
bidsbids
personperson
itemitem
name
**
** ??
++
++
open_auctionopen_auction
buyer?
//Auction
nameitem
person
Q
//Auction
namnamee
item
So what’s the compensation query?
With Schema 5/6 Another Example: AuctionsAuctions
AuctionAuction
closed_auctionclosed_auction
bidsbids
personperson
itemitem
name
**
** ??
++
++
open_auctionopen_auction
buyer?
//Auction
nameitem
person
Q
//Auction
namename
item
MCR = V ◦ “●//name”
With Schema 6/6
Challenges and Highlights: Naïve chase can explode.
Make chase context aware.
Exact characterization of schema w/o recursion and union in terms of constraints.
Efficient algo. for inferring the constraints. Efficient algo. for chase. And for finding MCR. MCR is unique, if it exists.
Recursive Schemas 1/2a
b
c d**
?
//a
b
V
//a
b b
c dQ
What is the MCR?
Recursive Schemas 2/2a
b
c d**
?
//a
b
V
//a
b b
c dQ//a
b
c d
Recursive Schemas 2/2a
b
c d**
?
//a
b
V
//a
b b
c dQ//a
b
c
d
b
Recursive Schemas 2/2a
b
c d**
?
//a
b
V
//a
b b
c dQ//a
b
c
db
Recursive Schemas 2/2a
b
c d**
?
//a
b
V
//a
b b
c dQ//a
b
c
b
d
b
MCR = union of four CRs.
Behavior similar to no schema.
Related Work 1/2
QAV for relational – huge body of work [Halevy 01].
Regular path queries and semi-structured DBs [Grahne&Thomo 03, Calvenese 00,Papakonstantinou&Vassalos 99].
Equivalent rewrites for fragments of XQuery and XPath [Deutsch&Tannen 03, Tang&Zhou 05, Xu&Ozsoyoglu 05].
Related Work 2/2 Key differences b/w equivalent &
contained rewriting: Unique rewriting (even w/o schema). MCR may involve union of (possibly
exponentially many) CRs. Study of contained rewriting in
presence of schema. Lot of work on semantic caching
[Chen+ 02], heuristics for using materialized views for optimizing XPath [Balmin+ 04], mine views worth materializing, XPath containment, … .
Summary & Future Work 1/2
QAV using (maximally) contained rewriting ( information integration).
Without schema: existence, characterization, closure, generation of MCR.
With Schema: extract essence using constraints, chase, similar problems as above.
Impact of recursion. Experiments.
Summary & Future Work 2/2
Impact of wildcard, disjunction, order …
Impact of union, recursion, … Other integration models (e.g., GLAV) QAV for XQuery.