![Page 1: XPathLearner: An On-Line Self- Tuning Markov Histogram for XML Path Selectivity Estimation Authors: Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey](https://reader035.vdocuments.us/reader035/viewer/2022070409/56649e8f5503460f94b937a1/html5/thumbnails/1.jpg)
XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation
Authors: Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey Scott Vitter, Ronald Parr
Speaker: Ho Wai Shing
Contents
- Introduction: the problems in XML path selectivity estimation
- XPathLearner: the properties and the details
- Experiment results
- Conclusions
- Future work
Introduction
- XML is becoming the standard for data exchange.
- We need to query both the structure and the text data of XML documents.
- Selectivity estimation is essential for optimizing query evaluation plans.
Introduction: Example
`FOR $b IN document("*")//book WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998" RETURN $b/title`

The path expressions:
- `//book/publisher = "Morgan Kaufmann"`
- `//book/year = "1998"`
- `//book/title`
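To make "selectivity" concrete: the selectivity of a path expression is the number of nodes it matches. The sketch below counts the matches of the three path expressions above on a small, hypothetical document (not data from the slides), using the XPath subset supported by Python's `xml.etree.ElementTree`; `//` is written `.//` in that syntax, and the `[.='text']` predicate needs Python 3.7 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy bibliography; not data from the slides.
DOC = """
<bib>
  <book><publisher>Morgan Kaufmann</publisher><year>1998</year><title>T1</title></book>
  <book><publisher>Morgan Kaufmann</publisher><year>2001</year><title>T2</title></book>
  <book><publisher>Springer</publisher><year>1998</year><title>T3</title></book>
</bib>
""".strip()

# The example's path expressions, translated to ElementTree's XPath subset.
PATHS = {
    '//book/publisher = "Morgan Kaufmann"': ".//book/publisher[.='Morgan Kaufmann']",
    '//book/year = "1998"': ".//book/year[.='1998']",
    "//book/title": ".//book/title",
}


def selectivity(root: ET.Element, path: str) -> int:
    """Exact selectivity: the number of elements matched by `path`."""
    return len(root.findall(path))


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for label, path in PATHS.items():
        print(f"{label}: {selectivity(root, path)}")
```

An optimizer would want these counts before running the query, without scanning the data; estimating them is the problem the rest of the talk addresses.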
Introduction
- We need a structure that stores statistics about the data.
- The estimated selectivity is then calculated from these statistics.
- Problem: estimate the selectivity of simple, single-value, and multi-value path expressions using limited space.
Related Work
- Path Trees
- Markov Tables
- k-RO (in Lore)
Path Trees
- Aggregate siblings that have the same tag.
- Store tag names only (no data values).
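A minimal sketch of building a path tree, assuming the common definition: one node per distinct root-to-element tag path, so elements reached by the same tag path (including same-tag siblings) are merged and counted together. Only tag names are kept, matching the "no data values" limitation. The document is hypothetical; it was chosen so that its counts agree with the Markov-table example on the next slide. `build_path_tree` and `lookup` are illustrative names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical toy document (not from the slides).
DOC = ("<DBLP>"
       "<book><author/><author/><year/></book>"
       "<article><author/><year/></article>"
       "<book><author/><year/></book>"
       "</DBLP>")


@dataclass
class PathTreeNode:
    tag: str
    count: int = 0  # number of elements reached by this root-to-node tag path
    children: Dict[str, "PathTreeNode"] = field(default_factory=dict)


def build_path_tree(elem: ET.Element, node: Optional[PathTreeNode] = None) -> PathTreeNode:
    """Merge all elements with the same tag path into one node and count them."""
    if node is None:
        node = PathTreeNode(elem.tag)
    node.count += 1
    for child in elem:
        # Children with the same tag under the same path share a single node.
        child_node = node.children.setdefault(child.tag, PathTreeNode(child.tag))
        build_path_tree(child, child_node)
    return node


def lookup(root: PathTreeNode, *tags: str) -> int:
    """Count for the rooted path tags[0]/tags[1]/...; 0 if the path is absent."""
    if not tags or tags[0] != root.tag:
        return 0
    node = root
    for tag in tags[1:]:
        node = node.children.get(tag)
        if node is None:
            return 0
    return node.count


tree = build_path_tree(ET.fromstring(DOC))
print(lookup(tree, "DBLP", "book", "author"))
```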
Markov Table
- The selectivity of short paths, up to length k, is stored.
- The selectivity of longer paths is estimated using a Markov model.

For example:

| Path | Selectivity |
|---|---|
| //DBLP | 1 |
| //DBLP/book | 2 |
| //author | 4 |
| //book/author | 3 |
| //article/author | 1 |
| //year | 3 |
| //article/year | 1 |
| //book/year | 2 |
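A sketch of how such a table could be built and used, assuming k = 2 (every tag path of one or two tags that occurs anywhere in the document is counted) and the usual first-order Markov estimate f(t1 t2) × f(t2 t3) / f(t2) × ... for longer paths. The document is hypothetical; its counts reproduce every row of the table above, and it also yields entries such as //book that the estimate needs but the table does not show. Function names are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

# Hypothetical toy document (not from the slides).
DOC = ("<DBLP>"
       "<book><author/><author/><year/></book>"
       "<article><author/><year/></article>"
       "<book><author/><year/></book>"
       "</DBLP>")


def build_markov_table(root: ET.Element, k: int = 2) -> Counter:
    """Count every tag path of length 1..k, keyed like '//book/author'."""
    table: Counter = Counter()

    def visit(elem: ET.Element, ancestors: List[str]) -> None:
        chain = ancestors + [elem.tag]
        # Each element ends exactly one path of every length up to k.
        for length in range(1, min(k, len(chain)) + 1):
            table["//" + "/".join(chain[-length:])] += 1
        for child in elem:
            visit(child, chain)

    visit(root, [])
    return table


def estimate(table: Counter, tags: List[str]) -> float:
    """First-order Markov estimate: f(t1 t2) * prod_{i=2}^{n-1} f(t_i t_{i+1}) / f(t_i)."""
    if len(tags) <= 2:
        return float(table["//" + "/".join(tags)])  # stored exactly
    est = float(table[f"//{tags[0]}/{tags[1]}"])
    for a, b in zip(tags[1:-1], tags[2:]):
        denom = table[f"//{a}"]
        if denom == 0:
            return 0.0
        est *= table[f"//{a}/{b}"] / denom
    return est


table = build_markov_table(ET.fromstring(DOC))
print(estimate(table, ["DBLP", "book", "author"]))
```

The three-tag path //DBLP/book/author is never stored; it is estimated as 2 × 3 / 2 from the two-tag entries.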
k-RO
- Used in the Lore system.
- Very similar to the Markov table.
- Data values are also treated as objects; the statistics are stored as a graph.
Twigs
- Can answer "twig" queries, i.e., structural queries with small branches.
- Based on a suffix tree (for simple paths) plus signatures in each node (for estimating branching).
Problems Faced
- Offline: the whole repository must be scanned beforehand to gather statistics, which is infeasible if the data is remote or extremely large.
- They can handle only simple path expressions (SPEs), or else the summary becomes too large.
- They ignore data values.
Problems Faced
- Not adaptive to the query workload: much space is wasted on infrequently queried paths.
- No quick update: the repository must be rescanned periodically.
Objective
XPathLearner:
- uses a Markov-based approach,
- uses an online algorithm,
- is adaptive to the workload,
- can answer simple paths, single-value paths (`//A/B='3'`), and multi-value paths (`//A='2'/B='3'`),
- considers data values,
- can be easily updated.
XPathLearner
Architecture
A More Detailed Example
What to Store?
- A Markov table (1st order in this discussion).
What to Store?
- The table may be large if there are many data values.
- Solution: only the "tag-tag" entries, the "tag" entries, and the top-k "tag-value" entries are stored exactly; all other entries are stored in buckets.
- Default is 1.
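The compression step above can be sketched as follows, assuming the top-k tag-value pairs are chosen globally by frequency and that a value's bucket "feature" is its first character, as in the next slide's example, which assumes v1 to v4 start with 'a' and v5 to v8 with 'b' (`slide_feature` encodes exactly that assumption). The raw counts are hypothetical, chosen so that k = 1 reproduces the next slide's histogram; `compress_values` is an illustrative name, not the paper's API.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]  # (tag, value)


def slide_feature(value: str) -> str:
    """Stand-in for 'first character': v1-v4 start with 'a', v5-v8 with 'b'."""
    return "a" if int(value[1:]) <= 4 else "b"


def compress_values(counts: Counter, k: int = 1,
                    feature: Callable[[str], str] = slide_feature):
    """Keep the k most frequent (tag, value) pairs exactly; fold the rest into
    buckets keyed by (tag, feature(value)) holding [sum of frequencies, #pairs]."""
    exact: Dict[Pair, int] = dict(counts.most_common(k))
    buckets: Dict[Tuple[str, str], list] = {}
    for (tag, value), freq in counts.items():
        if (tag, value) in exact:
            continue
        bucket = buckets.setdefault((tag, feature(value)), [0, 0])
        bucket[0] += freq  # sum of frequencies in the bucket
        bucket[1] += 1     # number of (tag, value) pairs in the bucket
    return exact, buckets


# Hypothetical raw tag-value frequencies (not given on the slides).
RAW = Counter({("D", "v3"): 3, ("D", "v1"): 1, ("D", "v2"): 1, ("D", "v5"): 1,
               ("D", "v6"): 1, ("B", "v1"): 1, ("B", "v5"): 1,
               ("C", "v2"): 1, ("C", "v6"): 1})

exact, buckets = compress_values(RAW, k=1)
print(exact)
print(buckets)
```

Space now grows with the number of (tag, feature) buckets plus k, rather than with the number of distinct values.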
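What is Actually Stored?
A compressed 1st-order Markov table (or, Markov histogram):

| Tag | Frequency |
|---|---|
| A | 1 |
| B | 6 |
| C | 7 |
| D | 7 |

| Tag-tag pair | Frequency |
|---|---|
| A B | 6 |
| A C | 3 |
| B C | 4 |
| B D | 1 |
| C D | 6 |

| Tag | Value | Frequency |
|---|---|---|
| D | v3 | 3 |

| Tag | Feature | Sum | #Pairs |
|---|---|---|---|
| B | a | 1 | 1 |
| B | b | 1 | 1 |
| D | a | 2 | 2 |
| D | b | 2 | 2 |
| C | a | 1 | 1 |
| C | b | 1 | 1 |

Assumption: v1 to v4 start with 'a', v5 to v8 start with 'b'; k = 1.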
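A sketch of looking up a value frequency f(t, v) in this compressed histogram: an exactly stored entry is returned as is, and any other value is estimated as its bucket's average frequency, sum / #pairs. The uniform-within-bucket estimate and returning 0 when no bucket exists are assumptions of this sketch; the dictionaries transcribe the tables above.

```python
from typing import Dict, Tuple

# Transcribed from the "What is Actually Stored?" example (k = 1).
EXACT_VALUE: Dict[Tuple[str, str], int] = {("D", "v3"): 3}
BUCKETS: Dict[Tuple[str, str], Tuple[int, int]] = {  # (tag, feature) -> (sum, #pairs)
    ("B", "a"): (1, 1), ("B", "b"): (1, 1),
    ("D", "a"): (2, 2), ("D", "b"): (2, 2),
    ("C", "a"): (1, 1), ("C", "b"): (1, 1),
}


def feature(value: str) -> str:
    """Slide assumption: v1-v4 start with 'a', v5-v8 start with 'b'."""
    return "a" if int(value[1:]) <= 4 else "b"


def value_frequency(tag: str, value: str) -> float:
    """f(tag, value): exact if stored, otherwise the bucket average sum / #pairs."""
    if (tag, value) in EXACT_VALUE:
        return float(EXACT_VALUE[(tag, value)])
    total, pairs = BUCKETS.get((tag, feature(value)), (0, 0))
    return total / pairs if pairs else 0.0


print(value_frequency("D", "v3"), value_frequency("D", "v1"))
```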
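How to Retrieve Selectivity?
Use this formula:

$$\sigma(t_1 t_2 \cdots t_n) = N \cdot \Pr(t_1 t_2 \cdots t_n) \approx N \cdot \Pr(t_1) \prod_{i=2}^{n} \Pr(t_i \mid t_{i-1})$$

where σ is the selectivity, t1, t2, ..., tn are tags, t1t2...tn is the path made of these tags, and N is the total number of data items.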
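How to Retrieve Selectivity?
Use this formula (it is what we actually calculate):

$$\sigma(t_1 t_2 \cdots t_n) \approx f(t_1 t_2) \prod_{i=2}^{n-1} \frac{f(t_i t_{i+1})}{f(t_i)}$$

where σ is the selectivity, t1, t2, ..., tn are tags, t1t2...tn is the path made of these tags, and f(p) is the frequency of the path p.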
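How to Retrieve Selectivity?
Use this formula (if the path carries value predicates):

$$\sigma \approx f(t_1 t_2) \prod_{i=2}^{n-1} \frac{f(t_i t_{i+1})}{f(t_i)} \prod_{i \in V} \frac{f(t_i, v_i)}{f(t_i)}$$

where σ is the selectivity, t1, t2, ..., tn are tags, t1t2...tn is the path made of these tags, V is the set of positions i whose tag carries a value predicate ti = vi, and f(t, v) is the frequency of the value v in tag t.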
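Retrieval Example
For the path //B/C/D, the estimated selectivity is

$$\frac{f(BC)}{f(C)} \times f(CD) = \frac{4}{7} \times 6 = \frac{24}{7} \approx 3.43$$

For the path //B/C/D=v3, the estimated selectivity is

$$\frac{f(BC)}{f(C)} \times \frac{f(CD)}{f(D)} \times f(D, v_3) = \frac{4}{7} \times \frac{6}{7} \times 3 = \frac{72}{49} \approx 1.47$$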
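The retrieval estimates for //B/C/D and //B/C/D=v3 can be checked with a short sketch over the example histogram, assuming the first-order Markov formula f(t1 t2) × f(t_i t_{i+1}) / f(t_i) × ... and one factor f(t_i, v_i) / f(t_i) per value predicate. Exact fractions are used, so the expected results are 24/7 ≈ 3.43 and 72/49 ≈ 1.47. Only the exact value entry for D = v3 is included (buckets are omitted for brevity), and the function names are illustrative.

```python
from fractions import Fraction
from typing import Dict, List

# Example histogram from the "What is Actually Stored?" slide.
TAG = {"A": 1, "B": 6, "C": 7, "D": 7}
PAIR = {("A", "B"): 6, ("A", "C"): 3, ("B", "C"): 4, ("B", "D"): 1, ("C", "D"): 6}
EXACT_VALUE = {("D", "v3"): 3}


def estimate_path(tags: List[str]) -> Fraction:
    """f(t1 t2) * prod_{i=2}^{n-1} f(t_i t_{i+1}) / f(t_i)."""
    if len(tags) == 1:
        return Fraction(TAG.get(tags[0], 0))
    est = Fraction(PAIR.get((tags[0], tags[1]), 0))
    for a, b in zip(tags[1:-1], tags[2:]):
        if TAG.get(a, 0) == 0:
            return Fraction(0)
        est *= Fraction(PAIR.get((a, b), 0), TAG[a])
    return est


def estimate_with_values(tags: List[str], values: Dict[int, str]) -> Fraction:
    """`values` maps a 0-based position i to v_i for the predicate t_i = v_i."""
    est = estimate_path(tags)
    for i, v in values.items():
        tag = tags[i]
        if TAG.get(tag, 0) == 0:
            return Fraction(0)
        est *= Fraction(EXACT_VALUE.get((tag, v), 0), TAG[tag])
    return est


print(estimate_path(["B", "C", "D"]))                      # 24/7
print(estimate_with_values(["B", "C", "D"], {2: "v3"}))   # 72/49
```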
How to Update?
- Get the query feedback, i.e., a path and its actual selectivity, e.g., (BCD, 5).
- Update the histogram entries contained in the query so that future estimates are more accurate, e.g., update B, C, D, BC, and CD so that the estimate is nearer to 5 than before.
- Two update approaches: the Heavy-Tail Rule and the Delta Rule.
Heavy Tail Rule
- Put more of the correction towards the end (tail) of the path.
- In the update equation, $f_k(\cdot)$ is the frequency before the update and $f_{k+1}(\cdot)$ is the frequency after the update.
- Suggested weights: $w_i = 2i$.
Heavy Tail Rule
- Updating the one-"tag" entries safeguards the terms that were set by exact query feedback.
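The slide's exact update equation is not reproduced here, so the following is only one hypothetical reading of "put more correction towards the tail": the additive error (feedback minus estimate) is split across the path's tag-tag entries in proportion to weights w_i that grow along the path, using the suggested w_i = 2i (with two terms, 2i and 2^i give the same 1 : 2 split). One-tag entries are left unchanged, so this sketch does not model the one-"tag" update described above, and its numbers will not match the slide's worked example. Names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

PairFreq = Dict[Tuple[str, str], float]


def markov_estimate(pairs: PairFreq, tag_freq: Dict[str, float], tags: List[str]) -> float:
    """f(t1 t2) * prod f(t_i t_{i+1}) / f(t_i); `tags` must have at least two tags."""
    est = pairs.get((tags[0], tags[1]), 0.0)
    for a, b in zip(tags[1:-1], tags[2:]):
        est *= pairs.get((a, b), 0.0) / tag_freq[a]
    return est


def heavy_tail_update(pairs: PairFreq, tag_freq: Dict[str, float], tags: List[str],
                      actual: float,
                      weight: Callable[[int], float] = lambda i: 2 * i) -> PairFreq:
    """Return a copy of `pairs` with the estimation error pushed toward the tail."""
    error = actual - markov_estimate(pairs, tag_freq, tags)
    path_pairs = list(zip(tags, tags[1:]))
    weights = [weight(i) for i in range(1, len(path_pairs) + 1)]
    total = sum(weights)
    updated = dict(pairs)
    for w, p in zip(weights, path_pairs):
        # Later pairs have larger w_i, so they receive a larger share of the error.
        updated[p] = max(updated.get(p, 0.0) + error * w / total, 0.0)
    return updated


TAG = {"A": 1, "B": 6, "C": 7, "D": 7}
PAIR = {("A", "B"): 6, ("A", "C"): 3, ("B", "C"): 4, ("B", "D"): 1, ("C", "D"): 6}
new_pair = heavy_tail_update(PAIR, TAG, ["A", "C", "D"], actual=6)
print(markov_estimate(PAIR, TAG, ["A", "C", "D"]), markov_estimate(new_pair, TAG, ["A", "C", "D"]))
```

For the feedback (ACD, 6), the tail entry CD receives twice the correction given to AC, and the estimate moves from 18/7 toward 6.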
Heavy Tail Rule: a reminder of what is stored (the Markov histogram from "What is Actually Stored?")
Heavy Tail Rule
Example: query feedback = (ACD, 6). By the table, the estimate is

$$\frac{f(AC)}{f(C)} \times f(CD) = \frac{3}{7} \times 6 = \frac{18}{7} \approx 2.57$$
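Heavy Tail Rule: updates
After the update, f(AC) = 4, f(C) = 8, and f(CD) = 8, so the new estimate is

$$\frac{f(AC)}{f(C)} \times f(CD) = \frac{4}{8} \times 8 = 4,$$

which is closer to the feedback value of 6.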
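Delta Rule
- As described by Rumelhart et al.
- Basic idea: gradient descent on the squared estimation error, i.e., each entry changes by a learning rate times the error times the derivative of the estimate with respect to that entry.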
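A minimal sketch of one delta-rule step, assuming the standard formulation as gradient descent on E = 1/2 (s − ŝ)², where s is the feedback selectivity and ŝ the first-order Markov estimate. Because ŝ is a product of histogram entries x_j raised to exponents e_j (+1 for each tag-tag term, −1 for each intermediate one-tag term), ∂ŝ/∂x_j = e_j ŝ / x_j, so each entry moves by lr × (s − ŝ) × e_j ŝ / x_j. The learning rate `lr` and the function name are hypothetical, not values from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def delta_rule_step(pairs: Dict[Tuple[str, str], float], tag_freq: Dict[str, float],
                    tags: List[str], actual: float, lr: float = 0.1):
    """One gradient step on 0.5 * (actual - est)**2 (sketch).

    Assumes all entries on the path are positive. Returns
    (old_estimate, new_pairs, new_tag_freq); the inputs are not modified."""
    # Exponent of each entry in est = f(t1 t2) * prod f(t_i t_{i+1}) / f(t_i).
    exps: Counter = Counter()
    for p in zip(tags, tags[1:]):
        exps[("pair", p)] += 1
    for t in tags[1:-1]:
        exps[("tag", t)] -= 1

    def current(key):
        kind, name = key
        return pairs[name] if kind == "pair" else tag_freq[name]

    est = 1.0
    for key, e in exps.items():
        est *= current(key) ** e
    err = actual - est

    new_pairs, new_tags = dict(pairs), dict(tag_freq)
    for key, e in exps.items():
        grad = e * est / current(key)  # d(est) / d(entry)
        kind, name = key
        target = new_pairs if kind == "pair" else new_tags
        target[name] = current(key) + lr * err * grad
    return est, new_pairs, new_tags


TAG = {"A": 1, "B": 6, "C": 7, "D": 7}
PAIR = {("A", "B"): 6, ("A", "C"): 3, ("B", "C"): 4, ("B", "D"): 1, ("C", "D"): 6}
est, new_pairs, new_tags = delta_rule_step(PAIR, TAG, ["A", "C", "D"], actual=6.0)
print(est, new_pairs[("A", "C")], new_pairs[("C", "D")], new_tags["C"])
```

Unlike the tail-weighted split, the delta rule also adjusts the intermediate one-tag entry f(C), lowering it because it appears in a denominator.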
Experiments
Experiments
- Data set: DBLP (other experiments were done but are not included in the paper).
- Metrics: average absolute error and average relative error.
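For reference, the two metrics can be computed as below, assuming the usual definitions: the mean of |s − ŝ| and the mean of |s − ŝ| / s over the test queries, where s is the true and ŝ the estimated selectivity. The slides do not say how zero selectivities are handled, so this sketch simply rejects them.

```python
from typing import Sequence


def average_absolute_error(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean of |s - s_hat| over all queries."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(s - e) for s, e in zip(actual, estimated)) / len(actual)


def average_relative_error(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Mean of |s - s_hat| / s over all queries; requires every s > 0."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("need two non-empty sequences of equal length")
    if any(s <= 0 for s in actual):
        raise ValueError("relative error is undefined for zero selectivity")
    return sum(abs(s - e) / s for s, e in zip(actual, estimated)) / len(actual)


print(average_absolute_error([6, 5], [4, 5]), average_relative_error([6, 5], [4, 5]))
```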
Conclusions
- XPathLearner is a new method for estimating the selectivity of path expressions.
- It is online and based on query feedback, and it does not need a database scan.
- It uses Markov histograms to store statistics.
Future Work
- Change from a fixed-length Markov table to a variable-length Markov table.
- Choose the paths to be stored more carefully.
- Apply the update method to other areas, e.g., graph-based structures, to answer branching queries.
References
[1] Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey Scott Vitter, and Ronald Parr. XPathLearner: An On-Line Self-Tuning Markov Histogram for XML Path Selectivity Estimation. VLDB'02.