ViST: a dynamic index method for querying XML data by tree structures
Authors: Haixun Wang, Sanghyun Park, Wei Fan, Philip Yu
Presenter: Elena Zheleva, November 2003
Overview
Modeling XML Queries
Structure-Encoded Sequences
Indexing
ViST
Experimental Results
Modeling XML Queries
DTD of purchase records:
<!ELEMENT purchases (purchase*)>
<!ELEMENT purchase (seller, buyer)>
<!ATTLIST seller ID ID location CDATA name CDATA>
<!ELEMENT seller (item*)>
<!ATTLIST buyer ID ID location CDATA name CDATA>
<!ELEMENT item (item*)>
<!ATTLIST item name CDATA manufacturer CDATA>
Modeling XML Queries
Focus in XML query language design: ability to express complex structural or graphical queries
Modeling XML Queries
Querying XML data = finding substructures of the data graph that match the structure of the query
Structure-encoded sequences: a sequential representation of both XML data and XML queries
Structure-Encoded Sequences
Structure-Encoded Sequences
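Maps both the data and the queries to sequences
Answers a query by subsequence matching
Purpose: to avoid as many join operations as possible
Definition: a sequence of (symbol, prefix) pairs, where the prefix is the path from the root to the symbol's node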
Mapping Data
Traverse the XML document tree in preorder
Emit each visited node as a (symbol, prefix) pair to form the structure-encoded sequence
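The mapping from a document tree to its sequence can be sketched in Python. This is a minimal illustration and not the authors' implementation: the `Node` class and the `structure_encode` function are hypothetical names, the single-letter symbols are made up for the example, and the prefix is stored as a tuple of ancestor symbols.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical tree node: one symbol plus its ordered children."""
    symbol: str
    children: List["Node"] = field(default_factory=list)


def structure_encode(root: Node) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return the preorder list of (symbol, prefix) pairs for the tree."""
    sequence = []

    def visit(node, prefix):
        # Record the node together with the path of symbols leading to it.
        sequence.append((node.symbol, prefix))
        for child in node.children:
            visit(child, prefix + (node.symbol,))

    visit(root, ())
    return sequence


# Toy record P(S(N), B(L)); the letters are illustrative symbols only.
doc = Node("P", [Node("S", [Node("N")]), Node("B", [Node("L")])])
encoded = structure_encode(doc)
```

Here `encoded` lists P first with an empty prefix, then S with prefix (P), N with prefix (P, S), and so on, so each pair carries enough path information to place the node in the tree.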
Mapping Queries
Benefit of sequence matching: the query is processed as a whole instead of being split into sub-queries
Path Expression
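As a rough illustration, a simple path expression can be turned into a sequence of (symbol, prefix) pairs as follows. The function name `path_to_sequence` is hypothetical, and the sketch assumes a plain path of the form /a/b/c with no wildcards, branches, or value predicates.

```python
from typing import List, Tuple


def path_to_sequence(path: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Encode a simple path such as '/P/S/I/M' as (symbol, prefix) pairs."""
    steps = [step for step in path.split("/") if step]
    # The prefix of the i-th step is the tuple of all steps before it.
    return [(symbol, tuple(steps[:i])) for i, symbol in enumerate(steps)]


query = path_to_sequence("/P/S/I/M")
```

Each step of the path becomes one pair whose prefix is the part of the path already walked, which is the same shape used for the data sequences.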
Structure-Encoded Sequences
Query
Data
Querying XML through Structure-Encoded Sequence Matching
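A brute-force version of this matching step might look like the sketch below. It assumes that both the query and the document are already given as lists of (symbol, prefix) pairs, that the query has no wildcards, and that a match means the query pairs occur in the document sequence in the same order, though not necessarily contiguously. The name `matches` and the sample data are hypothetical, and a linear scan stands in for the index.

```python
def matches(query_seq, data_seq):
    """True if query_seq is an ordered (non-contiguous) subsequence of data_seq."""
    remaining = iter(data_seq)
    # Each query pair must be found in what is left of the data sequence;
    # the shared iterator enforces the left-to-right order.
    return all(any(q == d for d in remaining) for q in query_seq)


data = [
    ("P", ()),
    ("S", ("P",)),
    ("N", ("P", "S")),
    ("B", ("P",)),
    ("L", ("P", "B")),
]
hit = matches([("P", ()), ("B", ("P",)), ("L", ("P", "B"))], data)
miss = matches([("P", ()), ("L", ("P", "S"))], data)
```

The first query is found because its three pairs appear in order in `data`; the second fails because no L node sits under the path (P, S).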
Indexing
Role of Indexing
To provide an algorithm to perform this sequence matching
Desired features for the algorithm:
– Efficient support for subsequence matching
– Use well-supported DB indexing techniques such as B+ trees
– Allow dynamic index insertion
What is indexing useful for
Auxiliary access structures
– Used to speed up the retrieval of records
– In response to certain search conditions
Provide efficient support for arbitrary structured queries
– Using wildcards // and *
Indexing
State-of-the-art approaches
– Indexes on paths
– Indexes on nodes
– Indexes on both (structures): ViST
ViST
Algorithms
Naïve Algorithm based on Suffix Trees
RIST: Relationships Indexed Suffix Tree
ViST: Virtual Suffix Tree
Algorithm Using Suffix Trees
Suffix Tree: a compact index to all distinct, contiguous substrings of a string
D-Ancestorship: ancestor relationship in the XML document tree, expressed through the structure-encoded sequence
S-Ancestorship: ancestor relationship in the suffix tree
Example Using Suffix Trees
Algorithm Using Suffix Trees
Searches
– first by S-Ancestorship: searching under the current node in the suffix tree
– then by D-Ancestorship: matching nodes and prefixes
Disadvantages:
– Costly: traverses a large portion of the subtree
– Most commercial DBMSs do not support suffix trees
RIST: Indexing by Ancestor-Descendant Relationships
Jumps directly to the nodes Y of which X is both a D-Ancestor and an S-Ancestor
Index Construction: uses B+ trees
RIST: Indexing by Ancestor-Descendant Relationships
Subsequence Matching
Determine D-Ancestorship by prefixes
Determine S-Ancestorship by labels
Label of a suffix tree node x: <n_x, size_x>
– x: suffix tree node (root of S-tree)
– n_x: preorder (prefix) traversal number of x
– size_x: number of descendants of x
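The label-based ancestor test can be illustrated with a small trie standing in for the suffix tree. This is a sketch under stated assumptions, not the RIST implementation: `TrieNode`, `insert`, `assign_labels`, and `is_s_descendant` are hypothetical names, the inserted items are single characters (any hashable item, such as a (symbol, prefix) pair, would work), and no B+ tree is involved. With preorder numbers, y lies below x exactly when n_x < n_y <= n_x + size_x.

```python
class TrieNode:
    """Hypothetical trie node carrying an <n, size> label."""

    def __init__(self):
        self.children = {}
        self.n = 0      # preorder number, filled in by assign_labels
        self.size = 0   # number of descendants, filled in by assign_labels


def insert(root, sequence):
    """Insert a sequence of hashable items and return its last node."""
    node = root
    for item in sequence:
        node = node.children.setdefault(item, TrieNode())
    return node


def assign_labels(root):
    """Number the nodes in preorder and record each node's descendant count."""
    counter = 0

    def visit(node):
        nonlocal counter
        node.n = counter
        counter += 1
        for child in node.children.values():
            visit(child)
        node.size = counter - node.n - 1

    visit(root)


def is_s_descendant(x, y):
    """True if y is a proper descendant of x, using only the two labels."""
    return x.n < y.n <= x.n + x.size


root = TrieNode()
end_abd = insert(root, "abd")
end_ae = insert(root, "ae")
assign_labels(root)
node_b = root.children["a"].children["b"]
below_b = is_s_descendant(node_b, end_abd)
not_below_b = is_s_descendant(node_b, end_ae)
```

Because the check needs only two integers per node, a range lookup on n can replace a traversal of the subtree, which is what makes a B+ tree a natural fit for storing the labels.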
ViST: the Virtual Suffix Tree
Same sequence-matching algorithm as RIST, but supports dynamic insertions
Uses a dynamic method to assign labels
Once assigned, the labels are fixed and are not affected by subsequent data insertion or deletion
Labels the suffix tree without building it
Relies on statistical information about the XML data
ViST: the Virtual Suffix Tree
Index structure contains the sequence:
Sequence to be inserted:
Dynamic scope of x = <n_x, size_x, k_x>, where k_x is the number of subscopes already allocated inside the scope of x
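One way to picture dynamic scopes is the sketch below. The allocation rule is an assumption made for illustration only: each new child receives a fixed fraction 1/`LAM` of the parent's still-unallocated range, whereas the slides say only that the method relies on statistical information about the data. The `Scope` class, the `used` cursor, `LAM`, `allocate_child`, and `contains` are all hypothetical names.

```python
from dataclasses import dataclass

LAM = 2  # hypothetical: each child takes 1/LAM of the parent's free range


@dataclass
class Scope:
    n: int         # label of the node
    size: int      # labels n+1 .. n+size are reserved for its descendants
    k: int = 0     # number of subscopes allocated so far
    used: int = 0  # hypothetical cursor: how much of the range is handed out


def allocate_child(parent: Scope, lam: int = LAM) -> Scope:
    """Carve a fresh subscope out of the parent's unallocated range."""
    remaining = parent.size - parent.used
    block = remaining // lam  # labels for the child and its own descendants
    if block < 1:
        raise ValueError("scope underflow: no labels left under this node")
    child = Scope(n=parent.n + parent.used + 1, size=block - 1)
    parent.used += block
    parent.k += 1
    return child


def contains(x: Scope, y: Scope) -> bool:
    """True if y's label falls inside x's reserved range."""
    return x.n < y.n <= x.n + x.size


root = Scope(n=0, size=100)
first = allocate_child(root)
second = allocate_child(root)
grandchild = allocate_child(first)
```

Allocating `second` or `grandchild` never changes the labels already given to `first`, which mirrors the property that labels stay fixed once assigned.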
ViST: the Virtual Suffix Tree
Experimental Results
Datasets used
– DBLP: CS bibliography DB
• 289,627 records/publications
• Each publication: tree of max depth 6
• Avg length of structure-encoded sequence = 31
– XMARK
• 1 record
• Complicated tree structure
– Synthetic
Experimental Results
Comparison Methods
– Index Fabric: indexes XML paths
– XISS: uses nodes as the basic query unit
– ViST: takes approx. 1/10 of the time to perform queries, because the other methods need (multiple) join operations
Experimental Results
Index Structure and Size (about 1/3 smaller than the suffix tree)
– DocId B+ tree: N elements
– Combined D-Ancestor and S-Ancestor B+ tree: N x L elements
Index Construction
Conclusion
XML Queries = Subsequence Matching
Advantages of ViST (an algorithm for subsequence matching):
– Avoids expensive join operations
– Indexes both the content and the structure of XML documents
– Built on B+ trees, which are well supported for disk-based data
– Supports dynamic data insertion and deletion