Formal Machines for Streaming XML Querying
![Page 1: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/1.jpg)
Streaming XML
Kevin Tankersley
Machines and Algorithms for Real-Time XML Processing
![Page 2: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/2.jpg)
Overview
• XML Filtering Networks
– Overview of XML Processing Tasks
– Streaming XML and XML Data Networks
– XPath Expressions and Regular Expressions
– Node-based NFA Machines for XML Filtering
• Other Formal Models for XML Processing
– Specialized pushdown automata
– Specialized context-free grammars
![Page 3: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/3.jpg)
XML Data
• W3C Standard inspired by HTML
  – http://www.w3.org/XML/
• Currently used for:
  – Defining Data
    • http://www.w3.org/XML/Schema
  – Integrating Systems
    • http://www.w3.org/TR/soap/
    • http://www.w3.org/TR/wsdl
  – Formatting Data
    • http://www.w3.org/Style/XSL/
    • http://www.w3.org/TR/xsl/
  – Querying Data
    • http://www.w3.org/TR/xpath
    • http://www.w3.org/XML/Query/
![Page 4: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/4.jpg)
DOM Processing
![Page 5: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/5.jpg)
Streaming XML Processing
• Reduce memory requirements by performing XML processing tasks as the XML data passes through the application
• Example Tasks:
  – Validate XML
    • Ensure the XML data is well-formed and that it conforms to its DTD/XSD
  – Query XML
    • Extract/filter subsets of the XML data for further processing as it passes through the application
• Frameworks:
  – JSR 173: Streaming API for XML (StAX)
    • javax.xml.stream
  – .NET XML Streams
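The two example tasks can be sketched in a few lines. This is only an illustration: the slide names StAX and .NET, while the sketch below swaps in Python's standard-library `xml.etree.ElementTree.iterparse`, and the function names `stream_text` and `is_well_formed` are hypothetical. It checks well-formedness only, not DTD/XSD conformance.

```python
import io
import xml.etree.ElementTree as ET

def stream_text(source, wanted_tag):
    """Query task: yield the text of each wanted_tag element as soon as it closes."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == wanted_tag:
            yield elem.text
        elem.clear()  # drop this element's text and children once handled

def is_well_formed(source):
    """Validation task (well-formedness only): consume the stream, report parse errors."""
    try:
        for _ in ET.iterparse(source):
            pass
        return True
    except ET.ParseError:
        return False

# Hypothetical sample document, used only for illustration.
doc = b"<catalog><course><title>Automata</title></course></catalog>"
titles = list(stream_text(io.BytesIO(doc), "title"))
```

The parser hands over elements one at a time, so the application never needs the whole tree at once, which is the memory saving the slide refers to.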
![Page 6: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/6.jpg)
Application: XML Data Network
![Page 7: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/7.jpg)
XML Path Language
• XPath Query:
  – Location Steps
    • Axis
    • Node test
    • Predicate
• Axes
  – Child (default)
  – Descendant (//)
  – Attribute (@)
![Page 8: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/8.jpg)
XPath and Regular Expressions
• Consider XPath queries using child and descendant axes, name and * node tests, and no predicates:
• Such queries can be converted to regular expressions:
  – [university] N* [department]
  – N* [departments] N [courses]
• The input alphabet is the set of nodes N (a bare N matches any single node)
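A minimal sketch of this conversion, assuming root-to-node paths are written as strings such as `/university/science/department`. The function name `xpath_to_regex` and the two sample queries are assumptions: the queries are not given in the transcript and were chosen only because they would produce the two regular expressions listed above.

```python
import re

def xpath_to_regex(xpath):
    """Translate a predicate-free XPath (child and descendant axes, name or *
    node tests) into a regex over root-to-node paths written like '/a/b/c'."""
    pattern = ""
    # Each location step is a separator ('/' = child, '//' = descendant)
    # followed by a node test.
    for sep, test in re.findall(r"(//|/)([^/]+)", xpath):
        if sep == "//":
            pattern += r"(?:/[^/]+)*"  # N*: any number of intermediate nodes
        # '*' becomes N (any single node); a name becomes that literal node.
        pattern += "/" + (r"[^/]+" if test == "*" else re.escape(test))
    return re.compile(pattern)

# Assumed queries corresponding to the two expressions on the slide.
q1 = xpath_to_regex("/university//department")    # [university] N* [department]
q2 = xpath_to_regex("//departments/*/courses")    # N* [departments] N [courses]

hit = q1.fullmatch("/university/science/department") is not None
```

Using `fullmatch` matters here: the expression has to describe the entire path from the root to the selected node, not just a fragment of it.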
![Page 9: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/9.jpg)
Designing a Filtering Machine
1. Convert each XPath Query to an NFA
2. Combine into a single NFA
  – Take advantage of path sharing [Diao et al., 2003]
3. Convert NFA to a DFA
  – Constrain to avoid state explosion
  – Lazy construction [Onizuka, 2003]
4. Add indexes
  – Stream index [Green et al., 2004]
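The lazy construction step can be illustrated as follows. This is a sketch of the general idea (compute a DFA transition only when the input first requires it, then cache it), not the specific algorithm of the cited work. The NFA is hand-written for a hypothetical query `/a//b`, and the names `NFA`, `ACCEPTING`, and `LazyDFA` are assumptions.

```python
# Hand-written NFA for the hypothetical query /a//b:
#   0 --a--> 1, state 1 loops on any tag (descendant axis), 1 --b--> 2 (accepting).
NFA = {0: [("a", 1)], 1: [("*", 1), ("b", 2)], 2: []}
ACCEPTING = {2}

class LazyDFA:
    def __init__(self, nfa, start):
        self.nfa = nfa
        self.start = frozenset([start])
        self.table = {}  # (DFA state, tag) -> DFA state, filled on demand

    def step(self, state, tag):
        """Return the DFA successor, running the subset construction only on a miss."""
        key = (state, tag)
        if key not in self.table:
            self.table[key] = frozenset(
                nxt
                for s in state
                for label, nxt in self.nfa[s]
                if label == tag or label == "*"
            )
        return self.table[key]

    def run(self, path):
        """Feed a root-to-node path (a list of tag names) and return the final DFA state."""
        state = self.start
        for tag in path:
            state = self.step(state, tag)
        return state

dfa = LazyDFA(NFA, 0)
accepted = bool(dfa.run(["a", "x", "b"]) & ACCEPTING)
```

Only the DFA states that the data actually reaches are ever materialized, which is how the lazy approach sidesteps the worst-case state explosion of an eager conversion.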
![Page 10: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/10.jpg)
Example
1. /a/b
2. /a/c
3. /a/b/c
4. /a//b/c
5. /a/*/c
6. /a//c
7. /a/*/*/c
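The seven queries above can be combined into one NFA whose common prefixes share states, then run against a stream of start and end tags with a stack of active state sets. The sketch below is one possible encoding in the spirit of path sharing, not a reproduction of any cited system; the class `SharedNFA`, its methods, and the sample event stream are assumptions.

```python
import re

class SharedNFA:
    def __init__(self):
        self.edges = [{}]   # edges[state][label] -> next state; state 0 is the start
        self.loops = set()  # states that loop on any tag (created for //)
        self.accepts = {}   # state -> ids of the queries accepted there

    def _next(self, state, label):
        # Reuse an existing edge when present: this is where prefixes are shared.
        if label not in self.edges[state]:
            self.edges.append({})
            self.edges[state][label] = len(self.edges) - 1
        return self.edges[state][label]

    def add(self, qid, xpath):
        state = 0
        for sep, test in re.findall(r"(//|/)([^/]+)", xpath):
            if sep == "//":
                state = self._next(state, "//")  # epsilon edge to a looping state
                self.loops.add(state)
            state = self._next(state, test)
        self.accepts.setdefault(state, set()).add(qid)

    def _closure(self, states):
        # Follow the epsilon edges introduced for the descendant axis.
        out = set(states)
        for s in states:
            if "//" in self.edges[s]:
                out.add(self.edges[s]["//"])
        return out

    def step(self, states, tag):
        nxt = set()
        for s in states:
            if s in self.loops:
                nxt.add(s)  # stay in the looping state for deeper descendants
            for label in (tag, "*"):
                if label in self.edges[s]:
                    nxt.add(self.edges[s][label])
        return self._closure(nxt)

    def filter(self, events):
        """events: ('start', tag) / ('end', tag) pairs. Returns ids of matched queries."""
        stack = [self._closure({0})]
        matched = set()
        for kind, tag in events:
            if kind == "start":
                stack.append(self.step(stack[-1], tag))
                for s in stack[-1]:
                    matched |= self.accepts.get(s, set())
            else:
                stack.pop()  # backtrack to the parent element's state set
        return matched

queries = ["/a/b", "/a/c", "/a/b/c", "/a//b/c", "/a/*/c", "/a//c", "/a/*/*/c"]
nfa = SharedNFA()
for i, q in enumerate(queries, start=1):
    nfa.add(i, q)

# Hypothetical input stream for the document <a><b><c/></b></a>.
events = [("start", "a"), ("start", "b"), ("start", "c"),
          ("end", "c"), ("end", "b"), ("end", "a")]
matched = nfa.filter(events)
```

For this stream, queries 1, 3, 4, 5 and 6 match, while 2 and 7 do not because `c` sits at depth three under `b`. The shared machine has 13 states in this encoding; building the queries separately would give 28.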
![Page 11: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/11.jpg)
System Architecture
![Page 12: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/12.jpg)
XML as a Context-Free Language
• XML (unlike HTML) must be properly nested
  – <a><b></b></a> : Valid
  – <a><b></a></b> : Invalid
• This nesting structure makes it possible to work with refinements of context-free grammars and pushdown automata
• Visibly Pushdown Automata
  – Refinement of PDAs to enforce proper nesting of begin and end tags. Originally constructed to analyze call and return sequences in programming languages
• Specialized Document Type Definition
  – Refinement of context-free grammars to enforce proper nesting of begin and end tags
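The nesting requirement can be checked with a stack in the visibly pushdown style: the input symbol alone decides the stack action, with begin tags always pushing and end tags always popping. The sketch below is a simplified illustration, not a full VPDA or XML parser. Its tokenizer is an assumption that handles only bare tags (no attributes, text, or self-closing tags), and `is_properly_nested` is a hypothetical name.

```python
import re

# Simplified tag tokenizer: matches <name> and </name> only.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")

def is_properly_nested(text):
    stack = []
    for closing, name in TAG.findall(text):
        if not closing:
            stack.append(name)   # begin tag: always push
        elif not stack or stack.pop() != name:
            return False         # end tag that does not match the innermost open tag
    return not stack             # every begin tag must have been closed

ok = is_properly_nested("<a><b></b></a>")
bad = is_properly_nested("<a><b></a></b>")
```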
![Page 13: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/13.jpg)
Visibly Pushdown Automata
![Page 14: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/14.jpg)
VPDA Example
![Page 15: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/15.jpg)
Specialized DTDs
• Note that tags must properly wrap all expressions yielded by a production
• Note that an SDTD could be converted to a context-free grammar by replacing specializations with nonterminals and nesting production rules
![Page 16: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/16.jpg)
SDTDs and VPDAs
• Every VPDA can be converted to an equivalent PDA
• Every SDTD can be converted into an equivalent context-free grammar
• VPDAs and SDTDs are equivalent in the same way that CFGs and PDAs are
• XML Applications:
  – Automated machine rewriting for Data Integration [Thomo et al., 2008]
  – Streaming type checking [Kumar et al., 2007]
  – Streaming querying [Kumar et al., 2007]
![Page 17: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/17.jpg)
References
![Page 19: Formal machines for Streaming XML Querying](https://reader034.vdocuments.us/reader034/viewer/2022042818/55ab109a1a28ab2f698b45e4/html5/thumbnails/19.jpg)