syntax-directed transformations of xml streams

Posted on 22-Feb-2016


DESCRIPTION

Syntax-directed Transformations of XML Streams. Stefanie Scherzinger, joint work with Alfons Kemper. PowerPoint (PPT) presentation.

TRANSCRIPT

1

Syntax-directed Transformations of XML Streams

Stefanie Scherzinger, joint work with Alfons Kemper

2

XML Stream Processing

<bib> <book> <year>1999</year> <title>Data on the Web</title> <author>Serge Abiteboul</author> <author>Peter Buneman</author> <author>Dan Suciu</author> </book>...

<!ELEMENT bib (book)*>
<!ELEMENT book (year,title,author,author*)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>

1. Very long XML documents.

2. Applications need to be completely main-memory based.

3. Schema information is available.
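Points 1 and 2 together mean that the document cannot simply be loaded as an in-memory tree. As a baseline, here is a minimal sketch using the standard Java SAX API (javax.xml.parsers.SAXParserFactory, org.xml.sax.helpers.DefaultHandler) that reads the example as a stream of events and keeps only two counters, whatever the document length. The class BookStats and its run helper are illustrative names, not part of TransformX.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative one-pass statistics over the bib stream: memory use is two
// counters, independent of how long the document is.
class BookStats extends DefaultHandler {
    int books = 0;
    int authors = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // SAX reports every start tag once, in document order; nothing is stored.
        if (qName.equals("book")) books++;
        else if (qName.equals("author")) authors++;
    }

    static BookStats run(String xml) {
        BookStats stats = new BookStats();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), stats);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return stats;
    }

    public static void main(String[] args) {
        BookStats s = run("<bib><book><year>1999</year><title>Data on the Web</title>"
            + "<author>Serge Abiteboul</author><author>Peter Buneman</author>"
            + "<author>Dan Suciu</author></book></bib>");
        System.out.println(s.books + " book(s), " + s.authors + " author(s)"); // 1 book(s), 3 author(s)
    }
}
```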

3

XML Query Languages

XPath

//book[year=2003]/title

XQuery

<books> {
  for $x in input()//book
  where $x/year = 2003
  return <book> {$x/title} <authors> {$x/author} </authors> </book>
} </books>

XSLT

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <books>
      <xsl:for-each select="bib/book">
        <book>
          <xsl:copy-of select="title"/>
          <xsl:copy-of select="author"/>
        </book>
      </xsl:for-each>
    </books>
  </xsl:template>
</xsl:stylesheet>

Schema knowledge is necessary to specify the query!
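Schema knowledge also pays off at evaluation time. Assuming the DTD order above (year before title inside each book), //book[year=2003]/title can be answered in one pass without buffering any title, because the predicate is already settled when the title arrives. A minimal SAX sketch under that assumption; the class name is hypothetical, and the year test is simplified to a string comparison rather than XPath's numeric comparison:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Streams //book[year=2003]/title, assuming the DTD order: year before title in book.
class Titles2003 extends DefaultHandler {
    final List<String> results = new ArrayList<>();
    private final StringBuilder text = new StringBuilder(); // character data of current leaf
    private boolean inBook = false;   // currently inside a <book>
    private boolean matches = false;  // predicate year=2003 holds for this book

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("book")) { inBook = true; matches = false; }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!inBook) return; // e.g. titles of articles are ignored
        if (qName.equals("year")) {
            // Simplification: string comparison instead of XPath's numeric one.
            matches = text.toString().trim().equals("2003");
        } else if (qName.equals("title") && matches) {
            // By the DTD, year has already been seen: the title is emitted at once.
            results.add(text.toString());
        } else if (qName.equals("book")) {
            inBook = false;
        }
    }

    static List<String> run(String xml) {
        Titles2003 h = new Titles2003();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return h.results;
    }

    public static void main(String[] args) {
        System.out.println(run("<bib><book><year>2003</year><title>A</title><author>X</author></book>"
            + "<book><year>1999</year><title>B</title><author>Y</author></book></bib>")); // [A]
    }
}
```

If the schema allowed title before year, the same query would force the handler to buffer every title until the book's year had been read.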

4

TransformX Attribute Grammars

1. A (suitable) extended regular tree grammar, e.g., a DTD

2. Add attribution functions (Java code)

3. The parser generator produces Java code that
• validates the input
• evaluates the attribution functions

4. Compile and execute

5

Extended Regular Tree Grammars

Grammar G = (Nt, T, P, bib)
Nonterminals Nt = {bib, pub, year, title, author}
Terminals T = {bib, book, article, year, title, author, PCDATA}

Productions P:
bib ::= bib(pub*)
pub ::= book(year.title.author.author*)
pub ::= article(year.title.author.author*)
year ::= year(PCDATA)
title ::= title(PCDATA)
author ::= author(PCDATA)

[Figure: a tree in L(G), bib(book(year, title, author, author, author))]
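Membership in L(G) can be checked bottom-up: a node labeled t is derived from nonterminal n if G has a production n ::= t(r) and the nonterminals of its children spell a word matched by r. The sketch below hard-codes G, assuming (as holds for G) that each terminal occurs in exactly one production, so a node's label determines its production. The one-letter encoding of nonterminals, the use of java.util.regex, and the record names are illustrative choices, not how TransformX represents grammars.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Membership test for the example grammar G (illustrative encoding).
class Ertg {
    // A tree node: element label plus children; text nodes use the label "#PCDATA".
    record Node(String label, List<Node> children) {
        static Node el(String label, Node... kids) { return new Node(label, List.of(kids)); }
        static Node text() { return new Node("#PCDATA", List.of()); }
    }

    // Production n ::= t(r), where r is written over one-letter nonterminal codes.
    record Production(String nonterminal, Pattern content) { }

    // Codes: B=bib, P=pub, Y=year, T=title, A=author, #=PCDATA.
    static final Map<String, Character> CODE = Map.of(
        "bib", 'B', "pub", 'P', "year", 'Y', "title", 'T', "author", 'A', "#PCDATA", '#');

    // Keyed by terminal; in G every terminal occurs in exactly one production.
    // Simplification: PCDATA content is exactly one text node.
    static final Map<String, Production> P = Map.of(
        "bib", new Production("bib", Pattern.compile("P*")),
        "book", new Production("pub", Pattern.compile("YTAA*")),
        "article", new Production("pub", Pattern.compile("YTAA*")),
        "year", new Production("year", Pattern.compile("#")),
        "title", new Production("title", Pattern.compile("#")),
        "author", new Production("author", Pattern.compile("#")));

    // Returns the nonterminal deriving the subtree rooted at n, or null if none does.
    static String derive(Node n) {
        if (n.label().equals("#PCDATA")) return "#PCDATA";
        Production p = P.get(n.label());
        if (p == null) return null;
        StringBuilder word = new StringBuilder();
        for (Node child : n.children()) {
            String nt = derive(child);
            if (nt == null) return null;
            word.append(CODE.get(nt));
        }
        return p.content().matcher(word).matches() ? p.nonterminal() : null;
    }

    static boolean inLanguage(Node root) {
        return "bib".equals(derive(root)); // bib is the start nonterminal
    }

    public static void main(String[] args) {
        Node book = Node.el("book",
            Node.el("year", Node.text()), Node.el("title", Node.text()),
            Node.el("author", Node.text()), Node.el("author", Node.text()),
            Node.el("author", Node.text()));
        System.out.println(inLanguage(Node.el("bib", book)));          // true
        System.out.println(inLanguage(Node.el("bib",
            Node.el("book", Node.el("year", Node.text())))));           // false
    }
}
```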

6

Example: Task

input:
<bib> <book> <year>1999</year> <title>Data on the Web</title> <author>Serge Abiteboul</author> <author>Peter Buneman</author> <author>Dan Suciu</author> </book>...

output:
<books> <book> <id>1</id> <title>Data on the Web</title> <year>1999</year> <author>Serge Abiteboul</author> <author>Peter Buneman</author> <author>Dan Suciu</author> </book>...

1. Re-label the root to “books”.
2. Retrieve all books, but not articles.
3. For each book, output
• a numerical identifier
• title, year, and authors
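For comparison, the task can also be coded by hand against SAX. This sketch is not TransformX output; it only shows what any streaming solution has to do here: keep a counter for the identifiers, ignore articles, and buffer the year, since the output places the title before the year while the input delivers the year first. Names are hypothetical, and input validation is omitted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hand-written SAX version of the example task (illustrative, not generated code).
class BooksTask extends DefaultHandler {
    final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private int nextId = 1;          // numerical identifier of the next book
    private boolean inBook = false;  // stays false inside <article>, which is skipped
    private String year = "";        // buffered: the output puts title before year

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;"); // SAX has decoded entities
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("bib")) {
            out.append("<books>");                                       // 1. re-label root
        } else if (qName.equals("book")) {                               // 2. books only
            inBook = true;
            out.append("<book><id>").append(nextId++).append("</id>");   // 3. identifier
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("bib")) { out.append("</books>"); return; }
        if (!inBook) return;
        String t = escape(text.toString());
        switch (qName) {
            case "year": year = t; break;
            case "title":
                out.append("<title>").append(t).append("</title><year>").append(year).append("</year>");
                break;
            case "author": out.append("<author>").append(t).append("</author>"); break;
            case "book": out.append("</book>"); inBook = false; break;
            default: break;
        }
    }

    static String run(String xml) {
        BooksTask h = new BooksTask();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return h.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("<bib><book><year>1999</year><title>Data on the Web</title>"
            + "<author>Serge Abiteboul</author><author>Peter Buneman</author>"
            + "<author>Dan Suciu</author></book></bib>"));
    }
}
```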

7

Example: TransformX Attribute Grammar

8

Example: TransformX Attribute Grammar

[Figure: the example TransformX attribute grammar, annotated with its definition section, rules section, class-member section, and attribution functions]

9

10

The grammar provides context information, which reveals potential for optimization.

11

Extended Regular Tree Grammars

Grammar G as on slide 5, with the abbreviation:

(pub*) = (book | article)*

12

TDLL(1) Grammars

An ERTG in which every right-hand-side regular expression is one-unambiguous:

• a*.a is not one-unambiguous; the equivalent a.a* is.
• a.b* | a.c* is not one-unambiguous; the equivalent a.(b* | c*) is.

Deterministic parsing with one token lookahead: the parse tree can be unambiguously constructed with a lookahead of one token.

DTDs are a special case of TDLL(1) grammars.

[Figure: parse tree bib(book(year, title, author, author, author))]

Lee, Mani, Murata, 2000.
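One-unambiguity can be tested with the Glushkov (position) construction, following Brüggemann-Klein and Wood: number every symbol occurrence, compute first and follow sets of positions, and reject the expression if some set contains two distinct positions with the same symbol, since one token of lookahead could not choose between them. A compact Java sketch over hand-built syntax trees (no parser; class and method names are illustrative) that classifies the examples above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One-unambiguity test via the Glushkov (position) construction; illustrative sketch.
class OneUnambiguous {
    interface Re { }
    record Sym(char c) implements Re { }
    record Cat(Re l, Re r) implements Re { }
    record Alt(Re l, Re r) implements Re { }
    record Star(Re e) implements Re { }

    static Re sym(char c) { return new Sym(c); }
    static Re cat(Re l, Re r) { return new Cat(l, r); }
    static Re alt(Re l, Re r) { return new Alt(l, r); }
    static Re star(Re e) { return new Star(e); }

    // Per subexpression: does it match the empty word, and its first/last positions.
    private record Info(boolean nullable, Set<Integer> first, Set<Integer> last) { }

    private final List<Character> symbolAt = new ArrayList<>();         // position -> symbol
    private final Map<Integer, Set<Integer>> follow = new HashMap<>(); // position -> successors

    private static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> u = new HashSet<>(a);
        u.addAll(b);
        return u;
    }

    private Info analyse(Re re) {
        if (re instanceof Sym s) {            // new position, numbered left to right
            int p = symbolAt.size();
            symbolAt.add(s.c());
            follow.put(p, new HashSet<>());
            return new Info(false, Set.of(p), Set.of(p));
        }
        if (re instanceof Cat c) {
            Info l = analyse(c.l());
            Info r = analyse(c.r());
            for (int p : l.last()) follow.get(p).addAll(r.first());
            return new Info(l.nullable() && r.nullable(),
                l.nullable() ? union(l.first(), r.first()) : l.first(),
                r.nullable() ? union(l.last(), r.last()) : r.last());
        }
        if (re instanceof Alt a) {
            Info l = analyse(a.l());
            Info r = analyse(a.r());
            return new Info(l.nullable() || r.nullable(),
                union(l.first(), r.first()), union(l.last(), r.last()));
        }
        Info e = analyse(((Star) re).e());
        for (int p : e.last()) follow.get(p).addAll(e.first()); // loop back to the start
        return new Info(true, e.first(), e.last());
    }

    // Competing positions: two distinct positions carrying the same symbol.
    private boolean deterministic(Set<Integer> positions) {
        Set<Character> seen = new HashSet<>();
        for (int p : positions) if (!seen.add(symbolAt.get(p))) return false;
        return true;
    }

    static boolean isOneUnambiguous(Re re) {
        OneUnambiguous g = new OneUnambiguous();
        Info info = g.analyse(re);
        if (!g.deterministic(info.first())) return false;
        for (Set<Integer> f : g.follow.values()) if (!g.deterministic(f)) return false;
        return true;
    }

    public static void main(String[] args) {
        Re a = sym('a'), b = sym('b'), c = sym('c');
        System.out.println(isOneUnambiguous(cat(star(a), a)));                       // false
        System.out.println(isOneUnambiguous(cat(a, star(a))));                       // true
        System.out.println(isOneUnambiguous(alt(cat(a, star(b)), cat(a, star(c))))); // false
        System.out.println(isOneUnambiguous(cat(a, alt(star(b), star(c)))));         // true
    }
}
```

In a*.a, for instance, both occurrences of a are in the first set, so after reading the first a the parser cannot tell which occurrence it matched.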

13

Strong One-Unambiguity

strongly one-unambiguous

Koch, Scherzinger, 2003.

14

Syntax in the Abstract

Attributed TDLL(1) grammar, i.e., each production

1. is of one of the four forms:
n ::= t(ρ)
n ::= {f$[} t(ρ)
n ::= t(ρ) {f$]}
n ::= {f$[} t(ρ) {f$]}

2. if ρ is an attributed regular expression, then the regular expression obtained from ρ by removing the attribution functions must be strongly one-unambiguous.
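One way to read the four forms operationally, and this reading is an assumption of the sketch rather than something the slide spells out, is that {f$[} runs when the start tag of t is read and {f$]} when its end tag is read, so the forms differ only in which of the two functions is present. A minimal Java sketch driving such functions from SAX events; AttributedProduction and the lambdas are hypothetical stand-ins for generated code, and validation is omitted:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// n ::= [{f$[}] t(rho) [{f$]}]: pre or post may be absent (null).
// Assumed reading: pre runs at t's start tag, post at t's end tag. No validation.
class AttributedRun extends DefaultHandler {
    record AttributedProduction(String nonterminal, String terminal, Runnable pre, Runnable post) { }

    // Keyed by terminal for simplicity; a real parser would use the grammar context.
    private final Map<String, AttributedProduction> byTerminal = new HashMap<>();

    AttributedRun(AttributedProduction... productions) {
        for (AttributedProduction p : productions) byTerminal.put(p.terminal(), p);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        AttributedProduction p = byTerminal.get(qName);
        if (p != null && p.pre() != null) p.pre().run();   // {f$[}
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        AttributedProduction p = byTerminal.get(qName);
        if (p != null && p.post() != null) p.post().run(); // {f$]}
    }

    void run(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        new AttributedRun(
            // n ::= {f$[} t(rho) {f$]}
            new AttributedProduction("bib", "bib", () -> log.append("<books>"), () -> log.append("</books>")),
            // n ::= t(rho) {f$]}
            new AttributedProduction("pub", "book", null, () -> log.append("<book/>")),
            // n ::= t(rho)
            new AttributedProduction("pub", "article", null, null))
            .run("<bib><book/><article/><book/></bib>");
        System.out.println(log); // <books><book/><book/></books>
    }
}
```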

15

Example

16

Parse Tree

[Figure: parse tree of the example document, bib(book(year, title, author, author, author))]

17

Attributed Parse Tree

[Figure: attributed parse tree of the example document]


20

L-attributed Grammars

[Figure: attributed parse tree of the example document]
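In an L-attributed grammar, every inherited attribute of a node depends only on inherited attributes of its parent and on attributes of the siblings to its left (synthesized attributes may also use the children). All attributes can therefore be computed in a single depth-first, left-to-right traversal, which visits nodes in the same order in which their tags appear in the stream. A small Java sketch on an in-memory tree (illustrative only) threads an inherited counter from left to right to number the books and synthesizes the number of authors per book:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative: an L-attributed computation in one left-to-right depth-first pass.
class LAttributed {
    record Node(String label, List<Node> children) {
        static Node el(String label, Node... kids) { return new Node(label, List.of(kids)); }
    }

    // `nextId` is inherited: it comes from the parent and the left siblings, and the
    // returned value is handed on to the right sibling. `authors` is synthesized from
    // the children. A book's result is emitted once all its children are visited,
    // which in a stream corresponds to its end tag.
    static int visit(Node n, int nextId, List<String> out) {
        int id = nextId;
        int authors = 0;
        for (Node c : n.children()) {
            id = visit(c, id, out);                  // children strictly left to right
            if (c.label().equals("author")) authors++;
        }
        if (n.label().equals("book")) {
            out.add("book " + id + ": " + authors + " author(s)");
            return id + 1;
        }
        return id;
    }

    static List<String> evaluate(Node root) {
        List<String> out = new ArrayList<>();
        visit(root, 1, out);
        return out;
    }

    public static void main(String[] args) {
        Node bib = Node.el("bib",
            Node.el("book", Node.el("year"), Node.el("title"),
                Node.el("author"), Node.el("author"), Node.el("author")),
            Node.el("article", Node.el("year"), Node.el("title"), Node.el("author")),
            Node.el("book", Node.el("year"), Node.el("title"), Node.el("author")));
        System.out.println(evaluate(bib)); // [book 1: 3 author(s), book 2: 1 author(s)]
    }
}
```

Because the traversal never revisits a node or looks to the right, the same computation can run over a stream of start and end tags with only a stack of pending values.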


26

27

In Practice

28

In Practice

29

Class Members

accessible from within attribution functions

30

TransformX Attributes

transfer information between attribution functions

31

The TransformX Parser Generator

Translation to Java source code:

1. The validator module, generated in O(|G|³):
– validates the input
– outputs the attribution functions of the attributed extended parse tree as they are encountered

2. The evaluator module, generated in O(1):
– evaluates the attribution functions
– stores attributes on a stack
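Because every content model is one-unambiguous, its Glushkov automaton is deterministic, so a validator can check the children of an element one tag at a time. A minimal Java sketch of such an automaton for the book content model year.title.author.author*, with hand-written transitions. The state numbering and class name are illustrative, and the validator module described above would additionally output attribution functions as it goes, which is omitted here:

```java
import java.util.List;
import java.util.Map;

// Deterministic automaton for the content model year.title.author.author* (illustrative).
class BookContentDfa {
    // TRANSITIONS.get(state) maps the next child label to the successor state.
    private static final List<Map<String, Integer>> TRANSITIONS = List.of(
        Map.of("year", 1),    // 0: expect year
        Map.of("title", 2),   // 1: expect title
        Map.of("author", 3),  // 2: expect the first author
        Map.of("author", 3)); // 3: further authors (accepting)
    private static final int ACCEPTING = 3;

    private int state = 0;

    // Consumes one child start tag; false signals a validation error.
    boolean step(String label) {
        Integer next = TRANSITIONS.get(state).get(label);
        if (next == null) return false;
        state = next;
        return true;
    }

    // Checked at the end tag of book: the children read must form a complete word.
    boolean accepts() {
        return state == ACCEPTING;
    }

    static boolean validate(List<String> children) {
        BookContentDfa dfa = new BookContentDfa();
        for (String c : children) if (!dfa.step(c)) return false;
        return dfa.accepts();
    }

    public static void main(String[] args) {
        System.out.println(validate(List.of("year", "title", "author", "author", "author"))); // true
        System.out.println(validate(List.of("year", "title")));                              // false
        System.out.println(validate(List.of("title", "year", "author")));                    // false
    }
}
```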

32

Experiments

Prototype: C++ implementation, generates Java code

Experiments:
1. Validate the input
2. Output the input
3. Evaluate the example

Data: Books and articles, datasets 31-122 MB

Memory consumption: 12 MB

33

Conclusion & Summary

• TransformX attribute grammars
– specify many queries conveniently, often more conveniently than SAX
– the grammar may reveal potential for optimization

• TransformX parser generator
– little runtime overhead (validation + attributes)

• Prototype implementation

34

Selected Related Work

XML and Attribute Grammars
• M. Benedikt, C. Y. Chan, W. Fan, J. Freire, and R. Rastogi. “Capturing both Types and Constraints in Data Integration.” SIGMOD’03.
• M. Benedikt, C. Y. Chan, W. Fan, R. Rastogi, S. Zheng, and A. Zhou. “DTD-Directed Publishing with Attribute Translation Grammars.” VLDB’02.
• C. Koch and S. Scherzinger. “Attribute Grammars for Scalable Query Processing on XML Streams.” DBPL’03.
• F. Neven and J. Van den Bussche. “Expressiveness of Structured Document Query Languages Based on Attribute Grammars.” JACM, Jan. 2002.
• S. Nishimura and K. Nakano. “XML Stream Transformer Generation Through Program Composition and Dependency Analysis.” Science of Computer Programming, 2005.

One-unambiguous Regular Languages
• A. Brüggemann-Klein and D. Wood. “One-Unambiguous Regular Languages.” Information and Computation, 1998.

Strong One-unambiguity
• C. Koch and S. Scherzinger. “Attribute Grammars for Scalable Query Processing on XML Streams.” DBPL’03.

TDLL(1) Grammars
• D. Lee, M. Mani, and M. Murata. “Reasoning about XML Schema Languages using Formal Language Theory.” Technical Report RJ 10197 Log 95071, IBM Research, Nov. 2000.

Lex & Yacc
• J. R. Levine, T. Mason, and D. Brown. “lex & yacc.” O’Reilly, 1992.

35

Thank you
