management of xml and semistructured data

Post on 01-Jan-2016

19 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

DESCRIPTION

Management of XML and Semistructured Data. Lecture 7: XML-QL, Structural Recursion Monday, April 23, 2001. XML-QL. First declarative language for XML How to obtain a query language for XML fast ? Assume OEM as data model Use features from UnQL and StruQL Patterns Templates - PowerPoint PPT Presentation

TRANSCRIPT

Management of XML and Semistructured Data

Lecture 7: XML-QL, Structural Recursion

Monday, April 23, 2001

XML-QL

• First declarative language for XML• How to obtain a query language for XML fast ?

– Assume OEM as data model

– Use features from UnQL and StruQL• Patterns

• Templates

• Skolem functions

– Design XML-like syntax

Patterns in XML-QL

WHERE <book language=“french”> <publisher> <name> Morgan Kaufmann </> </> <author> $A </> </book> in “www.a.b.c/bib.xml”CONSTRUCT <author> $A </>

WHERE <book language=“french”> <publisher> <name> Morgan Kaufmann </> </> <author> $A </> </book> in “www.a.b.c/bib.xml”CONSTRUCT <author> $A </>

Find all authors who published in Morgan Kaufmann:

Abbreviation: </> closes any tag.

Patterns in XML-QL

where <book language=$X> <author> $A </author>

</book> in “www.a.b.c/bib.xml” <book>

<author> $A </author> <author> Jones </author>

</book> in “www.a.b.c/bib.xml”construct <result> $X </>

where <book language=$X> <author> $A </author>

</book> in “www.a.b.c/bib.xml” <book>

<author> $A </author> <author> Jones </author>

</book> in “www.a.b.c/bib.xml”construct <result> $X </>

Find all languages in which Jones’ coauthors have published:

There is a join here…

Constructors in XML-QL

where <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”construct <result> <author> $A </> <lang> $L </> </>

where <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”construct <result> <author> $A </> <lang> $L </> </>

Result is:<result> <author>Smith</author> <lang>English </lang> </result><result> <author>Smith</author> <lang>Mandarin</lang> </result><result> <author>Doe </author> <lang>English </lang> </result>. . . .

Find all authors and the languages in which they published:

Nested Queries in XML-QL

WHERE <book.author> $A </> in “www.a.b.c/bib.xml”CONSTRUCT <result> <author> $A </> WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml” CONSTRUCT <lang> $L </> </>

WHERE <book.author> $A </> in “www.a.b.c/bib.xml”CONSTRUCT <result> <author> $A </> WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml” CONSTRUCT <lang> $L </> </>

Find all authors and the languages in which they published; group by authors:

Note: book.author is a (regular) path expression

<result> <author>Smith</author> <lang>English</lang> <lang>Mandarin</lang> <lang>…</lang> …</result><result> <author>Doe</author> <lang>English</lang> …</result>

Result is:

Skolem Functions in XML-QL

WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A)> <author> $A</> <lang> $L </> </>

WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A)> <author> $A</> <lang> $L </> </>

Same query, with Skolem functions

Assumptions:• the ID attribute is always id• default Skolem function for author is G($A), for lang is H($A, $L) (why ?)

Skolem Functions in XML-QL

{ WHERE <book> <author> $A </> <title> $T </> </> in “www.a.b.c/bib.xml” CONSTRUCT <person id=F($A)> <name id=G($A)> $A </> <booktitle> $T</> /* implicit Skolem function H($A, $T) */ </>}{ WHERE <paper> <author> $A </> <title> $T </> <journal> $J </> </> in “www.d.e.f/papers.xml” CONSTRUCT <person id=F($A)> <name id=G($A)> $A </> <papertitle> $T</> /* implicit Skolem function J($A, $T) */ <journaltitle> $J</> /* implicit Skolem function K($A, $T) */ </>}

{ WHERE <book> <author> $A </> <title> $T </> </> in “www.a.b.c/bib.xml” CONSTRUCT <person id=F($A)> <name id=G($A)> $A </> <booktitle> $T</> /* implicit Skolem function H($A, $T) */ </>}{ WHERE <paper> <author> $A </> <title> $T </> <journal> $J </> </> in “www.d.e.f/papers.xml” CONSTRUCT <person id=F($A)> <name id=G($A)> $A </> <papertitle> $T</> /* implicit Skolem function J($A, $T) */ <journaltitle> $J</> /* implicit Skolem function K($A, $T) */ </>}

Object fusion with Skolem functions and block structure- Compile a complete list of authors, from two sources

Result:

(some have only books,Others only papers,Others have both)

<person> <name>Smith</name> <booktitle>Book1</booktitle > <booktitle>Book2</booktitle > </result><person> <name>Jones</name> <booktitle>Book3</booktitle > <papertitle>paper1</papertitle > <journaltitle>journal1</journaltitle > </result><person> <name>Mark</name> <papertitle>paper2</papertitle > <journaltitle>journal3</journaltitle > </result>…

Skolem Functions in XML-QL

WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A)> <author id=G($A)> $A</> <lang id=H($A)> $L </> </>

WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A)> <author id=G($A)> $A</> <lang id=H($A)> $L </> </>

“Wrong” query number 1:

What is “wrong” here ?

Skolem Functions in XML-QL

WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A,$L)> <author id=G($A)> $A</> <lang id=H($A,$L)> $L </> </>

WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml”CONSTRUCT <result id=F($A,$L)> <author id=G($A)> $A</> <lang id=H($A,$L)> $L </> </>

“Wrong” query number 2:

What is “wrong” here ?

Skolem Functions in XML-QL

{ WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml” CONSTRUCT <author id=F($A)> <lang id=H($A,$L)> $L </> </> }{ WHERE <person> <city> $C </> <fluent-in> $X </> </> in “www.a.b.c/bib.xml” CONSTRUCT <location id=G($C)> <lang id=H($C,$L)> $L </> </> }

{ WHERE <book language = $L> <author> $A </> </> in “www.a.b.c/bib.xml” CONSTRUCT <author id=F($A)> <lang id=H($A,$L)> $L </> </> }{ WHERE <person> <city> $C </> <fluent-in> $X </> </> in “www.a.b.c/bib.xml” CONSTRUCT <location id=G($C)> <lang id=H($C,$L)> $L </> </> }

“Wrong” query number 3:

Three Rules to Construct Only Trees

Rule 1: nested elements must have Skolem functions that are… [how ??]

Rule 2: an element that has an atomic content must have a Skolem function that is… [how ??]

Rule 3: if a Skolem function occurs in two different places than the following condition must hold… [which ??]

CONSTRUCT <tag1 id=F([args1])> <tag2 id=G([args2])> …</> </>

CONSTRUCT <tag1 id=F([args1])> <tag2 id=G([args2])> …</> </>

CONSTRUCT … <tag id=F([args])> $X </> …CONSTRUCT … <tag id=F([args])> $X </> …

{ CONSTRUCT <tag1 id=G([args1])> <tag id=F([args])> …</> </> }{ CONSTRUCT <tag1 id=H([args2])> <tag id=F([args])> …</> </> }

{ CONSTRUCT <tag1 id=G([args1])> <tag id=F([args])> …</> </> }{ CONSTRUCT <tag1 id=H([args2])> <tag id=F([args])> …</> </> }

XML-QL v.s. XQuery

• Xquery (=Quilt) v.s. XML-QL+ faithful XML data model

+ Xpath sublanguage

+ aggregate functions (like in SQL)

+ some features from XQL- Patterns- Skolem functions

A Different Paradigm:Structural Recursion

Data as sets with a union operator:

{a:3, a:{b:”one”, c:5}, b:4} =

{a:3} U {a:{b:”one”,c:5}} U {b:4}

Structural Recursion

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = f($T)f({}) = {}f($V) = if isInt($V) then {result: $V} else {}

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = f($T)f({}) = {}f($V) = if isInt($V) then {result: $V} else {}

Example: retrieve all integers in the data

a a b

b c3

“one” 5

4result result result

3 5 4

Structural Recursion

What does this do ?

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L=a then {b:f($T)} else {$L:f($T)}f({}) = {}f($V) = $V

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L=a then {b:f($T)} else {$L:f($T)}f({}) = {}f($V) = $V

Structural Recursion

What does this do ?

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:{$L:f($T)}}f({}) = {}f($V) = $V

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:{$L:f($T)}}f({}) = {}f($V) = $V

Input = tree with n nodesOutput = ???

Structural Recursion

Example: increase all engine prices by 10%

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= engine then {$L: g($T)} else {$L: f($T)}f({}) = {}f($V) = $V

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= engine then {$L: g($T)} else {$L: f($T)}f({}) = {}f($V) = $V

g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= price then {$L:1.1*$T} else {$L: g($T)}g({}) = {}g($V) = $V

g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= price then {$L:1.1*$T} else {$L: g($T)}g({}) = {}g($V) = $V

engine body

part price

price price

part price

100

1000

100

1000

engine body

part price

price price

part price

110

1100

100

1000

Structural Recursion

Retrieve all subtrees reachable by (a.b)*.aa

b

a

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= a then g($T} U $T else { }f({}) = { }f($V) = { }

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= a then g($T} U $T else { }f({}) = { }f($V) = { }

g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= b then f($T) else { }g({}) = { }g($V) = { }

g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= b then f($T) else { }g({}) = { }g($V) = { }

Structural Recursion: General Form

f1($T1 U $T2) = f1($T1) U f1($T2)f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T)f1({}) = { }f1($V) = { }

f1($T1 U $T2) = f1($T1) U f1($T2)f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T)f1({}) = { }f1($V) = { }

fk($T1 U $T2) = fk($T1) U fk($T2)fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T)fk({}) = { }fk($V) = { }

fk($T1 U $T2) = fk($T1) U fk($T2)fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T)fk({}) = { }fk($V) = { }

. . . .

Each of E1, ..., Ek consists only of {_ : _}, U, if_then_else_

Evaluating Structural Recursion

Recursive Evaluation:

• Compute the functions recursively, starting with f1 at the root

Termination is guaranteed.

How efficiently can we evaluate this ?

Structural Recursion

Consider this:

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:f($T)}, $L:f($T)}f({}) = {}f($V) = $V

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:f($T)}, $L:f($T)}f({}) = {}f($V) = $V

Naive Recursive Evaluation

a

b

c

d

a a

b b b b

c c c c c c c c

Input tree = n nodesOutput tree = 2n+1 – 1 nodes

Efficient Recursive EvaluationRecursive Evaluation with function memorization.PTIME complexity.

a

b

c

d

a

b

c

d

a

b

c

d

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:f($T)}, $L:f($T)}f({}) = {}f($V) = $V

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = {$L:f($T)}, $L:f($T)}f({}) = {}f($V) = $V

Alternatively: apply the function in parallel to each input edge Bulk Evaluation

Bulk EvaluationSometimes f doesn’t return anything use edges

a

b

c

d

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L=c then $T else f($T)f({}) = {}f($V) = $V

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L=c then $T else f($T)f({}) = {}f($V) = $V

d

d

Epsilon Edges

Meaning of edges:

=

a b

c d

a b

c d

dc

Epsilon Edges

Note: union becomes easy to draw with edges:

Example:a b c d

eU =a b c d e

=

a b c d e

T1 T2U =T1 T2

Bulk Evaluation

Idea: “apply” E1, ..., Ek independently on each edge, then connect with edges PTIME

f1($T1 U $T2) = f1($T1) U f1($T2)f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T)f1({}) = { }f1($V) = { }

f1($T1 U $T2) = f1($T1) U f1($T2)f1({$L: $T}) = E1($L, f1($T),...,fk($T), $T)f1({}) = { }f1($V) = { }

fk($T1 U $T2) = fk($T1) U fk($T2)fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T)fk({}) = { }fk($V) = { }

fk($T1 U $T2) = fk($T1) U fk($T2)fk({$L: $T}) = Ek($L, f1($T),...,fk($T), $T)fk({}) = { }fk($V) = { }

. . . .

Bulk Evaluation

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= a then g($T} U $T else { }f({}) = { }f($V) = { }

f($T1 U $T2) = f($T1) U f($T2)f({$L: $T}) = if $L= a then g($T} U $T else { }f({}) = { }f($V) = { }

g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= b then f($T) else { }g({}) = { }g($V) = { }

g($T1 U $T2) = g($T1) U g($T2)g({$L: $T}) = if $L= b then f($T) else { }g({}) = { }g($V) = { }

Recall (a.b)*.a:

a

b

a

a

a

b

b

a

c

ba

aa

b

b

a

c b

b

a

c

a

d

d

d

d

bc

Structural Recursion

• Can evaluate in two ways:– Recursively: memorize functions’ results– Bulk: apply all functions on all edges, in

parallel, connect, eliminate what is useless

• Complexity: PTIME– More precisely: NLOGSPACE

• Works on graphs with cycles too !

XSL

• two W3C drafts: XSLT and XPATH– http://www.w3.org/TR/xpath, 11/99– http://www.w3.org/TR/WD-xslt, 11/99

• in commercial products (e.g. IE5.0)

• purpose: stylesheet specification language:– stylesheet: XML -> HTML– in general: XML -> XML

XSL Templates and Rules

• query = collection of template rules

• template rule = match pattern + template

<xsl:template> <xsl:apply-templates/> </xsl:template>

<xsl:template match = “/bib/*/title”> <result> <xsl:value-of/> </result></xsl:template>

<xsl:template> <xsl:apply-templates/> </xsl:template>

<xsl:template match = “/bib/*/title”> <result> <xsl:value-of/> </result></xsl:template>

Retrieve all book titles:

Flow Control in XSL

<xsl:template> <xsl:apply-templates/> </xsl:template>

<xsl:template match=“a”> <A><xsl:apply-templates/></A></xsl:template>

<xsl:template match=“b”> <B><xsl:apply-templates/></B></xsl:template>

<xsl:template match=“c”> <C><xsl:value-of/></C></xsl:template>

<xsl:template> <xsl:apply-templates/> </xsl:template>

<xsl:template match=“a”> <A><xsl:apply-templates/></A></xsl:template>

<xsl:template match=“b”> <B><xsl:apply-templates/></B></xsl:template>

<xsl:template match=“c”> <C><xsl:value-of/></C></xsl:template>

<a> <e> <b> <c> 1 </c>

<c> 2 </c>

</b>

<a> <c> 3 </c>

</a>

</e>

<c> 4 </c>

</a>

<A> <B> <C> 1 </C>

<C> 2 </C>

</B>

<A> <C> 3 </C>

</A>

<C> 4 </C>

</A>

XSL is Structural Recursion

Equivalent to:

f(T1 U T2) = f(T1) U f(T2)f({L: T}) = if L= c then {C: t} else L= b then {B: f(t)} else L= a then {A: f(t)} else f(t)f({}) = {}f(V) = V

f(T1 U T2) = f(T1) U f(T2)f({L: T}) = if L= c then {C: t} else L= b then {B: f(t)} else L= a then {A: f(t)} else f(t)f({}) = {}f(V) = V

XSL query = single functionXSL query with modes = multiple function

XSL and Structural Recursion

XSL:• trees only• may loop

Structural Recursion:• arbitrary graphs• always terminates

<xsl:template match = “e”> <xsl:apply-patterns select=“/”/></xsl:template>

<xsl:template match = “e”> <xsl:apply-patterns select=“/”/></xsl:template>

stack overflow on IE 5.0

add the following rule:

top related