A Normal Form for XML Documents
Marcelo Arenas Leonid Libkin
Department of Computer Science, University of Toronto
Motivating Example
[Figure: XML tree rooted at courses. It has two course elements, with @cno “cs100” and “cs225”; each contains a student with @sno “123”, name “Fox” and a grade (“A+”, “B+”). An info subtree holds @sno “123” and name “Fox”.]
Our Goal
DTD:
courses → course*, info*
course → @cno, student*
student → @sno, name, grade
name → S
grade → S
info → @sno, name
Integrity constraints:
Two students with the same @sno value must have the same name.
@sno is the key of info.
A “Non-relational” Example
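[Figure: XML tree rooted at DBLP with conf elements. A conf has a title (“ICDT”) and issue elements; each issue contains article elements with author, title and @year. In the sample data, the articles of one issue all carry @year “1999” and those of another carry “2001”; the authors shown include “Dong” and “Jarke”.]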
Problems to Address
Functional dependencies for XML.
Normal form for XML documents (XNF). Generalizes BCNF.
Algorithm for normalizing XML documents.
Implication problem for functional dependencies.
Framework: DTDs
We do not consider mixed content, IDs and IDREFs.
Paths(D): all paths in a DTD D, for example:
courses.course
courses.course.@cno
courses.course.student.name
courses.course.student.name.S
We distinguish three kinds of elements: attributes (@), strings (S) and element types.
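As a rough illustration of Paths(D), the sketch below enumerates the paths of a non-recursive DTD. The dictionary encoding, the name `paths`, and the choice of the university DTD (courses, course, student, name, grade) are assumptions made for this example, not notation from the slides.

```python
# Hypothetical encoding of the university DTD: each element type lists its
# attributes, its child element types, and whether it has string content (S).
DTD = {
    "courses": {"attrs": [], "children": ["course"], "text": False},
    "course": {"attrs": ["@cno"], "children": ["student"], "text": False},
    "student": {"attrs": ["@sno"], "children": ["name", "grade"], "text": False},
    "name": {"attrs": [], "children": [], "text": True},
    "grade": {"attrs": [], "children": [], "text": True},
}


def paths(dtd, root):
    """Enumerate Paths(D) of a non-recursive DTD as dot-separated strings."""
    result = []

    def walk(element, prefix):
        path = f"{prefix}.{element}" if prefix else element
        result.append(path)                      # path ending in an element type
        for attr in dtd[element]["attrs"]:
            result.append(f"{path}.{attr}")      # path ending in an attribute
        if dtd[element]["text"]:
            result.append(f"{path}.S")           # path ending in a string
        for child in dtd[element]["children"]:
            walk(child, path)

    walk(root, "")
    return result


ALL_PATHS = paths(DTD, "courses")
```

A recursive DTD would have infinitely many paths, so this sketch deliberately handles only the non-recursive case.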
Framework: XML Trees
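[Figure: XML tree with root v0 (courses). Its first course v1 has @cno = “cs100” and two students: v2 with @sno = “123”, name v3 (string “Fox”) and grade v4 (string “B+”), and v5 with @sno = “456”, name v6 (string “Smith”) and grade v7 (string “A-”). Further course elements are elided.]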
Towards FDs for XML
We know how to handle FDs in relational DBs.
We need a relational representation of XML.
We do it by considering tree tuples: a tree tuple in a DTD D is a mapping
t : Paths(D) → Vertices ∪ Strings ∪ {⊥}
consistent with D: could be extracted from an XML tree conforming to D.
Tree Tuples
t(courses) = v0
t(courses.course) = v1
t(courses.course.@cno) = “cs100”
t(courses.course.student) = v2
t(p) = ⊥, for the remaining paths
A tree tuple represents an XML tree:
[Figure: tree with root v0 (courses), whose child v1 (course) has attribute @cno = “cs100” and a child v2 (student).]
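The tuple on this slide can be written down as a plain mapping. This is a minimal sketch: `None` stands in for ⊥, the list `PATHS` assumes the university DTD, and `is_prefix_closed` checks only one necessary consequence of consistency with D (a non-null path needs a non-null parent path), not full consistency.

```python
# Paths(D) for the university DTD, assumed here for illustration.
PATHS = [
    "courses",
    "courses.course",
    "courses.course.@cno",
    "courses.course.student",
    "courses.course.student.@sno",
    "courses.course.student.name",
    "courses.course.student.name.S",
    "courses.course.student.grade",
    "courses.course.student.grade.S",
]

BOTTOM = None  # stands for the null value ⊥

defined = {
    "courses": "v0",
    "courses.course": "v1",
    "courses.course.@cno": "cs100",
    "courses.course.student": "v2",
}

# Total mapping over Paths(D): every path not listed above is sent to ⊥.
t = {p: defined.get(p, BOTTOM) for p in PATHS}


def is_prefix_closed(tup):
    """Check that every non-null path has a non-null parent path."""
    for path, value in tup.items():
        parent = path.rpartition(".")[0]
        if value is not BOTTOM and parent and tup[parent] is BOTTOM:
            return False
    return True
```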
XML Tree: set of Tree Tuples
[Figure: the XML tree from the “Framework: XML Trees” slide, shown next to its tree tuples. One tuple contains v0 (courses), v1 (course, @cno = “cs100”), v2 (student, @sno = “123”), v3 (name, “Fox”) and v4 (grade, “B+”). Another contains v0, v1, v5 (student, @sno = “456”), v6 (name, “Smith”) and v7 (grade, “A-”). Further tuples come from the elided course elements.]
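One way to picture the decomposition of a tree into tree tuples is the sketch below. It assumes a small hypothetical node encoding (`node`), keeps only the first course of the figure, and treats a missing dictionary key as ⊥. Each tuple picks exactly one child per child element type, so a course with two students yields two tuples.

```python
from itertools import product


def node(vid, label, attrs=None, text=None, children=()):
    """Build one element node; `text` is its string content (S), if any."""
    return {"id": vid, "label": label, "attrs": attrs or {},
            "text": text, "children": list(children)}


def tree_tuples(n, prefix=""):
    """Return the maximal tree tuples of the subtree rooted at n as dicts."""
    path = f"{prefix}.{n['label']}" if prefix else n["label"]
    base = {path: n["id"]}
    for attr, value in n["attrs"].items():
        base[f"{path}.{attr}"] = value
    if n["text"] is not None:
        base[f"{path}.S"] = n["text"]
    # Group the tuples of the children by element type.
    by_label = {}
    for child in n["children"]:
        by_label.setdefault(child["label"], []).extend(tree_tuples(child, path))
    # Combine one choice per element type; with no children this yields [base].
    result = []
    for combo in product(*by_label.values()):
        merged = dict(base)
        for part in combo:
            merged.update(part)
        result.append(merged)
    return result


tree = node("v0", "courses", children=[
    node("v1", "course", {"@cno": "cs100"}, children=[
        node("v2", "student", {"@sno": "123"}, children=[
            node("v3", "name", text="Fox"),
            node("v4", "grade", text="B+"),
        ]),
        node("v5", "student", {"@sno": "456"}, children=[
            node("v6", "name", text="Smith"),
            node("v7", "grade", text="A-"),
        ]),
    ]),
])

TUPLES = tree_tuples(tree)
```

Both resulting tuples share v0 and v1 and differ only below courses.course.student.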
Functional Dependencies
Expressions of the form: X → Y
defined over a DTD D, where X, Y are finite non-empty subsets of Paths(D).
XML tree T can be tested for satisfaction of X → Y if:
X ∪ Y ⊆ Paths(T) ⊆ Paths(D)
T ⊨ X → Y if for every pair u, v of tree tuples in T:
u.X = v.X and u.X ≠ ⊥ implies u.Y = v.Y
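The satisfaction condition translates almost literally into a pairwise check. In this sketch, tree tuples are dictionaries, a missing key plays the role of ⊥, and u.X ≠ ⊥ is read as "no component of u.X is null"; that reading and the names `project` and `satisfies` are assumptions of the example.

```python
def project(t, paths):
    """Return t.X as a tuple of values; None stands for ⊥."""
    return tuple(t.get(p) for p in paths)


def satisfies(tuples, lhs, rhs):
    """Check X -> Y over a collection of tree tuples, pair by pair."""
    for u in tuples:
        ux = project(u, lhs)
        if None in ux:          # u.X contains a null: the condition does not apply
            continue
        for v in tuples:
            if project(v, lhs) == ux and project(v, rhs) != project(u, rhs):
                return False
    return True


SNO = "courses.course.student.@sno"
NAME = "courses.course.student.name.S"

u = {SNO: "123", NAME: "Fox"}
v = {SNO: "123", NAME: "Fax"}   # same @sno, different name
w = {NAME: "Smith"}             # @sno is null in this tuple

VIOLATED = not satisfies([u, v], [SNO], [NAME])
IGNORES_NULLS = satisfies([u, w], [SNO], [NAME])
```

The check is quadratic in the number of tuples; grouping tuples by their X-projection would avoid the inner loop.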
FD: Examples
University DTD:
courses → course*
course → @cno, student*
student → @sno, name, grade
Two students with the same @sno value must have the same name:
courses.course.student.@sno → courses.course.student.name.S
Every student can have at most one grade in every course:
{ courses.course, courses.course.student.@sno } → courses.course.student.grade.S
Checking FD Satisfaction
[Figure: XML tree with root v0 (courses) and two course elements. v1 has @cno = “cs100” and a student v2 with @sno = “123”, name v3 (“Fox”) and grade v4 (“B+”); v5 has @cno = “cs225” and a student v6 with @sno = “123”, name v7 (“Fox”) and grade v8 (“A+”). The tree is shown next to its two tree tuples, one per course.]
{ courses.course, courses.course.student.@sno } → courses.course.student.grade.S
The two tree tuples differ on courses.course (v1 versus v5), so the FD is satisfied.
Checking FD Satisfaction
[Figure: XML tree with root v0 (courses) and a single course v1 with @cno = “cs100” and two students: v2 with @sno = “123”, name v3 (“Fox”) and grade v4 (“B+”), and v5 with @sno = “123”, name v6 (“Fox”) and grade v7 (“A+”). The tree is shown next to its two tree tuples, one per student.]
{ courses.course, courses.course.student.@sno } → courses.course.student.grade.S
The two tree tuples agree on courses.course (v1) and on @sno (“123”) but differ on the grade, so the FD is violated.
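The two “Checking FD Satisfaction” cases can be replayed with a small grouping-based check. This is a sketch under assumptions: the tuples below keep only the three paths the FD mentions, vertex names follow the figures, and `violations` is a hypothetical helper that groups tuples by their X-projection.

```python
COURSE = "courses.course"
SNO = "courses.course.student.@sno"
GRADE = "courses.course.student.grade.S"


def violations(tuples, lhs, rhs):
    """Report (x, first_y, conflicting_y) for each tuple that breaks X -> Y."""
    seen = {}
    bad = []
    for t in tuples:
        x = tuple(t.get(p) for p in lhs)
        if None in x:               # a null in t.X: the FD says nothing here
            continue
        y = tuple(t.get(p) for p in rhs)
        if x in seen and seen[x] != y:
            bad.append((x, seen[x], y))
        seen.setdefault(x, y)
    return bad


# Student 123 enrolled in two different courses (v1 and v5).
two_courses = [
    {COURSE: "v1", SNO: "123", GRADE: "B+"},
    {COURSE: "v5", SNO: "123", GRADE: "A+"},
]

# Student 123 listed twice in the same course v1 with different grades.
one_course = [
    {COURSE: "v1", SNO: "123", GRADE: "B+"},
    {COURSE: "v1", SNO: "123", GRADE: "A+"},
]

OK_CASE = violations(two_courses, [COURSE, SNO], [GRADE])
BAD_CASE = violations(one_course, [COURSE, SNO], [GRADE])
```

Because courses.course is compared by vertex identity rather than by @cno value, two enrolments count as the same course only when they sit under the same course node.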
Implication Problem for FD
Given a DTD D and a set of functional dependencies Σ ∪ {φ}:
(D, Σ) ⊢ φ if for any XML tree T conforming to D and satisfying Σ, it is the case that T ⊨ φ.
(D, Σ)+ = { φ | (D, Σ) ⊢ φ }
A functional dependency φ is trivial if it is implied by the DTD alone: (D, ∅) ⊢ φ
XNF: XML Normal Form
XML specification: a DTD D and a set of functional dependencies Σ.
A relational DB is in BCNF if for every non-trivial functional dependency X → Y in the specification, X is a key.
(D, Σ) is in XNF if:
For each non-trivial FD X → p.@l or X → p.S in (D, Σ)+, X → p is in (D, Σ)+.
Back to DBLP
DBLP is not in XNF:
DBLP.conf.issue → DBLP.conf.issue.article.@year ∈ (D, Σ)+
DBLP.conf.issue → DBLP.conf.issue.article ∉ (D, Σ)+
Proposed solution is in XNF.
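The XNF condition can be illustrated on the DBLP example with a deliberately simplified check. The sketch assumes that a finite, explicitly listed fragment of (D, Σ)+ is given as pairs (X, path) with single-path right-hand sides; it does not compute implication, and `xnf_violations` and `is_trivial` are hypothetical names.

```python
def parent(path):
    """Drop the last step of a path: p.@l -> p and p.S -> p."""
    return path.rpartition(".")[0]


def is_value_path(path):
    """True if the path ends in an attribute (@l) or in string content (S)."""
    last = path.rpartition(".")[2]
    return last.startswith("@") or last == "S"


def xnf_violations(known_fds, is_trivial=lambda fd: False):
    """Return FDs X -> p.@l or X -> p.S in known_fds whose X -> p is missing."""
    bad = []
    for lhs, rhs in sorted(known_fds, key=lambda fd: (sorted(fd[0]), fd[1])):
        if is_value_path(rhs) and not is_trivial((lhs, rhs)):
            if (lhs, parent(rhs)) not in known_fds:
                bad.append((lhs, rhs))
    return bad


ISSUE = "DBLP.conf.issue"
ARTICLE = "DBLP.conf.issue.article"

# Original design: the issue determines the year of its articles,
# but it does not determine a single article.
original = {(frozenset([ISSUE]), ARTICLE + ".@year")}

# Assumed redesign: @year is stored on issue, and issue -> issue holds.
redesigned = {
    (frozenset([ISSUE]), ISSUE + ".@year"),
    (frozenset([ISSUE]), ISSUE),
}

ORIGINAL_BAD = xnf_violations(original)
REDESIGNED_BAD = xnf_violations(redesigned)
```

A real test would have to decide membership in (D, Σ)+ through the implication problem instead of a lookup in a hand-written set.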
Normalization Algorithm
The algorithm applies two transformations until the schema is in XNF.
If there is an anomalous FD of the form:
DBLP.conf.issue → DBLP.conf.issue.article.@year
then apply the “DBLP example rule”.
Otherwise: choose a minimal anomalous FD and apply the “University example rule”.
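The slides only name the two rules. As a hedged reading of the “DBLP example rule”, the sketch below moves the attribute @year from article to issue in a dictionary-encoded DTD. The encoding, the element lists read off the DBLP figure (without multiplicities), and the function `move_attribute` are assumptions for illustration, not the authors' definition of the transformation.

```python
import copy

# Hypothetical encoding of the DBLP DTD as read off the figure.
DBLP = {
    "DBLP": {"attrs": [], "children": ["conf"]},
    "conf": {"attrs": [], "children": ["title", "issue"]},
    "issue": {"attrs": [], "children": ["article"]},
    "article": {"attrs": ["@year"], "children": ["author", "title"]},
    "author": {"attrs": [], "children": []},
    "title": {"attrs": [], "children": []},
}


def move_attribute(dtd, attr, source, target):
    """Return a copy of dtd in which attr has moved from source to target."""
    if attr not in dtd[source]["attrs"]:
        raise ValueError(f"{source} has no attribute {attr}")
    new = copy.deepcopy(dtd)            # leave the input DTD untouched
    new[source]["attrs"].remove(attr)
    new[target]["attrs"].append(attr)
    return new


NORMALIZED = move_attribute(DBLP, "@year", "article", "issue")
```

After the move, the year is stored once per issue instead of once per article, which removes the redundancy the anomalous FD describes. The documents themselves would need a matching transformation, which this schema-level sketch does not perform.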
Normalizing XML Documents
Theorem: The decomposition algorithm terminates and outputs a specification in XNF.
It does not lose information:
[Figure: an unnormalized XML document and a normalized XML document, connected by the queries Q1 and Q2.]
Q1, Q2 are XQuery core queries.
It works even if we cannot compute (D, Σ)+. If we know (D, Σ)+, the output is better.
Implication Problem (cont’d)
Typically, regular expressions used in DTDs are rather simple.
Trivial regular expression: s1, s2, …, sn
Each si is one of ai, ai?, ai+ or ai*
For i ≠ j, ai ≠ aj
D is a simple DTD if all its productions use “permutations” of trivial regular expressions.
Example: (a | b)* is a permutation of a*, b*
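A trivial regular expression has a shape that is easy to test syntactically. The sketch below checks only that shape (a comma-separated list of distinct names, each optionally followed by ?, + or *). It does not decide whether an arbitrary expression is a permutation of a trivial one, and the token syntax accepted by `TOKEN` is an assumption.

```python
import re

# One item: an element name followed by an optional ?, + or *.
TOKEN = re.compile(r"^([A-Za-z_][\w.-]*)([?+*]?)$")


def is_trivial_regex(expr):
    """True if expr is 's1, ..., sn' with each si in {a, a?, a+, a*} and distinct names."""
    names = []
    for part in expr.split(","):
        match = TOKEN.match(part.strip())
        if match is None:
            return False
        names.append(match.group(1))
    return len(names) == len(set(names))   # for i != j, ai != aj


EXAMPLES = {
    "a*, b*": is_trivial_regex("a*, b*"),
    "(a | b)*": is_trivial_regex("(a | b)*"),
    "a, a?": is_trivial_regex("a, a?"),
    "name, grade?": is_trivial_regex("name, grade?"),
}
```

Note that (a | b)* is rejected here even though it is a permutation of the trivial expression a*, b*; recognizing such permutations would need a comparison of the two languages, not a syntactic test.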
Implication Problem (cont’d)
Theorem: For simple DTDs:
The implication problem for FDs is solvable in O(n²).
Testing if a specification is in XNF can be done in O(n³).
Other results:
There is a larger class of DTDs for which these problems are tractable.
There is an even larger class of DTDs for which they are coNP-complete.
Final Comments
The normalization algorithm can be improved in various ways.
Why XNF?
Theorem: XNF generalizes BCNF and a normal form for nested relations (called NNF) when those are coded as XML documents.
A complete classification of the complexity of the implication problem for FDs remains open.
Implementation.
Backup Slides
Normalization Algorithm
We consider FDs of the form:
{q, p1.@l1, …, pn.@ln } → p
where n ≥ 0 and q ends with an element type.
The algorithm applies two transformations until the schema is in XNF.
If there is an anomalous FD with n = 0:
DBLP.conf.issue → DBLP.conf.issue.article.@year
then apply the “DBLP example rule”.
Otherwise: choose a minimal anomalous FD and apply the “University example rule”.