![Page 1: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/1.jpg)
Lecture 10: XML
Monday, Oct. 21, 2001
![Page 2: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/2.jpg)
Outline
• Finish Datalog (4.2-4.4)
• XML:
– Syntax, DTDs (Data on the Web, 3.1)
– Semistructured data in XML (3.2)
– Exporting Relational Data in XML (8.3.1)
![Page 3: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/3.jpg)
Multiple Datalog Rules
Product(pid, name, price, category, maker-cid)
Purchase(buyer-ssn, seller-ssn, store, pid)
Company(cid, name, stock price, country)
Person(ssn, name, phone number, city)
• Find names of buyers and sellers:
A(n) :- Person(s,n,_,_), Purchase(s,_,_,_)
A(n) :- Person(s,n,_,_), Purchase(_,s,_,_)
• Multiple rules correspond to union
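The union reading of two rules with the same head can be sketched with Python sets. The sample tuples are made up for illustration; only the Person and Purchase schemas come from the slide.

```python
# Hypothetical sample data; only the schemas come from the slide.
# Person(ssn, name, phone number, city)
person = {
    (1, "Ann", "555-0100", "Seattle"),
    (2, "Bob", "555-0101", "Tacoma"),
    (3, "Cy", "555-0102", "Seattle"),
}
# Purchase(buyer-ssn, seller-ssn, store, pid)
purchase = {(1, 2, "StoreA", 10)}

# A(n) :- Person(s,n,_,_), Purchase(s,_,_,_)
buyers = {n for (s, n, _, _) in person for (b, _, _, _) in purchase if s == b}

# A(n) :- Person(s,n,_,_), Purchase(_,s,_,_)
sellers = {n for (s, n, _, _) in person for (_, sel, _, _) in purchase if s == sel}

# Two rules with the same head: the answer is the union of both results.
answer = buyers | sellers
print(sorted(answer))
```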
![Page 4: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/4.jpg)
Multiple Datalog Rules
Product(pid, name, price, category, maker-cid)
Purchase(buyer-ssn, seller-ssn, store, pid)
Company(cid, name, stock price, country)
Person(ssn, name, phone number, city)
• Find Seattle residents who bought products over $100:
E(s) :- Product(i,_,p,_,_) AND Purchase(s,_,_,i) AND p > 100
A(n) :- Person(s,n,_,"Seattle") AND E(s)
• Multiple rules correspond to sequential computation
• Same as substituting E's body in the second rule
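A small sketch of both readings, assuming made-up sample tuples: the two rules are evaluated first in sequence, then as one rule with E's body substituted in. Both should give the same answer.

```python
# Hypothetical sample data; only the schemas come from the slide.
product = {(10, "gizmo", 150, "gadget", 7), (11, "pen", 2, "office", 8)}
purchase = {(1, 2, "StoreA", 10), (3, 2, "StoreA", 11)}
person = {
    (1, "Ann", "555-0100", "Seattle"),
    (2, "Bob", "555-0101", "Tacoma"),
    (3, "Cy", "555-0102", "Seattle"),
}

# Sequential computation: first E, then A.
# E(s) :- Product(i,_,p,_,_) AND Purchase(s,_,_,i) AND p > 100
e = {s for (i, _, p, _, _) in product
       for (s, _, _, j) in purchase if i == j and p > 100}
# A(n) :- Person(s,n,_,"Seattle") AND E(s)
a = {n for (s, n, _, city) in person if city == "Seattle" and s in e}

# Substitution: E's body is inlined into the second rule.
a_inlined = {n for (s, n, _, city) in person
               for (i, _, p, _, _) in product
               for (b, _, _, j) in purchase
               if city == "Seattle" and s == b and i == j and p > 100}
print(a, a_inlined)
```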
![Page 5: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/5.jpg)
Negation in Datalog
Product(pid, name, price, category, maker-cid)
Purchase(buyer-ssn, seller-ssn, store, pid)
Company(cid, name, stock price, country)
Person(ssn, name, phone number, city)
• Find all "bad pid's" in Purchase (i.e., those that don't occur in Product)
P(p) :- Product(p,_,_,_,_)
BadP(p) :- Purchase(_,_,_,p) AND NOT P(p)
• Wrong solution (why?):
BadPWrong(p) :- Purchase(_,_,_,p) AND NOT Product(p,_,_,_,_)
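One way to see the difference, sketched in Python with made-up tuples. The correct query first projects Product onto pid and then negates. The wrong one negates an atom with anonymous variables, which can be read as "there is some combination of the other attributes that is not in Product", and that holds for nearly every pid. The small `other_names` domain is an assumption used only to make that reading executable.

```python
# Hypothetical sample data; pid 99 is purchased but not in Product.
product = {(10, "gizmo", 150, "gadget", 7)}
purchase = {(1, 2, "StoreA", 10), (3, 2, "StoreA", 99)}

# P(p) :- Product(p,_,_,_,_)
p_rel = {pid for (pid, _, _, _, _) in product}
# BadP(p) :- Purchase(_,_,_,p) AND NOT P(p)
bad_p = {pid for (_, _, _, pid) in purchase if pid not in p_rel}

# BadPWrong: the anonymous variables sit under the negation, so any pid
# qualifies as soon as SOME tuple starting with it is missing from Product.
other_names = {"gizmo", "widget"}  # assumed tiny domain for the name column
bad_p_wrong = {
    pid
    for (_, _, _, pid) in purchase
    for name in other_names
    if (pid, name, 150, "gadget", 7) not in product
}
print(bad_p, bad_p_wrong)
```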
![Page 6: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/6.jpg)
Negation in Datalog (continued)
Product(pid, name, price, category, maker-cid)
Purchase(buyer-ssn, seller-ssn, store, pid)
Company(cid, name, stock price, country)
Person(ssn, name, phone number, city)
• Find products that were never sold:
Sold(p) :- Purchase(_,_,_,p) AND Product(p,_,_,_,_)
NeverSold(p) :- Product(p,_,_,_,_) AND NOT Sold(p)
![Page 7: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/7.jpg)
Relational Algebra and Datalog
• Datalog:
– Friendly
– Says nothing about how to evaluate
• Relational Algebra:
– Unfriendly
– Can say in which order to evaluate
• Good news: relational algebra is equivalent to (non-recursive) Datalog!
![Page 8: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/8.jpg)
From Relational Algebra to Datalog
• Union R1 ∪ R2:
S(x,y,z) :- R1(x,y,z)
S(x,y,z) :- R2(x,y,z)
• Difference R1 - R2:
S(x,y,z) :- R1(x,y,z) AND NOT R2(x,y,z)
• Cartesian product R1 × R2:
S(x,y,z,u,w) :- R1(x,y,z) AND R2(u,w)
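The three translations above can be checked on small relations represented as Python sets of tuples. The data is hypothetical, and the binary relation used for the product is named `r2_binary` here to keep it apart from the ternary `r2`.

```python
from itertools import product as cartesian

# Hypothetical relations as sets of tuples.
r1 = {(1, 2, 3), (4, 5, 6)}
r2 = {(4, 5, 6), (7, 8, 9)}
r2_binary = {("a", "b")}  # plays the role of R2(u,w) in the product rule

# Union: two rules with the same head.
union = r1 | r2
# Difference: S(x,y,z) :- R1(x,y,z) AND NOT R2(x,y,z)
difference = {t for t in r1 if t not in r2}
# Cartesian product: S(x,y,z,u,w) :- R1(x,y,z) AND R2(u,w)
cart = {t + u for t, u in cartesian(r1, r2_binary)}
print(union, difference, cart)
```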
![Page 9: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/9.jpg)
From RA to Datalog (cont’d)
• Selection σ_{z > 35}(R):
S(x,y,z,u) :- R(x,y,z,u) AND z > 35
• Projection π_{x,z}(R):
S(x,z) :- R(x,y,z,u)
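A minimal sketch of selection and projection over a made-up four-column relation. Because the result is a set, projection removes duplicates, as it does in both relational algebra and Datalog.

```python
# Hypothetical relation R(x, y, z, u).
r = {(1, "a", 40, "x"), (2, "b", 30, "y"), (1, "c", 40, "z")}

# Selection: S(x,y,z,u) :- R(x,y,z,u) AND z > 35
selected = {(x, y, z, u) for (x, y, z, u) in r if z > 35}

# Projection: S(x,z) :- R(x,y,z,u)
projected = {(x, z) for (x, y, z, u) in r}
print(selected, projected)
```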
![Page 10: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/10.jpg)
From (non-recursive) Datalog to RA
• Let's take an example: R(A,B,C), S(D,E,F,G), T(H,I)
Q(x,y) :- R(x,y,z) AND S(y,y,w,x) AND T(z,55)
• First make all variables distinct, add arithmetic atoms:
Q(x,y) :- R(x,y,z) AND S(y1,y2,w,x3) AND T(z4,c5) AND y=y1 AND y1=y2 AND x=x3 AND z=z4 AND c5=55
• In RA: a select-project-join expression:
π_{A,B}(σ_{B=D AND D=E AND A=G AND C=H AND I=55}(R × S × T))
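As a sanity check, the rule with distinct variables and the select-project-join expression can both be evaluated on small made-up relations. This is only a sketch: the data is hypothetical, and the product is computed naively rather than the way a query optimizer would.

```python
from itertools import product as cartesian

# Hypothetical instances of R(A,B,C), S(D,E,F,G), T(H,I).
r = {(1, 2, 3), (5, 6, 7)}
s = {(2, 2, 9, 1), (6, 6, 9, 4)}
t = {(3, 55), (7, 55)}

# Rule form, with all variables distinct and explicit equality atoms.
rule = {(x, y)
        for (x, y, z) in r
        for (y1, y2, w, x3) in s
        for (z4, c5) in t
        if y == y1 and y1 == y2 and x == x3 and z == z4 and c5 == 55}

# RA form: project A,B from the selection over R x S x T.
rst = {rt + st + tt for rt, st, tt in cartesian(r, s, t)}
ra = {(a, b)
      for (a, b, c, d, e, f, g, h, i) in rst
      if b == d and d == e and a == g and c == h and i == 55}
print(rule, ra)
```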
![Page 11: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/11.jpg)
From (non-recursive) Datalog to RA
• Exercises:
– Translate a rule with negation to RA (hint: use difference)
– Translate multiple rules to RA (hint: use union and/or substitutions; remember that rules are non-recursive)
![Page 12: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/12.jpg)
Recursive Datalog Programs
• Recall:
– Find Fred's relatives
Relative(x) :- R("Fred",x,_)
Relative(y) :- Relative(x) AND R(x,y,_)
Name1 Name2 Relationship
Fred Mary Father
Mary Joe Cousin
Mary Bill Spouse
Nancy Lou Sister
Recommended reading: 4.4
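The recursive program can be evaluated by iterating the rules until no new facts appear (a naive fixpoint). The sketch below uses the table from the slide; the function name `relatives` is just an illustrative choice.

```python
# R(Name1, Name2, Relationship), as on the slide.
r = {
    ("Fred", "Mary", "Father"),
    ("Mary", "Joe", "Cousin"),
    ("Mary", "Bill", "Spouse"),
    ("Nancy", "Lou", "Sister"),
}

def relatives(r, start):
    """Naive fixpoint evaluation of the two Relative rules."""
    # Relative(x) :- R(start, x, _)
    rel = {y for (x, y, _) in r if x == start}
    while True:
        # Relative(y) :- Relative(x) AND R(x, y, _)
        new = {y for (x, y, _) in r if x in rel} - rel
        if not new:
            return rel
        rel |= new

print(sorted(relatives(r, "Fred")))
```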
![Page 13: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/13.jpg)
XML
![Page 14: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/14.jpg)
Facts About XML
• 254 books at Amazon
• 6,344,313 pages at www.altavista.com
• Every database vendor has an XML page:
– www.oracle.com/xml
– www.microsoft.com/xml
– www.ibm.com/xml
• Many applications are just fancier Websites
• But, most importantly, XML enables data sharing on the Web, hence our interest
![Page 15: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/15.jpg)
What is XML?
From HTML to XML
HTML describes the presentation: easy for humans
![Page 16: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/16.jpg)
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
HTML is hard for applications
![Page 17: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/17.jpg)
XML
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
XML describes the content: easy for applications
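To illustrate why tagged content is easy for applications, here is a short sketch that reads a one-book version of the bibliography with Python's standard `xml.etree.ElementTree` module. The full title is filled in from the HTML slide, and the dictionary layout is an illustrative choice.

```python
import xml.etree.ElementTree as ET

doc = """
<bibliography>
  <book>
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
"""

root = ET.fromstring(doc)
# Each field is found by its tag name, with no guessing from layout.
books = [
    {
        "title": book.findtext("title").strip(),
        "authors": [a.text.strip() for a in book.findall("author")],
        "year": int(book.findtext("year")),
    }
    for book in root.findall("book")
]
print(books)
```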
![Page 18: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/18.jpg)
XML
• eXtensible Markup Language
• Roots: comes from SGML
– A very nasty language
• After the roots: a format for sharing data
• Emerging format for data exchange on the Web and between applications
![Page 19: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/19.jpg)
XML Applications
• Sharing data between different components of an application.
• Archive data in text files.
• EDI (electronic data interchange):
– Transactions between banks
– Producers and suppliers sharing product data (auctions)
– Extranets: building relationships between companies
• Scientists sharing data about experiments.
• Sending data by email (see project)
![Page 20: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/20.jpg)
XML Syntax
• Very simple:
<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufman</name>
    <state>CA</state>
  </publisher>
</db>
![Page 21: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/21.jpg)
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• start tags must correspond to end tags, and conversely
![Page 22: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/22.jpg)
XML Terminology
• an element: everything between matching tags (tags included)
– example element: <title>Complete Guide to DB2</title>
– example element:
<book> <title> Complete Guide to DB2 </title> <author>Chamberlin</author> </book>
• elements may be nested
• empty element: <red></red>, abbreviated <red/>
• an XML document has a unique root element
• well-formed XML document: one whose tags match
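Well-formedness can be tested by simply trying to parse the text. A minimal sketch with Python's standard `xml.etree.ElementTree`, whose parser raises `ParseError` on mismatched tags or more than one root; the helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as a single well-formed XML element tree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<book><title>Complete Guide to DB2</title></book>"))
print(is_well_formed("<book><title>Complete Guide to DB2</book></title>"))
```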
![Page 23: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/23.jpg)
The XML Tree
db
  book
    title: "Complete Guide to DB2"
    author: "Chamberlin"
  book
    title: "Transaction Processing"
    author: "Bernstein"
    author: "Newcomer"
  publisher
    name: "Morgan Kaufman"
    state: "CA"
Tags on nodes
Data values on leaves
![Page 24: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/24.jpg)
More XML Syntax: Attributes
<book price="55" currency="USD">
  <title> Complete Guide to DB2 </title>
  <author> Chamberlin </author>
  <year> 1998 </year>
</book>
price, currency are called attributes
![Page 25: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/25.jpg)
Replacing Attributes with Elements
<book>
  <title> Complete Guide to DB2 </title>
  <author> Chamberlin </author>
  <year> 1998 </year>
  <price> 55 </price>
  <currency> USD </currency>
</book>
attributes are alternative ways to represent data
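Since the same data may arrive in either form, an application often has to accept both. A small sketch, assuming Python's standard `xml.etree.ElementTree`; the helper `price_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

with_attrs = ET.fromstring(
    '<book price="55" currency="USD"><title>Complete Guide to DB2</title></book>'
)
with_elems = ET.fromstring(
    "<book><title>Complete Guide to DB2</title>"
    "<price> 55 </price><currency> USD </currency></book>"
)

def price_of(book):
    """Return (price, currency), whether stored as attributes or as child elements."""
    price = book.get("price") or book.findtext("price")
    currency = book.get("currency") or book.findtext("currency")
    return price.strip(), currency.strip()

print(price_of(with_attrs), price_of(with_elems))
```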
![Page 26: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/26.jpg)
“Types” (or “Schemas”) for XML
• Document Type Definition (DTD)
• Defines a grammar for the XML document, but we use it as a substitute for types/schemas
• Will be replaced by XML-Schema (will extend DTDs)
![Page 27: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/27.jpg)
An Example DTD
• PCDATA means Parsed Character Data (a mouthful for string)
<!DOCTYPE db [
  <!ELEMENT db ((book|publisher)*)>
  <!ELEMENT book (title,author*,year?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
]>
![Page 28: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/28.jpg)
More on DTDs: Attributes
<!DOCTYPE db [
  <!ELEMENT db ((book|publisher)*)>
  <!ELEMENT book (title,author*,year?)>
  . . .
  <!ATTLIST book price CDATA #REQUIRED
                 language CDATA #IMPLIED>
  <!ATTLIST author phone CDATA #IMPLIED>
]>
<db>
  <book price="55" language="English">
    <title> Complete Guide to DB2 </title>
    <author> Chamberlin </author>
  </book>
  …
</db>
The type:
– CDATA = string
– ID = a key
– IDREF = a foreign key
– others = rarely used
Default declaration:
– #REQUIRED = required
– #IMPLIED = optional
– #FIXED = fixed (rarely used)
![Page 29: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/29.jpg)
DTDs as Grammars
• A DTD is an EBNF (Extended BNF) grammar
• An XML tree is precisely a derivation tree
• XML documents that have a DTD and conform to it are called valid
The example DTD is the same thing as:
db ::= (book|publisher)*
book ::= (title,author*,year?)
title ::= string
author ::= string
year ::= string
publisher ::= string
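Because each content model is a regular expression over child tags, the grammar view can be sketched with Python's `re` module. This is a toy checker, not a real validating parser: the two regexes are hand-translated from the example DTD, and text content and attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the example DTD, hand-written as regexes
# over the sequence of child tags (each tag followed by one space).
CONTENT = {
    "db": r"((book|publisher) )*",
    "book": r"title (author )*(year )?",
}
PCDATA = {"title", "author", "year", "publisher"}  # string-valued elements

def conforms(elem):
    """Check an element (recursively) against the toy grammar above."""
    if elem.tag in PCDATA:
        return len(elem) == 0  # no child elements allowed
    model = CONTENT.get(elem.tag)
    if model is None:
        return False  # tag not declared in the DTD
    child_tags = "".join(child.tag + " " for child in elem)
    return (re.fullmatch(model, child_tags) is not None
            and all(conforms(child) for child in elem))

good = ET.fromstring(
    "<db><book><title>T</title><author>A</author></book>"
    "<publisher>P</publisher></db>"
)
bad = ET.fromstring("<db><book><author>A</author><title>T</title></book></db>")
print(conforms(good), conforms(bad))
```

The second document is well formed but not valid, because `author` appears before `title`.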
![Page 30: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/30.jpg)
More on DTDs as Grammars
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<paper>
  <section> <text> </text> </section>
  <section>
    <title> </title>
    <section> … </section>
    <section> … </section>
  </section>
</paper>
XML documents can be nested arbitrarily deep
![Page 31: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/31.jpg)
XML for Representing Data
persons:

| name | phone |
|------|-------|
| John | 3634  |
| Sue  | 6343  |
| Dick | 6363  |

XML:
<persons>
  <row> <name>John</name> <phone> 3634</phone> </row>
  <row> <name>Sue</name> <phone> 6343</phone> </row>
  <row> <name>Dick</name> <phone> 6363</phone> </row>
</persons>

As a tree:
persons
  row
    name: "John"
    phone: 3634
  row
    name: "Sue"
    phone: 6343
  row
    name: "Dick"
    phone: 6363
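The table-to-XML mapping is mechanical: one `row` element per tuple and one child element per column. A minimal sketch using Python's standard `xml.etree.ElementTree`; phone numbers are kept as strings here.

```python
import xml.etree.ElementTree as ET

columns = ("name", "phone")
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

persons = ET.Element("persons")
for row in rows:
    row_elem = ET.SubElement(persons, "row")
    # One child element per column, named after the column.
    for column, value in zip(columns, row):
        ET.SubElement(row_elem, column).text = value

xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```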
![Page 32: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/32.jpg)
XML vs Data Models
• XML is self-describing
• Schema elements become part of the data
– Relational schema: persons(name,phone)
– In XML <persons>, <name>, <phone> are part of the data, and are repeated many times
• Consequence: XML is much more flexible
• XML = semistructured data
![Page 33: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/33.jpg)
Semistructured Data Explained
• Missing attributes:
<person> <name> John</name> <phone>1234</phone> </person>
<person> <name>Joe</name> </person>
no phone!
• Repeated attributes:
<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone> </person>
two phones!
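Code that reads such data should expect zero, one, or many occurrences of a child element. A sketch with Python's standard `xml.etree.ElementTree`; the `<people>` wrapper is a hypothetical root added so the three persons form one document.

```python
import xml.etree.ElementTree as ET

# <people> is an assumed wrapper element; the persons come from the slide.
doc = """<people>
  <person> <name>John</name> <phone>1234</phone> </person>
  <person> <name>Joe</name> </person>
  <person> <name>Mary</name> <phone>2345</phone> <phone>3456</phone> </person>
</people>"""

# findall returns a list, so missing and repeated phones need no special case.
phones = {
    person.findtext("name"): [phone.text for phone in person.findall("phone")]
    for person in ET.fromstring(doc)
}
print(phones)
```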
![Page 34: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/34.jpg)
Semistructured Data Explained
• Attributes with different types in different objects:
<person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone> </person>
structured name!
• Nested collections (no 1NF)
• Heterogeneous collections:
– <db> contains both <book>s and <publisher>s
![Page 35: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/35.jpg)
XML Data vs. E/R, ODL, Relational
• Q: is XML better or worse?
• A: serves different purposes
– E/R, ODL, Relational models:
  • For centralized processing, when we control the data
– XML:
  • Data sharing between different systems
  • We do not have control over the entire data
  • E.g. on the Web
• Do NOT use XML to model your data! Use E/R, ODL, or relational instead.
![Page 36: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/36.jpg)
Data Sharing with XML: Easy
[Diagram: data source (e.g. relational database), XML, Web, application]
![Page 37: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/37.jpg)
Exporting Relational Data to XML
• Product(pid, name, weight)
• Company(cid, name, address)
• Makes(pid, cid, price)
[Diagram: product, makes, company]
![Page 38: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/38.jpg)
Export data grouped by companies
<db>
  <company>
    <name> GizmoWorks </name>
    <address> Tacoma </address>
    <product> <name> gizmo </name> <price> 19.99 </price> </product>
    <product> … </product>
    …
  </company>
  <company>
    <name> Bang </name>
    <address> Kirkland </address>
    <product> <name> gizmo </name> <price> 22.99 </price> </product>
    …
  </company>
  …
</db>
Redundant representation of products
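A sketch of how such an export could be produced from the three relations, using Python's standard `xml.etree.ElementTree`. The ids and the weight are made-up values; the names, addresses, and prices follow the slide. The function name `export_by_company` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instances; ids and weight are invented for the example.
product = [(1, "gizmo", 10)]                       # Product(pid, name, weight)
company = [(1, "GizmoWorks", "Tacoma"),            # Company(cid, name, address)
           (2, "Bang", "Kirkland")]
makes = [(1, 1, "19.99"), (1, 2, "22.99")]         # Makes(pid, cid, price)

def export_by_company(product, company, makes):
    """Nest each company's products under it (joins Makes with Product)."""
    product_names = {pid: name for pid, name, _ in product}
    db = ET.Element("db")
    for cid, cname, address in company:
        c = ET.SubElement(db, "company")
        ET.SubElement(c, "name").text = cname
        ET.SubElement(c, "address").text = address
        for pid, made_by, price in makes:
            if made_by == cid:
                p = ET.SubElement(c, "product")
                ET.SubElement(p, "name").text = product_names[pid]
                ET.SubElement(p, "price").text = price
    return db

db = export_by_company(product, company, makes)
print(ET.tostring(db, encoding="unicode"))
```

The product name `gizmo` is written once per company that makes it, which is the redundancy noted on the slide.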
![Page 39: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/39.jpg)
The DTD
<!ELEMENT db (company*)>
<!ELEMENT company (name, address, product*)>
<!ELEMENT product (name,price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT price (#PCDATA)>
![Page 40: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/40.jpg)
Export Data by Products
<db>
  <product>
    <name> Gizmo </name>
    <manufacturer>
      <name> GizmoWorks </name>
      <price> 19.99 </price>
      <address> Tacoma </address>
    </manufacturer>
    <manufacturer>
      <name> Bang </name>
      <price> 22.99 </price>
      <address> Kirkland </address>
    </manufacturer>
    …
  </product>
  <product>
    <name> OneClick </name>
    …
</db>
Redundant representation of companies
![Page 41: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/41.jpg)
Which One Do We Choose ?
• The structure of the XML data is determined by agreement with our partners, or dictated by committees
– Many XML dialects (called applications)
• XML data is often nested, irregular, etc.
• No normal forms for XML
![Page 42: Lecture 10 XML](https://reader036.vdocuments.us/reader036/viewer/2022062816/568150fc550346895dbf1bbf/html5/thumbnails/42.jpg)
Storing XML Data
• We got lots of XML data from the Web; how do we store it?
• Ideally: convert to relational data, store in RDBMS
• Much harder than exporting relations to XML (why?)
• DB Vendors currently work on tools for loading XML data into an RDBMS