semistructured data and xml cs 645 april 5, 2006 some slide content courtesy of ramakrishnan &...
TRANSCRIPT
![Page 1: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/1.jpg)
Semistructured data and XML
CS 645April 5, 2006
Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives.
![Page 2: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/2.jpg)
2
Today’s lecture
• Semistructured data– History and motivation
• XML: syntax and typing• Querying XML data
– XPath– XQuery
• Overview of research issues
2
![Page 3: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/3.jpg)
3
Structure in data representation
• Relational data is highly structured– structure is defined by the schema– good for system design– good for precise query semantics / answers
• Structure can be limiting– authoring is constrained: schema-first– changes to structure not easy– querying constrained: must know schema– data exchange hard: integration of diff schema
Some reasons why more data is not in databases
![Page 4: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/4.jpg)
WWW
Structured data - Databases
Unstructured Text - Documents
Semistructured Data
![Page 5: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/5.jpg)
7
Need for loose structure
• Evolving, unknown, or irregular structure
• Integration of structured, but heterogeneous data sources
• Textual data with tags and links
• Combination of data models
7
![Page 6: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/6.jpg)
XML is the confluence of many factors:
• The Web needed a more declarative format for data
• Documents needed a mechanism for extended tags
• Database people needed a more flexible interchange format
• It’s parsable even if we don’t know what it means!
Original expectation:
• The whole web would go to XML instead of HTML
Today’s reality:
• Not so… But XML is used all over “under the covers”
XML is the preeminent format for semi-structured data
![Page 7: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/7.jpg)
Why DB People Like XML
Can get data from all sorts of sources
• Allows us to touch data we don’t own!
• This was actually a huge change in the DB community
Blends schema and data into one format
• Unlike relational model, where we need schema first
• … But too little schema can be a drawback, too!
![Page 8: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/8.jpg)
XML: Syntax & Typing
![Page 9: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/9.jpg)
XML Syntax• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…<book>,<author>…</author>
• elements are nested
• empty element: <red></red> abbrv. <red/>
• an XML document: single root element
An XML document is well formed if it has matching tags
![Page 10: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/10.jpg)
XML Syntax
<book price = “55” currency = “USD”>
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
<book price = “55” currency = “USD”>
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
attributes are alternative ways to represent data
![Page 11: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/11.jpg)
XML Syntax
<person id=“o555”> <name> Jane </name> </person>
<person id=“o456”> <name> Mary </name>
<children idref=“o123 o555”/>
</person>
<person id=“o123” mother=“o456”><name>John</name>
</person>
<person id=“o555”> <name> Jane </name> </person>
<person id=“o456”> <name> Mary </name>
<children idref=“o123 o555”/>
</person>
<person id=“o123” mother=“o456”><name>John</name>
</person>
oids and references in XML are just syntax
![Page 12: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/12.jpg)
XML Semantics: a Tree !<data>
<person id=“o555” >
<name> Mary </name>
<address>
<street> Maple </street>
<no> 345 </no>
<city> Seattle </city>
</address>
</person>
<person>
<name> John </name>
<address> Thailand </address>
<phone> 23456 </phone>
</person>
</data>
<data>
<person id=“o555” >
<name> Mary </name>
<address>
<street> Maple </street>
<no> 345 </no>
<city> Seattle </city>
</address>
</person>
<person>
<name> John </name>
<address> Thailand </address>
<phone> 23456 </phone>
</person>
</data>
data
Mary
person
person
nameaddres
sname
address
street
nocity
Maple 345 Seattle
JohnThai
phone
23456
id
o555
Elementnode
Textnode
Attributenode
Order matters !!!
![Page 13: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/13.jpg)
XML Data
• XML is self-describing
• Schema elements become part of the data– Relational schema: persons(name,phone)– In XML <persons>, <name>, <phone> are part of
the data, and are repeated many times
• Consequence: XML is much more flexible
Some real data:
http://www.cs.washington.edu/research/xmldatasets/
![Page 14: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/14.jpg)
Relational Data as XML
<person><row> <name>John</name> <phone> 3634</phone></row> <row> <name>Sue</name> <phone> 6343</phone> <row> <name>Dick</name> <phone> 6363</phone></row>
</person>
<person><row> <name>John</name> <phone> 3634</phone></row> <row> <name>Sue</name> <phone> 6343</phone> <row> <name>Dick</name> <phone> 6363</phone></row>
</person>
row row row
name name namephone phone phone
“John” 3634 “Sue” “Dick”
6343 6363
person
XML: person
name phone
John 3634
Sue 6343
Dick 6363
![Page 15: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/15.jpg)
XML is Semi-structured Data
• Missing attributes:
• Could represent ina table with nulls
<person> <name> John</name> <phone>1234</phone> </person>
<person> <name>Joe</name></person>
<person> <name> John</name> <phone>1234</phone> </person>
<person> <name>Joe</name></person>
← no phone !
name phone
John 1234
Joe -
![Page 16: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/16.jpg)
XML is Semi-structured Data
• Repeated attributes
• Impossible in tables:
<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>
<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>
← two phones !
name phone
Mary 2345 3456 ???
![Page 17: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/17.jpg)
XML is Semi-structured Data
• Attributes with different types in different objects
• Nested collections (non 1NF)• Heterogeneous collections:
– <db> contains both <book>s and <publisher>s
<person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone></person>
<person> <name> <first> John </first> <last> Smith </last> </name> <phone>1234</phone></person>
← structured name !
![Page 18: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/18.jpg)
Data Typing in XML• Data typing in the relational model: schema
• Data typing in XML– Much more complex– Typing restricts valid trees that can occur
• theoretical foundation: tree languages
– Practical methods:• DTD (Document Type Descriptor)• XML Schema
![Page 19: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/19.jpg)
Document Type DefinitionsDTD
• Part of the original XML specification• To be replaced by XML Schema
– Much more complex
• An XML document may have a DTD• XML document:
well-formed = if tags are correctly closedValid = if it has a DTD and conforms to it
• Validation is useful in data exchange
![Page 20: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/20.jpg)
DTD Example
<!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)>]>
<!DOCTYPE company [ <!ELEMENT company ((person|product)*)> <!ELEMENT person (ssn, name, office, phone?)> <!ELEMENT ssn (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT office (#PCDATA)> <!ELEMENT phone (#PCDATA)> <!ELEMENT product (pid, name, description?)> <!ELEMENT pid (#PCDATA)> <!ELEMENT description (#PCDATA)>]>
![Page 21: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/21.jpg)
DTD Example
<company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> </person> <product> ... </product> ...</company>
<company> <person> <ssn> 123456789 </ssn> <name> John </name> <office> B432 </office> <phone> 1234 </phone> </person> <person> <ssn> 987654321 </ssn> <name> Jim </name> <office> B123 </office> </person> <product> ... </product> ...</company>
Example of valid XML document:
![Page 22: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/22.jpg)
DTD: The Content Model
• Content model:– Complex = a regular expression over other elements
– Text-only = #PCDATA
– Empty = EMPTY
– Any = ANY
– Mixed content = (#PCDATA | A | B | C)*
<!ELEMENT tag (CONTENT)><!ELEMENT tag (CONTENT)>
contentmodel
![Page 23: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/23.jpg)
DTD: Regular Expressions
<!ELEMENT name (firstName, lastName))
<!ELEMENT name (firstName, lastName))
<name> <firstName> . . . . . </firstName> <lastName> . . . . . </lastName></name>
<name> <firstName> . . . . . </firstName> <lastName> . . . . . </lastName></name>
<!ELEMENT name (firstName?, lastName))<!ELEMENT name (firstName?, lastName))
DTD XML
<!ELEMENT person (name, phone*))<!ELEMENT person (name, phone*))
sequence
optional
<!ELEMENT person (name, (phone|email)))<!ELEMENT person (name, (phone|email)))
Kleene star
alternation
<person> <name> . . . . . </name> <phone> . . . . . </phone> <phone> . . . . . </phone> <phone> . . . . . </phone> . . . . . .</person>
<person> <name> . . . . . </name> <phone> . . . . . </phone> <phone> . . . . . </phone> <phone> . . . . . </phone> . . . . . .</person>
![Page 24: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/24.jpg)
Attributes in DTDs
<!ELEMENT person (ssn, name, office, phone?)><!ATTLIST person age CDATA #REQUIRED>
<!ELEMENT person (ssn, name, office, phone?)><!ATTLIST person age CDATA #REQUIRED>
<person age=“25”> <name> ....</name> ...</person>
<person age=“25”> <name> ....</name> ...</person>
![Page 25: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/25.jpg)
Attributes in DTDs
<!ELEMENT person (ssn, name, office, phone?)><!ATTLIST person age CDATA #REQUIRED id ID #REQUIRED manager IDREF #REQUIRED manages IDREFS #REQUIRED>
<!ELEMENT person (ssn, name, office, phone?)><!ATTLIST person age CDATA #REQUIRED id ID #REQUIRED manager IDREF #REQUIRED manages IDREFS #REQUIRED>
<person age=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”> <name> ....</name> ...</person>
<person age=“25” id=“p29432” manager=“p48293” manages=“p34982 p423234”> <name> ....</name> ...</person>
![Page 26: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/26.jpg)
Attributes in DTDs
Types:
• CDATA = string
• ID = key
• IDREF = foreign key
• IDREFS = foreign keys separated by space
• (Monday | Wednesday | Friday) = enumeration
![Page 27: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/27.jpg)
Attributes in DTDs
Kind:
• #REQUIRED
• #IMPLIED = optional
• value = default value
• value #FIXED = the only value allowed
![Page 28: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/28.jpg)
Using DTDs
• Must include in the XML document• Either include the entire DTD:
– <!DOCTYPE rootElement [ ....... ]>
• Or include a reference to it:– <!DOCTYPE rootElement SYSTEM
“http://www.mydtd.org”>
• Or mix the two... (e.g. to override the external definition)
![Page 29: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/29.jpg)
DTDs Aren’t Expressive Enough
DTDs capture grammatical structure, but have some drawbacks:
•Not themselves in XML – inconvenient to build tools for them
•Don’t capture database datatypes’ domains
•IDs aren’t a good implementation of keys
•No way of defining OO-like inheritance
![Page 30: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/30.jpg)
XML Schema
Aims to address the shortcomings of DTDs
• XML syntax
• Can define keys using XPaths
• Subclassing
• Domains and built-in datatypes
![Page 31: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/31.jpg)
Basics of XML SchemaNeed to use the XML Schema namespace
(generally named xsd)
• simpleTypes are a way of restricting domains on scalars
• Can define a simpleType based on integer, with values within a particular range
• complexTypes are a way of defining element/attribute structures
• Basically equivalent to !ELEMENT, but more powerful
• Specify sequence, choice between child elements
• Specify minOccurs and maxOccurs (default 1)
• Must associate an element/attribute with a simpleType, or an element with a complexType
![Page 32: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/32.jpg)
Simple Schema Example
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:element name=“mastersthesis" type=“ThesisType"/> <xsd:complexType name=“ThesisType">
<xsd:attribute name=“mdate" type="xsd:date"/><xsd:attribute name=“key" type="xsd:string"/><xsd:attribute name=“advisor" type="xsd:string"/><xsd:sequence>
<xsd:element name=“author" type=“xsd:string"/> <xsd:element name=“title" type=“xsd:string"/> <xsd:element name=“year" type=“xsd:integer"/> <xsd:element name=“school" type=“xsd:string”/> <xsd:element name=“committeemember" type=“CommitteeType” minOccurs=“0"/>
</xsd:sequence> </xsd:complexType> </xsd:schema>
![Page 33: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/33.jpg)
Querying XML Data• Querying XML has two components
– Selecting data• pattern matching on structural & path properties• typical selection conditions
– Construct output, or transform data• construct new elements• restructure• order
![Page 34: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/34.jpg)
Querying XML Data
• XPath = simple navigation through the tree
• XQuery = the SQL of XML – next week
• XSLT = recursive traversal– will not discuss in class
![Page 35: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/35.jpg)
Querying XMLHow do you query a directed graph? a tree?
The standard approach used by many XML, semistructured-data, and object query languages:
•Define some sort of a template describing traversals from the root of the directed graph
•In XML, the basis of this template is called an XPath
![Page 36: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/36.jpg)
XPath is widely used
•XML Schema uses simple XPaths in defining keys and uniqueness constraints
•XQuery
•XSLT
•XLink and XPointer, hyperlinks for XML
![Page 37: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/37.jpg)
XPathsIn its simplest form, an XPath is like a path in a file system:
/mypath/subpath/*/morepath
•The XPath returns a node set representing the XML nodes (and their subtrees) at the end of the path
•XPaths can have node tests at the end, returning only particular node types, e.g., text(), element(), attribute()
•XPath is fundamentally an ordered language: it can query in order-aware fashion, and it returns nodes in order
![Page 38: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/38.jpg)
Sample Data for Queries
<bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book>
</bib>
<bib><book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year></book><book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year></book>
</bib>
![Page 39: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/39.jpg)
Data Model for XPath
bib
book book
publisher author . . . .
Addison-Wesley Serge Abiteboul
The root
The root element
![Page 40: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/40.jpg)
46
XPath/bib/book/year/bib/book/year
/bib/paper/year/bib/paper/year
//author//author
/bib//first-name/bib//first-name
//author/*//author/*
/bib/book/@price/bib/book/@price
/bib/book/author[firstname]/bib/book/author[firstname]
/bib/book/author[firstname][address[.//zip][city]]/lastname/bib/book/author[firstname][address[.//zip][city]]/lastname
![Page 41: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/41.jpg)
XPath: Simple Expressions
Result: <year> 1995 </year>
<year> 1998 </year>
Result: empty (there were no papers)
/bib/book/year/bib/book/year
/bib/paper/year/bib/paper/year
![Page 42: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/42.jpg)
XPath: Restricted Kleene Closure
Result:<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>
Result: <first-name> Rick </first-name>
//author//author
/bib//first-name/bib//first-name
![Page 43: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/43.jpg)
Xpath: Text Nodes
Result: Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman
Rick Hull doesn’t appear because he has firstname, lastname
Functions in XPath:– text() = matches the text value– node() = matches any node (= * or @* or text())– name() = returns the name of the current tag
/bib/book/author/text()/bib/book/author/text()
![Page 44: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/44.jpg)
Xpath: Wildcard
Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>
* Matches any element
//author/*//author/*
![Page 45: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/45.jpg)
Xpath: Attribute Nodes
Result: “55”
@price means that price has to be an attribute
/bib/book/@price/bib/book/@price
![Page 46: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/46.jpg)
Xpath: Predicates
Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
/bib/book/author[firstname]/bib/book/author[firstname]
![Page 47: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/47.jpg)
Xpath: More Predicates
Result: <lastname> … </lastname>
<lastname> … </lastname>
/bib/book/author[firstname][address[.//zip][city]]/lastname/bib/book/author[firstname][address[.//zip][city]]/lastname
![Page 48: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/48.jpg)
Xpath: More Predicates
/bib/book[@price < 60]/bib/book[@price < 60]
/bib/book[author/@age < 25]/bib/book[author/@age < 25]
/bib/book[author/text()]/bib/book[author/text()]
![Page 49: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/49.jpg)
Xpath: Summarybib matches a bib element
* matches any element
/ matches the root element
/bib matches a bib element under root
bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth
//paper matches a paper at any depth
paper | book matches a paper or a book
@price matches a price attribute
bib/book/@price matches price attribute in book, in bib
bib/book/[@price<“55”]/author/lastname matches…
![Page 50: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/50.jpg)
Axes: More Complex Traversals
Thus far, we’ve seen XPath expressions that go down the tree
• But we might want to go up, left, right, etc.
• These are expressed with so-called axes:
• self::path-step
• child::path-step parent::path-step
• descendant::path-step ancestor::path-step
• descendant-or-self::path-step ancestor-or-self::path-step
• preceding-sibling::path-step following-sibling::path-step
• preceding::path-step following::path-step
• The previous XPaths we saw were in “abbreviated form”
![Page 51: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/51.jpg)
57
Overview of Research issues• Data modeling and normalization
• Query language design
• Storage & publishing of XML– XML → Relations– Relations → XML
• Theoretical work– expressiveness– containment, type checking
• Query execution & optimization
![Page 52: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/52.jpg)
XPath containment• XPath expressions return sets of nodes
– P1(doc) = node set
• P1 P2 if P1(doc) P2(doc) for all doc⊆ ⊆
• Limited features /, //, *, [ ]
• XPath expressions = tree patterns
/a[a]//*[b]//c
![Page 53: Semistructured data and XML CS 645 April 5, 2006 Some slide content courtesy of Ramakrishnan & Gehrke, Dan Suciu, Zack Ives](https://reader030.vdocuments.us/reader030/viewer/2022032804/56649e495503460f94b3bf5b/html5/thumbnails/53.jpg)
Deciding containment by tree matching
http://www.ifis.uni-luebeck.de/projects/XPathContainment/containmentFrame.htm
Deciding containment for simple XPath expressions in coNP-complete
Implementation: