April 9, 2007 SWIIS, Bangkok 1
Using Semantics in XML Data Management
Tok Wang Ling
Department of Computer Science
National University of Singapore
Gillian Dobbie
Department of Computer Science
University of Auckland
Roadmap
1. XML documents and current XML schema languages
2. ORA-SS (Object-Relationship-Attribute model for Semi-Structured data) [6]
3. The applications of ORA-SS
  • Semantic query optimization in XML
4. Conclusion
[6]. T. W. Ling, M. L. Lee, G. Dobbie. Semistructured Database Design. Springer Science+Business media, Inc. 2005
1. XML – Brief introduction
• XML (eXtensible Markup Language) is
  – Released by W3C
  – An application of SGML
  – A promising standard for publishing, integrating and exchanging data on the web
• XML schemas
  – DTD (Document Type Definition) [4]
  – XSD (XML Schema Definition), the W3C recommended standard [8, 9, 10]
[4]. Extensible Markup Language (XML) 1.0 (3rd Edition). W3C Recommendation 04 February 2004. http://www.w3.org/TR/2004/REC-xml-20040204/
[8]. XML Schema Part 0: Primer Second Edition. W3C Recommendation 28 October 2004. http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/
[9]. XML Schema Part 1: Structures Second Edition. W3C Recommendation 28 October 2004. http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/
[10]. XML Schema Part 2: Datatypes Second Edition. W3C Recommendation 28 October 2004. http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
1. XML – A motivating example
• Suppose we have an XML document “psj.xml” about different parts, suppliers and projects, where
  – The document has a root element psj;
  – Under psj, there is a sequence of part elements;
  – Under part, there is a sequence of supplier elements;
  – Under supplier, there is a sequence of project elements.
Example 1. psj.xml
<?xml version="1.0" encoding="UTF-8"?>
<psj xmlns:xsi="…" xsi:noNamespaceSchemaLocation="…">
  <part>
    <pno>P001</pno> <pname>Nut</pname> <color>Silver</color>
    <supplier>
      <sno>S001</sno> <sname>Alfa</sname> <city>Atlanta</city> <price>5</price>
      <project> <jno>J001</jno> <jname>Rocket boots</jname> <budget>20000</budget> <qty>60</qty> </project>
      <project> <jno>J003</jno> <jname>Firework launcher</jname> <budget>250000</budget> <qty>650</qty> </project>
    </supplier>
    <supplier>
      <sno>S002</sno> <sname>Beta</sname> <city>Atlanta</city> <city>New York</city> <price>5.5</price>
      <project> <jno>J002</jno> <jname>Diving helm</jname> <budget>18000</budget> <qty>70</qty> </project>
      <project> <jno>J003</jno> <jname>Firework launcher</jname> <budget>250000</budget> <qty>50</qty> </project>
    </supplier>
  </part>
  …
  <part>
    <pno>P002</pno> <pname>Nut</pname> <color>Copper</color>
    <supplier>
      <sno>S001</sno> <sname>Alfa</sname> <city>Atlanta</city> <price>4.6</price>
      <project> <jno>J002</jno> <jname>Diving helm</jname> <budget>18000</budget> <qty>60</qty> </project>
    </supplier>
    <supplier>
      <sno>S003</sno> <sname>Beta</sname> <city>New York</city> <price>5</price>
      <project> <jno>J001</jno> <jname>Rocket boots</jname> <budget>20000</budget> <qty>20</qty> </project>
      <project> <jno>J004</jno> <jname>Blue fireworks</jname> <budget>20000</budget> <qty>50</qty> </project>
    </supplier>
  </part>
</psj>
Figure 1. Example XML document
1. XML – the DTD of “psj.xml”
(a) “psj.dtd”, the DTD of “psj.xml”
<?xml version="1.0" encoding="UTF-8"?>
<!--DTD generated by XXX-->
<!ELEMENT psj (part+)>
<!ELEMENT part (pno, pname, color, supplier+)>
<!ELEMENT pno (#PCDATA)>
<!ELEMENT pname (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT supplier (sno, sname, city+, price, project+)>
<!ELEMENT sno (#PCDATA)>
<!ELEMENT sname (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT project (jno, jname, budget, qty)>
<!ELEMENT jno (#PCDATA)>
<!ELEMENT jname (#PCDATA)>
<!ELEMENT budget (#PCDATA)>
<!ELEMENT qty (#PCDATA)>
(b) psj.dtd as a DataGuide
▼♦ psj
  ▼♦ part
    ♦ pno
    ♦ pname
    ♦ color
    ▼♦ supplier
      ♦ sno
      ♦ sname
      ♦ city
      ♦ price
      ▼♦ project
        ♦ jno
        ♦ jname
        ♦ budget
        ♦ qty
Figure 2. DTD and DataGuide of Example XML document
1. XML – what the DTD says
• A DTD is a simple definition of an XML document, in which users can define
  – Element/attribute types
  – Occurrence constraints (e.g. ?, +, *)
  – Containment among different element types (the structure)
• A DTD cannot express
  – Numeric occurrence constraints (e.g. 2 to 8)
  – Uniqueness/key constraints on a combination of attributes/elements (in a DTD, the ID type can be assigned to only one attribute at a time)
  – Relationship types among elements and their degrees
  – The difference between an attribute (or simple element) of an element type and an attribute (or simple element) of a relationship type
Simple elements are element types that contain only PCDATA and have no attributes.
1. XML – XSD
<xs:schema xmlns:xs="…">
<xs:element name="psj">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="part" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="pno" type="xs:string"/>
            <xs:element name="pname" type="xs:string"/>
            <xs:element name="color" type="xs:string"/>
            <xs:element name="supplier" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="sno" type="xs:string"/>
                  <xs:element name="sname" type="xs:string"/>
                  <xs:element name="city" type="xs:string" maxOccurs="unbounded"/>
                  <xs:element name="price" type="xs:string"/>
                  <xs:element name="project" maxOccurs="unbounded">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="jno" type="xs:string"/>
                        <xs:element name="jname" type="xs:string"/>
                        <xs:element name="budget" type="xs:string"/>
                        <xs:element name="qty" type="xs:string"/>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="PK">
    <xs:selector xpath="part"/>
    <xs:field xpath="pno"/>
  </xs:key>
</xs:element>
</xs:schema>
“psj.xsd”, the XSD schema of the motivating example data.
maxOccurs="unbounded": XSD definition of an element occurrence constraint.
xs:key: XSD definition of a key constraint, which requires that every part element has a non-nil pno element and that the values of all pno elements in the document are unique.
Figure 3. XML Schema of Example XML document
1. XML – what XSD can tell
• XSD is the standard for XML schema definition, recommended by W3C and supported by most vendors. It
  – has an extensible XML syntax,
  – supports more data types (user-defined types and 37 built-in types),
  – can represent uniqueness/key constraints for both attribute types and element types,
  – and has many other improvements in comparison with DTD.
1. XML – XSD still has flaws
XSD is not sufficient for expressing the relational semantics in XML data. For example:
1. A key constraint is specified by a key element. The key constraint in XSD is an extension of ID in DTD. It is quite different from the key constraint in relational databases.
  – E.g. in the previous XSD, the values of the key attribute, pno of part, should be unique within the set of part elements in the whole document.
  – Therefore, when an element type is located at a lower level, such as supplier and project, XSD cannot declare sno and jno as their key attributes (OIDs) respectively.
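The scope problem can be seen on the example data. The following is a minimal sketch in Python (standard library only), assuming a trimmed extract of psj.xml that keeps only pno and sno; the helper name duplicated_values is illustrative and not part of any schema tool. It counts repeated values in the scope a document-wide key would cover.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed extract of psj.xml: only pno and sno are kept (assumption, for brevity).
PSJ = """<psj>
  <part><pno>P001</pno>
    <supplier><sno>S001</sno></supplier>
    <supplier><sno>S002</sno></supplier>
  </part>
  <part><pno>P002</pno>
    <supplier><sno>S001</sno></supplier>
    <supplier><sno>S003</sno></supplier>
  </part>
</psj>"""

def duplicated_values(root, path):
    """Return the values that occur more than once among the elements selected by path."""
    counts = Counter(e.text for e in root.findall(path))
    return sorted(v for v, n in counts.items() if n > 1)

root = ET.fromstring(PSJ)
dup_pno = duplicated_values(root, "part/pno")           # scope of the key PK
dup_sno = duplicated_values(root, "part/supplier/sno")  # hypothetical document-wide key on sno
print(dup_pno, dup_sno)
```

Because supplier S001 is nested under both parts, its sno repeats in the document, so a document-wide uniqueness requirement on sno would reject data that is perfectly legitimate under this nesting.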
1. XML – XSD still has flaws (cont.)
- The key element must contain the following (in order):
  a) One and only one selector element
     - contains an XPath expression that specifies the set of elements across which the values specified by the field must be unique
  b) One or more field elements
     - each contains an XPath expression that specifies the values that must be unique for the set of elements specified by the selector element.
- The key constraint is similar to the unique constraint, except that the column on which a unique constraint is defined can have null values.
1. XML – XSD still has flaws (cont.)
2. XSD does not support relationship types and other relational semantic constraints.
  – E.g. the ternary relationship type psj among part, supplier and project in the original data is lost in the XSD.
3. XSD cannot distinguish attributes (or simple elements) of relationship types from attributes (or simple elements) of element types.
  – E.g. price is an attribute of the binary relationship type ps between part and supplier. However, it looks the same as sname, an attribute (simple element) of the element supplier.
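The difference between sname and price shows up in the data itself. Below is a small Python sketch (standard library only), assuming an extract of psj.xml reduced to supplier S001 under both parts; values_per_supplier is a hypothetical helper name. It collects, per sno, the distinct values seen for a given child element.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Extract of psj.xml reduced to supplier S001 under both parts (assumption, for brevity).
PSJ = """<psj>
  <part><pno>P001</pno>
    <supplier><sno>S001</sno><sname>Alfa</sname><price>5</price></supplier>
  </part>
  <part><pno>P002</pno>
    <supplier><sno>S001</sno><sname>Alfa</sname><price>4.6</price></supplier>
  </part>
</psj>"""

def values_per_supplier(root, child):
    """Map each sno to the set of distinct values of child seen under that supplier."""
    seen = defaultdict(set)
    for sup in root.iter("supplier"):
        seen[sup.findtext("sno")].add(sup.findtext(child))
    return dict(seen)

root = ET.fromstring(PSJ)
snames = values_per_supplier(root, "sname")  # one value per supplier
prices = values_per_supplier(root, "price")  # varies with the part
print(snames, prices)
```

sname is determined by sno alone, while price varies with the (part, supplier) pair, which is exactly the distinction the XSD above cannot record.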
2. ORA-SS in a nutshell
• ORA-SS is a semantically rich data model for semi-structured data.
• It can easily represent the relational semantics and constraints in XML data.
• The ORA-SS model is also a bridge that connects the tree structure of XML and the semantics in relational and object-relational databases.
• In contrast to a traditional ER diagram, an ORA-SS schema diagram represents the hierarchical structure of XML data.
2. ORA-SS in a nutshell
• A complete ORA-SS model has 4 diagrams
  – Schema diagram
    • Represents the structure of and constraints (business rules) on XML documents
  – Instance diagram
    • Visually represents the graphical structure of XML data
  – Functional dependency diagram
    • Represents FDs in relationship types
  – Inheritance diagram
    • Represents the specialization/generalization relationships among different object classes in ORA-SS
2. ORA-SS data models
• Object class
  – attributes of object class
  – ordering on object class
• Relationship type
  – degree of relationship type
  – participating object classes in relationship type
  – attributes of relationship type
  – disjunctive relationship type
  – recursive relationship type
  – ID dependent relationship type
2. ORA-SS data models (cont.)
• Attribute
  – attributes of object class or relationship type
  – key attribute (OID)
  – foreign key / referential constraint (IDREF/IDREFS)
  – composite attribute
  – disjunctive attribute
  – attribute with unknown structure
  – ordering on attributes
  – fixed or default value of attribute
  – derived attribute
The ORA-SS schema diagram of Example 1.
Part, supplier and project are modeled as object classes.
Pno, sno and jno are declared as the object IDs of part, supplier and project respectively.
PS is a binary relationship type between part and supplier.
PSJ is a ternary relationship type defined among part, supplier and project.
Price is an attribute of the relationship type PS, and qty is an attribute of PSJ.
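[Diagram: object classes part, supplier and project; attribute labels pno, pname, color, sno, sname, city, jno, jname, budget, price and qty; relationship labels “PS, 2, +, +” and “PSJ, 3, +, +”; one “+” cardinality marker]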
Figure 4. ORA-SS schema diagram of Example XML document
ORA-SS – Semantic Advantages
• ORA-SS can represent the following semantics that DTD and XML Schema cannot:
  – Attribute vs. object class
  – Multi-valued attribute vs. object class
  – Identifier (ID)
  – IDREF or foreign key
  – n-ary relationship type
  – Attribute of object class vs. attribute of relationship type
  – View of XML document
3. ORA-SS applications
• Due to the rich semantics in ORA-SS, the model can be widely used in
  – Normal form XML schema
  – Relational/object-relational storage of XML data
  – XML schema/data integration
  – XML query optimization [12]
  – XML aggregates evaluation
  – XML view creation and validation [2]
  – XML graphical query language and output [7]
  – XML keyword search [13]
  – etc.
We will illustrate these in detail.
[2]. Y. B. Chen, T. W. Ling, M. L. Lee. Designing Valid XML Views. ER2002, Tampere, Finland. Oct 7-11, 2002.
[7]. W. Ni, T. W. Ling. GLASS: A Graphical Query Language for Semi-Structured Data. DASFAA 2003.
[12]. H. Wu, T. W. Ling, B. Chen. VERT: a semantic approach for content search and content extraction in XML query processing. Submitted to ER’07.
[13]. B. Chen, J. Lu, T. W. Ling. ICRA: effective semantics for ranked XML keyword search. Submitted to VLDB’07.
3. ORA-SS applications: Semantic query optimization
• The semantic information represented in ORA-SS is helpful in optimizing XML queries.
  – Many algorithms have been proposed for XML query optimization, e.g. TwigStack [1] and its variants.
  – When the ORA-SS semantics of the data are known, they can be taken into account for query optimization.
[1]. Nicolas Bruno, Nick Koudas, and Divesh Srivastava. Holistic Twig Joins: optimal XML Pattern Matching. SIGMOD Conference, 2002.
3. ORA-SS applications: Semantic query optimization
Example: Consider the following simple query:
(Query 1) Display the budget of project “J001”.
//project[jno = "J001"]/budget
• Traditional processing scans the whole XML document, checking every project with jno = "J001" and finding all corresponding budget values.
• However, in ORA-SS, jno is the object ID, so we have the functional dependency
jno → budget
and the optimized processing only needs to find the first project instance with jno = "J001" and return the corresponding budget value.
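The saving for Query 1 can be sketched as follows. This is an illustrative Python sketch (standard library only), not the evaluation strategy of any particular XML engine; the inline document is a reduced, hypothetical extract of psj.xml and both function names are assumptions. It contrasts a full scan with a scan that stops at the first match, which is safe only because jno → budget holds.

```python
import xml.etree.ElementTree as ET

# Reduced extract of psj.xml: project J001 occurs under two different parts.
DOC = """<psj>
  <part><supplier>
    <project><jno>J001</jno><budget>20000</budget></project>
    <project><jno>J003</jno><budget>250000</budget></project>
  </supplier></part>
  <part><supplier>
    <project><jno>J001</jno><budget>20000</budget></project>
  </supplier></part>
</psj>"""

def budgets_full_scan(root, jno):
    """Without semantics: visit every project and collect every matching budget."""
    visited, found = 0, []
    for proj in root.iter("project"):
        visited += 1
        if proj.findtext("jno") == jno:
            found.append(proj.findtext("budget"))
    return found, visited

def budget_with_fd(root, jno):
    """With jno -> budget known: the first match already determines the answer."""
    visited = 0
    for proj in root.iter("project"):
        visited += 1
        if proj.findtext("jno") == jno:
            return proj.findtext("budget"), visited
    return None, visited

root = ET.fromstring(DOC)
all_budgets, scanned_all = budgets_full_scan(root, "J001")
one_budget, scanned_fd = budget_with_fd(root, "J001")
print(all_budgets, scanned_all, one_budget, scanned_fd)
```

On this extract the full scan visits three project elements and returns the same budget twice, while the FD-aware scan stops after the first element.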
3. ORA-SS applications: Semantic query optimization – Content Search
• Most existing algorithms focus on structural search of twig pattern queries.
• Few of them pay much attention to content search for values of elements.
• They treat content nodes (or values) the same as element nodes.
• Disadvantages:
  – Too many label streams of contents
  – Difficult to find the actual values of labels as output solutions
• We propose VERT (Value Extraction with Relational Table)
3. ORA-SS applications: Semantic query optimization – Content Search
• Idea of VERT:
1. Introduce relational tables to store document values instead of treating them as nodes and labeling them.
2. Rewrite and optimize XML twig queries based on the underlying relational tables.
3. Further optimize the relational tables for query processing if more semantic information is available (i.e. more semantics, better optimization).
3. ORA-SS applications: Semantic query optimization – Content Search
1. Introduce relational tables to store document values instead of treating them as nodes and labeling them.
E.g. the values of price (title, etc.) in the XML tree in Figure 5 can be stored together with the labels of the price (title, etc.) elements, as in Figure 6.
Figure 5. Example XML document 2
Figure 6. Example VERT tables
3. ORA-SS applications: Semantic query optimization – Content Search
2. Rewrite and optimize XML twig queries based on the underlying relational tables.
e.g.
  – Rewrite the twig query in Figure 7(a) to the twig in Figure 7(b)
  – Execute SQL on table Rprice of Figure 6 to get all labels of price elements with value greater than 15 and form the stream Tprice>15
  – Perform structural joins based on these labels for price elements (i.e. Tprice>15) with book and ISBN elements
Benefits:
• Save stream merging of all price elements with values > 15
• Save structural join between price elements and their values
[Figure 7 diagrams: (a) twig with nodes book, ISBN, price and a value node >15; (b) twig with nodes book, ISBN and Price>15]
Figure 7. Example twig query: (a) Twig query (b) rewritten query
3. ORA-SS applications: Semantic query optimization – Content Search
3. Further optimize the relational tables for query processing if more semantic information is available (i.e. more semantics, better optimization).
Optimization 1 (VERT-1): store the values of price (title, etc.) with the labels of book objects, since price (title) is a property of the book object class according to the semantics captured in ORA-SS (shown in Figure 8).
Benefit: Further saves structural joins between price and book and between ISBN and book for the query in Figure 7.
Figure 8. VERT tables with optimization 1
3. ORA-SS applications: Semantic query optimization – Content Search
3. Further optimize the relational tables for query processing if more semantic information is available (i.e. more semantics, better optimization).
Optimization 2 (VERT-2): pre-merge the tables of title, price, etc. in Figure 8 if we further know that they are single-valued attributes of the book object class according to the semantics in ORA-SS (shown in Figure 9). (Note: the multi-valued attribute author should not be merged.)
Benefit: Saves expensive structural joins by using an efficient selection on the table for the query in Figure 7.
Figure 9. VERT tables with optimization 2
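A possible shape of the pre-merged table is sketched below in Python with the standard sqlite3 module. The table names R_book and R_author, their columns and all rows are hypothetical, since Figure 9 is not reproduced here; the sketch only shows how a twig over book, ISBN and price > 15 could reduce to one selection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical merged table: one row per book object, keyed by the book's label.
conn.execute(
    "CREATE TABLE R_book (book_start INTEGER, book_end INTEGER, "
    "isbn TEXT, title TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO R_book VALUES (?, ?, ?, ?, ?)",
    [(1, 10, "isbn-1", "Title A", 12.0), (11, 20, "isbn-2", "Title B", 30.0)],
)
# The multi-valued attribute author stays in its own table, one row per value.
conn.execute("CREATE TABLE R_author (book_start INTEGER, book_end INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO R_author VALUES (?, ?, ?)",
    [(1, 10, "Author X"), (11, 20, "Author Y"), (11, 20, "Author Z")],
)

# The twig over book, ISBN and price > 15 becomes a single selection, with no structural join.
result = conn.execute(
    "SELECT book_start, isbn FROM R_book WHERE price > 15 ORDER BY book_start"
).fetchall()
print(result)
```

Merging author into R_book would either repeat the single-valued columns or require a nested value, which is why the note above keeps multi-valued attributes separate.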
3. ORA-SS applications: Semantic query optimization – Content Search
Experimental results on three datasets, i.e. NASA, DBLP and XMark (Figure 10)
• VERT outperforms TwigStack in query processing time.
• VERT-2 is superior to VERT-1, which is in turn better than the original VERT.
Figure 10. Experimental results of VERT
3. ORA-SS applications: XML query with aggregates
• XML semantics captured in ORA-SS are crucial in correctly writing queries with aggregates.
Example. Consider the query:
(Query 3.) Find the average budget of all the projects.
Two potential XQuery expressions are:
XQ.3a
for $pid in distinct-values(//project/jno)
let $bgts := //project[jno = $pid]/budget
return
<avg_bgt>{avg($bgts)}</avg_bgt>
XQ.3b
let $bgts := //project/budget
return
<avg_bgt>{avg($bgts)}</avg_bgt>
3. ORA-SS applications: XML query with aggregates
Example (cont.)
• If we know from ORA-SS that jno is the OID or key of the project object class, i.e.
jno → budget
then we can easily judge that XQ.3a is a correct XQuery expression while XQ.3b is incorrect, since some projects may appear more often than others in the XML document.
• If we do not know these semantics, it is difficult to say which XQuery expression is correct.
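The effect of repeated project occurrences can be checked numerically. The Python sketch below (standard library only) assumes a flattened list of the seven project occurrences of psj.xml with only jno and budget kept; the function names are illustrative. One function averages over every occurrence, as XQ.3b does; the other counts each distinct jno once, which is justified only when jno → budget holds.

```python
import xml.etree.ElementTree as ET

# The seven project occurrences of psj.xml, flattened under the root (assumption, for brevity).
DOC = """<psj>
  <project><jno>J001</jno><budget>20000</budget></project>
  <project><jno>J003</jno><budget>250000</budget></project>
  <project><jno>J002</jno><budget>18000</budget></project>
  <project><jno>J003</jno><budget>250000</budget></project>
  <project><jno>J002</jno><budget>18000</budget></project>
  <project><jno>J001</jno><budget>20000</budget></project>
  <project><jno>J004</jno><budget>20000</budget></project>
</psj>"""

def avg_over_occurrences(root):
    """Average over every project element, duplicates included."""
    budgets = [float(p.findtext("budget")) for p in root.iter("project")]
    return sum(budgets) / len(budgets)

def avg_over_distinct_projects(root):
    """Average with one budget per distinct jno (relies on jno -> budget)."""
    by_jno = {}
    for p in root.iter("project"):
        by_jno.setdefault(p.findtext("jno"), float(p.findtext("budget")))
    return sum(by_jno.values()) / len(by_jno)

root = ET.fromstring(DOC)
avg_occ = avg_over_occurrences(root)
avg_distinct = avg_over_distinct_projects(root)
print(avg_occ, avg_distinct)
```

On this data the per-occurrence average is 596000 / 7, whereas the four distinct projects average 77000, so the two readings of "average budget" genuinely differ.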
3. ORA-SS applications: Define and validate XML views
• Valid XML views in ORA-SS
• View definition operators: select, project/drop, swap, join
For example, consider the following swapping operation that exchanges the hierarchical levels of supplier and part:
[Diagrams: the ORA-SS schema diagram of Figure 4 and two candidate views over supplier, part and project, labelled “Valid view” and “Invalid view”, which differ in the position of price]
Invalid view: because price is a relationship attribute, it cannot be moved up with supplier elements, which would be semantically meaningless in the result view.
Figure 11. Example view definition 1
3. ORA-SS applications: Define and validate XML views
Another example: consider the following projection operation that drops supplier from the structure:
[Diagrams: the ORA-SS schema diagram of Figure 4; valid view with project, part, Avg_price and Total_qty; invalid view with project, part, price and qty]
Dropping supplier makes price and qty multi-valued attributes, so we should apply aggregate functions to get a meaningful view.
Figure 12. Example view definition 2
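The valid view of this example can be approximated in a few lines. The following Python sketch (standard library only) assumes a reduced extract of psj.xml with one part and two suppliers; part_project_view and the output keys avg_price and total_qty are hypothetical names echoing Avg_price and Total_qty. It groups by (pno, jno) once supplier is dropped and aggregates the values that would otherwise become multi-valued.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Reduced extract of psj.xml: part P001 with two suppliers of project J003.
PSJ = """<psj>
  <part><pno>P001</pno>
    <supplier><sno>S001</sno><price>5</price>
      <project><jno>J003</jno><qty>650</qty></project>
    </supplier>
    <supplier><sno>S002</sno><price>5.5</price>
      <project><jno>J003</jno><qty>50</qty></project>
    </supplier>
  </part>
</psj>"""

def part_project_view(root):
    """Drop supplier: group by (pno, jno), then aggregate price and qty."""
    groups = defaultdict(lambda: {"prices": [], "qtys": []})
    for part in root.iter("part"):
        pno = part.findtext("pno")
        for sup in part.iter("supplier"):
            price = float(sup.findtext("price"))
            for proj in sup.iter("project"):
                g = groups[(pno, proj.findtext("jno"))]
                g["prices"].append(price)
                g["qtys"].append(int(proj.findtext("qty")))
    return {
        key: {"avg_price": sum(g["prices"]) / len(g["prices"]),
              "total_qty": sum(g["qtys"])}
        for key, g in groups.items()
    }

view = part_project_view(ET.fromstring(PSJ))
print(view)
```

Without the aggregation step, the pair (P001, J003) would carry two prices and two quantities, which is the situation the invalid view leaves unresolved.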
3. ORA-SS applications: Graphical XML query based on ORA-SS
A graphical XML query language is designed on the basis of ORA-SS.
Figure 13. Screenshot of the user interface of our graphical query language
• The schema panel loads the ORA-SS schema diagram.
• A graphical query can be posed either by dragging components from the diagram in the schema panel or by using the construction buttons at the top of the window.
• Complex query logic such as quantification, negation and IF-THEN constructions can be specified in the Condition Logic Window.
Query 1: Select and display the projects that do not have any suppliers located in Atlanta.
3. ORA-SS applications: XML keyword search with semantics
• Keyword search is a user-friendly way to query XML documents.
• Most existing algorithms are based on either the tree data model or the graph (digraph) data model of XML, without the semantics.
3. ORA-SS applications: XML keyword search with semantics
• Tree data model (LCA [11])
  – Lowest Common Ancestor (LCA)
    • Contains all the keywords
    • Has no descendant node containing all the keywords
• Graph (digraph) data model (Banks [5])
  – Reduced sub-tree
    • A tree T in the graph (digraph) containing all keywords
    • No proper sub-tree of T contains all keywords
• Limitations of keyword search without semantics
  – May have difficulty in representing results
  – May return many irrelevant results
[5]. V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. In Proc. of VLDB Conference, pages 505-516, 2005.
[11]. Y. Xu and Y. Papakonstantinou. Efficient keyword search for smallest LCAs in XML databases. In Proc. of SIGMOD Conference, pages 537-538, 2005.
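The LCA condition above can be made concrete with Dewey labels such as 1.1.1. The Python sketch below is a brute-force illustration, not the algorithm of [11]; the labels and keyword postings are hypothetical and both function names are assumptions. It computes the LCA of every combination of keyword matches and keeps those with no descendant LCA.

```python
from itertools import product

def lca(labels):
    """Longest common prefix of Dewey labels, e.g. (1, 1, 2) and (1, 1, 3) -> (1, 1)."""
    prefix = []
    for parts in zip(*labels):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)

def smallest_lcas(keyword_lists):
    """Brute force: LCAs of all keyword-match combinations that have no descendant LCA."""
    candidates = {lca(combo) for combo in product(*keyword_lists)}
    return sorted(
        c for c in candidates
        if not any(o != c and o[:len(c)] == c for o in candidates)
    )

# Hypothetical postings: Dewey labels of the nodes that contain each keyword.
xml_nodes = [(1, 1, 2), (1, 2, 1)]
search_nodes = [(1, 1, 3), (1, 3, 1)]
two_keywords = smallest_lcas([xml_nodes, search_nodes])
one_keyword = smallest_lcas([[(1, 1, 1)]])
print(two_keywords, one_keyword)
```

For a single keyword the result is the matching node itself, which matches the observation that such an answer can carry too little information.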
3. ORA-SS applications: XML keyword search with semantics
Example:
• Q1 = {Widom}
  • LCA and reduced sub-tree give node 1.1.1
  • Not enough information
• Q2 = {semistructured query processing}
  • LCA(Q2) = dblp (i.e. the whole XML database): overwhelming information
  • Reduced sub-tree results include all papers with either “semistructured” or “query processing”. However, not all “query processing” papers are about “semistructured”.
Figure 14. Example XML document 3
3. ORA-SS applications: XML keyword search with semantics
• Therefore, we propose ICA (Interested Common Ancestor) and IRA (Interested Related Ancestors) to exploit the semantics for ranked keyword search.
• Ideas:
1. The DBA defines the set of interested object classes and the conceptual connections between objects.
e.g. in DBLP, publications and authors can be the interested object classes; references/citations can be one type of conceptual connection between publications.
Note: we can group all publications for each author object.
3. ORA-SS applications: XML keyword search with semantics
• Ideas:
2. The results of a keyword query include interested objects based on ICA and IRA semantics.
  – The results of ICA (Interested Common Ancestor) include all objects that each contain all query keywords.
  – The results of IRA (Interested Related Ancestors) include all object pairs (o, o’) such that
    – the pair together contains all keywords AND
    – o and o’ are conceptually connected.
Note: we output a list of IRA objects instead of IRA pairs.
Intuitive meaning of IRA:
For the query “semistructured query processing”, if a paper P with title “query processing” cites or is cited by a paper with title “semistructured”, then P is considered related to the query; at least it is a better result than “query processing” papers that neither cite nor are cited by “semistructured” papers.
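The two result definitions can be sketched on toy data. In the Python sketch below, the papers, their keyword sets and the single citation link are hypothetical, keyword containment is simplified to set inclusion over title words, and the function names ica and ira are illustrative; it is not the system described in [13].

```python
# Hypothetical interested objects (papers) with their keyword sets and citation links.
papers = {
    "P1": {"semistructured", "data"},
    "P2": {"query", "processing"},
    "P3": {"query", "processing"},
    "P4": {"semistructured", "query", "processing"},
}
cites = {("P2", "P1")}  # P2 cites P1

def ica(query):
    """Objects that contain all query keywords on their own."""
    return sorted(p for p, words in papers.items() if query <= words)

def ira(query):
    """Objects from conceptually connected pairs that together contain all keywords."""
    connected = cites | {(b, a) for a, b in cites}
    result = set()
    for a, b in connected:
        if query <= papers[a] | papers[b]:
            result.update((a, b))
    return sorted(result)

q = {"semistructured", "query", "processing"}
ica_result = ica(q)
ira_result = ira(q)
print(ica_result, ira_result)
```

P3 has the same keywords as P2 but no citation link to a “semistructured” paper, so it appears in neither result list.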
3. ORA-SS applications: XML keyword search with semantics
• Ideas:
3. The system automatically ranks result objects for output based on the following metrics.
  – RelevanceRank:
    Intuitive meaning:
    – for the query “semistructured query processing”,
    – given two papers P1 and P2 containing “query processing”,
    – if P1 cites or is cited by many “semistructured” papers whereas P2 cites or is cited by few “semistructured” papers, then P1 is considered more relevant to the query.
  – Keyword Proximity Rank (ProxRank):
    – Intuition: the smaller the number of elements in one object that directly contain all keywords, the better a result the object is.
3. ORA-SS applications: XML keyword search with semantics
Experimental evaluation based on DBLP
• Our approach outperforms most existing academic demos in both execution time and result quality.
Figure 15. Execution time
Figure 16. Comparison of relevant results in the top-10, 20 and 30 answers among academic demos
3. ORA-SS applications: XML keyword search with semantics
Experimental evaluation based on DBLP
• Our approach is comparable or superior to the commercial systems Google Scholar and Microsoft Libra in terms of result quality, even though they can search much more web data.
Figure 17. Comparison of relevant results in the top-10, 20 and 30 answers with commercial systems
3. ORA-SS applications: XML keyword search with semantics
A demo prototype of our keyword search system on DBLP data is available at
http://xmldb.ddns.comp.nus.edu.sg
Figure 18. User interface of the demo system
4. Conclusion
1. We presented a data-centric XML document and showed the limitations of the current XML schema standards in representing relational semantics and constraints.
2. We have shown that semantics in XML data are crucial in many applications, such as
  • XML query optimization
  • XML query optimization for content search
  • XML aggregate computation
  • XML view creation and validation
  • XML graphical query language and output
  • XML keyword search
  • etc.
3. Much of the semantic information of XML data can be expressed in ORA-SS, which is a semantically rich data model, but not in DTD or XML Schema.
References:
[1]. Nicolas Bruno, Nick Koudas, and Divesh Srivastava. Holistic Twig Joins: optimal XML Pattern Matching. SIGMOD Conference, 2002.
[2]. Y. B. Chen, T. W. Ling, M. L. Lee. Designing Valid XML Views. ER2002, Tampere, Finland. Oct 7-11, 2002.
[3]. C. J. Date. An Introduction to Database Systems. 3rd edition, Addison-Wesley Publishing Company (1981).
[4]. Extensible Markup Language (XML) 1.0 (3rd Edition). W3C Recommendation 04 February 2004. http://www.w3.org/TR/2004/REC-xml-20040204/
[5]. V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. In Proc. of VLDB Conference, pages 505-516, 2005.
[6]. T. W. Ling, M. L. Lee, G. Dobbie. Semistructured Database Design. Springer Science+Business media, Inc. 2005.
[7]. W. Ni, T. W. Ling. GLASS: A Graphical Query Language for Semi-Structured Data. DASFAA 2003.
[8]. XML Schema Part 0: Primer Second Edition. W3C Recommendation 28 October 2004. http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/
[9]. XML Schema Part 1: Structures Second Edition. W3C Recommendation 28 October 2004. http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/
[10]. XML Schema Part 2: Datatypes Second Edition. W3C Recommendation 28 October 2004. http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
[11]. Y. Xu and Y. Papakonstantinou. Efficient keyword search for smallest LCAs in XML databases. In Proc. of SIGMOD Conference, pages 537-538, 2005.
[12]. H. Wu, T. W. Ling, B. Chen. VERT: a semantic approach for content search and content extraction in XML query processing. Submitted to ER’07.
[13]. B. Chen, J. Lu, T. W. Ling. ICRA: effective semantics for ranked XML keyword search. Submitted to VLDB’07.
Q & A
The End