Unit 04 : W3C and Xpath COMP 5323 Web Database Technologies and Applications 2014

Upload: albert-lesley-miles

Post on 11-Jan-2016

Doctrine of Fair Use

• This PowerPoint is prepared for educational purposes and is used strictly for classroom lecturing.

• We have adopted the "Fair Use" doctrine in this PowerPoint, which allows limited copying of copyrighted works for educational and research purposes.

Learning Objectives

• Learn more about W3C
• Understand the XML query language

[Diagram: query languages paired with data stores: SQL with RDB, XQL and XQuery with XML, Cypher with Neo4j]

Outline

1. W3C
2. XPath
3. XQuery

1 W3C

Overview

• The W3C – Who they are, their core beliefs, their long term goals, their members

• The W3C – Who they influence, their business processes and recommendations

• The relationship between W3C and open standards

World Wide Web Consortium (W3C)

• The World Wide Web Consortium (W3C) is an international consortium where members and staff work together to develop many different web standards.

• Their mission is to lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web.

W3C: Founder

• Tim Berners-Lee, W3C Director and inventor of the World Wide Web in 1989.

• Served as W3C Director since 1994 when the organization was founded.

• “W3C members work together to design web technologies that build upon its versatility, giving the world the power to enhance communication and commerce for anyone, anywhere, anytime, and using any device.”

Tim Berners-Lee

Semantic Web

• The Semantic Web is a collaborative movement led by the international standards body the World Wide Web Consortium (W3C).

• The standard promotes common data formats on the World Wide Web.

• By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents into a "web of data".

• The Semantic Web stack builds on the W3C's Resource Description Framework (RDF).

http://en.wikipedia.org/wiki/Semantic_Web

W3C: Core Beliefs

• “W3C believes that in order for the Web to reach its full potential, the most fundamental Web technologies must be compatible with one another and allow any hardware and software used to access the Web to work together.” (www.w3.org)

• One of their main goals is to have “Web Interoperability”

W3C: Long Term Goals for the Web

• Web for Everyone
– Make the Web available regardless of hardware, software, language, culture, etc.

• Web on Everything
– Make Web access from any kind of device as simple and convenient as possible

• Knowledge Base
– Enable people to solve problems that would otherwise be too complex or tedious to solve

• Trust and Confidence
– Make accountability, security, confidence and confidentiality possible for all users

W3C: Members

• W3C comprises more than 400 members, including the world’s foremost technology companies such as Hewlett Packard, IBM, Nokia, Microsoft, AT&T, Intel, Oracle, and Xerox.

• W3C allows its members to lead the web to its full potential by allowing them to take leadership roles, promote their image as innovators, and gain early insight to market trends.

W3C’s Influence

• Most influential of all organizations in the development and maintenance of the World Wide Web

• W3C has no legal authority to enforce its recommendations because membership is voluntary

• Members often follow recommendations because it helps set standards for the Web which in turn benefits each member

W3C’s Influence

• The W3C has made over 90 recommendations since its start in 1994.

• W3C operations are administered by offices in Japan, France, and the United States.

• As more corporations join, W3C’s recommendations will become the standard for the WWW and thus make it easier for both corporations and the public.

Recommendations

• W3C published the Speech Synthesis Markup Language (SSML) 1.0
– Addresses synthesized speech in Web interactions.
– For example, how would you pronounce “1/2”? It could be January 2nd, one half, or 1 divided by 2.

• XML-binary Optimized Packaging (XOP)
– “XML-binary Optimized Packaging (XOP) provides a standard method for applications to include binary data, as is, along with an XML document in a package. As a result, applications need less space to store the data and less bandwidth to transmit it.” (Business Wire)

Business Processes

• W3C’s work attempts to standardize the Web.
• Each member contributes to the process, with decisions being made through community consensus.

• Each member has the same decision power no matter what size they are.

• If a general consensus can’t be reached, decisions are made on a majority basis.

W3C and Open Standards

• “W3C seeks to avoid market fragmentation and thus Web fragmentation by publishing open standards for Web languages and protocols.” (www.w3.org)

• To achieve the goal of one Web, specifications for the Web's formats and protocols must be compatible with one another and allow any hardware and software used to access the Web to work together – thus W3C designs and promotes interoperable open formats and protocols to avoid market fragmentation.

Open Standards Guidelines

• Transparency
– A public process with public access to all information

• Relevance
– Start based on due analysis and market needs for all

• Openness
– Anybody can participate: users and developers; industry and research; governments and public

• Impartial and consensus based
– Guaranteed fairness and equal weight for each participant

• Availability
– Free access to standard documents

• Maintenance
– Testing, Revisions

2 XPATH

XPath

• It provides a way to refer to specific parts of an XML tree
• Uses a URL-like path notation, similar to the scheme for locating documents on local and remote computer systems.
• Primary purpose: address ‘parts’ of an XML document, and provide basic facilities for manipulation of strings, numbers and booleans.
• Used by other XML technologies
– XSLT
– XQuery
• http://www.w3.org/TR/xpath

Why XPath

• Does an XML tree look like the directory tree of the computer's file system?

Why XPath

• Unique identifiers are not sufficient
– Assigning a unique identifier to every element is a burden
– Identity of element may be unknown
– Identifiers cannot handle ranges of text
– May be inconvenient to identify a large number of objects by listing their identifiers

Data Model

• Treats an XML document as a logical tree
• This tree consists of 7 types of nodes:
– Root Node – the root of the document
– Element Nodes – one for each element in the document (an element node may have a unique ID)
– Attribute Nodes
– Namespace Nodes
– Processing Instruction Nodes (intended to carry instructions to the application)
– Comment Nodes
– Text Nodes

• The tree structure is ordered and reads from top to bottom and left to right (document order)

Data Model Example 1

For this simple doc:

<doc><?pi encoding="UTF-8"?><para>Some <em>emphasis</em> here. </para><para>Some more stuff.</para></doc>

Might be represented as:

root
  <doc>
    <?pi?>
    <para>
      text
      <em>
        text
      text
    <para>
      text

Example 2(a)

<?xml version="1.0" encoding="UTF-8" ?>
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>

Example 2(b)

[Figure: tree for Example 2(a), with labels marking the root, the root element (bib), a processing instruction, and a comment]

bib
  book
    publisher
      Addison-Wesley
    author
      Serge Abiteboul
    . . . .
  book

Nodes and Atomic values

• Nodes
– <bib> (root element node) [note: not a root node]
– <author> Victor Vianu </author> (element node)
– price="55" (attribute node)

• Atomic Values
– Victor Vianu
– "55"

Relationship of Nodes

• the book element is the parent of the title, author and year.

• the title, author, year elements are all children of the book element

• the title, author, year elements are all siblings.
• the ancestors of the title element are the book element and the bib element
• descendants of the bib element include the book, title, author and year elements.

Element Context

• Meaning of element can depend upon its context
– <book><title>…</title></book>
– <person><title>…</title></person>

• Want to search for, e.g. title of book, not title of person
– XPath exploits sequential and hierarchical context of XML to specify elements by their context (i.e. location in hierarchy)
– book/title vs. person/title
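A minimal Python sketch of selecting by context. It assumes a hypothetical <library> document (not part of the lecture material) and uses the standard xml.etree.ElementTree module, which supports only a limited subset of XPath:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same element name appears in two contexts.
LIBRARY = """<library>
  <book><title>Foundations of Databases</title></book>
  <person><title>Professor</title></person>
</library>"""

root = ET.fromstring(LIBRARY)

# Naming the parent in the path restricts the match to that context.
book_titles = [t.text for t in root.findall("book/title")]
person_titles = [t.text for t in root.findall("person/title")]

print(book_titles)
print(person_titles)
```

Both queries look for an element called title, and the parent step alone decides which one is returned.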

Relative Path

• A relative location path consists of a sequence of one or more location steps separated by /
• Each step selects a set of nodes relative to a context node, and each node in that set is used as a context node for the following step
• E.g. para will select children of the current node that are named 'para'

<chapter>                      //Current node
  <title>…</title>
  <para>…</para>               //Selected
  <note>
    <para>…</para>             //Not selected until note is the context node
  </note>
</chapter>

• Verbose expression is child::para
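The relative path above can be tried with a short Python sketch. The element text is made up for illustration, and the standard xml.etree.ElementTree module is assumed; it accepts the abbreviated form para but not the verbose child::para syntax:

```python
import xml.etree.ElementTree as ET

# Illustrative content; only the element structure follows the slide.
CHAPTER = """<chapter>
  <title>Intro</title>
  <para>first paragraph</para>
  <note><para>paragraph inside a note</para></note>
</chapter>"""

chapter = ET.fromstring(CHAPTER)  # the context node

# para (i.e. child::para): only direct children of the context node.
direct = [p.text for p in chapter.findall("para")]

# note/para: the para inside note is reached once note becomes the context.
nested = [p.text for p in chapter.findall("note/para")]

print(direct)
print(nested)
```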

Absolute Path

• For some cases, a relative path is not suitable.
– E.g. it may be necessary to select the title of a book regardless of the current context. The location relative to the document as a whole may be known, whereas the offset from the current location may not; in that case use an absolute path.

• An absolute path is similar to a relative path, except that it starts with “/”, the root of the document
– e.g. /book/title

• Using the “//” expression, it is even possible to select all occurrences of a specific element type.
– e.g. //author
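A hedged Python sketch of both forms, using a shortened version of the bib document from Example 2(a). The standard xml.etree.ElementTree module evaluates paths relative to the root element rather than the document root, so this sketch writes /bib/book/title as ./book/title and //author as .//author:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bib example, for illustration only.
BIB = """<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>"""

root = ET.fromstring(BIB)  # root is the <bib> element

# XPath /bib/book/title, written relative to <bib>.
titles = [t.text for t in root.findall("./book/title")]

# XPath //author: every author element at any depth.
author_count = len(root.findall(".//author"))

print(titles)
print(author_count)
```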

Partial Tree of faculty.xml

[Figure: partial node tree of faculty.xml; d = document node, e = element, a = attribute, t = text]

d101 faculty.xml
  e101 faculty
    a101 name = Science
    e102 course
      a102 pid = p13
      a103 name = DB Sys
      e103 year
        t101 2007
      e104 student
        e105 sid
          t102 s1
        e106 grade
          t103 A+
      e107 student
        e108 sid
          t104 s2
        e109 grade
          t105 B
    e110 course
    e118 course
    ……….

doc("faculty.xml")/descendant::course[attribute::pid="p13"]/child::student[child::sid="s2"]

What Does an XPath Expression Return?

• A sequence of result nodes with their contents, in the form of a (not necessarily well-formed) XML document

• The doc( ) function is used to open the "faculty.xml" file

• XPath:
doc("faculty.xml")/descendant::course[attribute::pid="p13"]/child::student[child::sid="s2"]

• Result:
<student> <sid>s2</sid> <grade>B</grade> </student>
<student> <sid>s2</sid> <grade>D</grade> </student>
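A rough Python equivalent of this query, as a sketch only. The document content below is an assumption: the slides show only part of faculty.xml, so the second p13 course and the p14 course are invented to reproduce a two-student result. The standard xml.etree.ElementTree module is used, with the abbreviated syntax in place of the descendant::, attribute:: and child:: axes:

```python
import xml.etree.ElementTree as ET

# Assumed content; only the first course follows the partial tree shown.
FACULTY = """<faculty name="Science">
  <course pid="p13" name="DB Sys">
    <year>2007</year>
    <student><sid>s1</sid><grade>A+</grade></student>
    <student><sid>s2</sid><grade>B</grade></student>
  </course>
  <course pid="p13" name="DB Sys">
    <year>2008</year>
    <student><sid>s2</sid><grade>D</grade></student>
  </course>
  <course pid="p14" name="Networks">
    <year>2007</year>
    <student><sid>s2</sid><grade>A</grade></student>
  </course>
</faculty>"""

faculty = ET.fromstring(FACULTY)

# descendant::course[attribute::pid="p13"]/child::student[child::sid="s2"]
matches = faculty.findall(".//course[@pid='p13']/student[sid='s2']")

# The result is a sequence of student elements in document order.
grades = [s.findtext("grade") for s in matches]
print(grades)
```

Each match is a whole student element, so its children (sid, grade) come along with it, which is why the result on the slide is a sequence of fragments rather than a single well-formed document.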

Example for XPath Queries

<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author age="30"> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>

XPath: Simple Expressions

/bib/book/year
Result: <year> 1995 </year> <year> 1998 </year>

/bib/paper/year
Result: empty (there were no papers)

XPath: //

//author
Result:
<author age="30"> Serge Abiteboul </author>
<author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>

/bib//first-name
Result: <first-name> Rick </first-name>

XPath: Functions

/bib/book/author/text()
Result: Serge Abiteboul Victor Vianu Jeffrey D. Ullman

Note: Rick Hull doesn’t appear because his name is held in first-name and last-name child elements

Node tests and functions in XPath:
– text() = matches text nodes
– node() = matches any node (broader than *, which matches only elements)
– name() = returns the name of the current node

• http://www.w3.org/TR/xquery-operators/
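The effect of text() can be approximated in Python, as a sketch under assumptions. The standard xml.etree.ElementTree module does not implement the text() node test, so this reads each author's own leading text instead; the document is a shortened copy of the bib example:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bib example, for illustration only.
BIB = """<bib>
  <book>
    <author age="30">Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>"""

root = ET.fromstring(BIB)

# Approximates /bib/book/author/text(): keep only authors that have text of
# their own. The Rick Hull author has child elements and no direct text.
author_texts = [
    a.text.strip()
    for a in root.findall("book/author")
    if a.text and a.text.strip()
]
print(author_texts)
```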

XPath: Wildcard

//author/*

Result: <first-name> Rick </first-name> <last-name> Hull </last-name>

Note: * matches any element
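A small Python sketch of the wildcard, assuming the standard xml.etree.ElementTree module and a shortened copy of the bib example (//author/* is written .//author/* because paths are relative to the root element):

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bib example, for illustration only.
BIB = """<bib>
  <book>
    <author age="30">Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
</bib>"""

root = ET.fromstring(BIB)

# * matches any child element, whatever its name; text is not matched.
children = [(e.tag, e.text) for e in root.findall(".//author/*")]
print(children)
```

Only the author with element children contributes to the result; authors that contain just text have nothing for * to match.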

XPath: Attribute Nodes

/bib/book/@price

Result: "55"

@price means that price has to be an attribute
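A Python sketch of reading the attribute, under the assumption that the standard xml.etree.ElementTree module is used. It cannot return attribute nodes directly, so the sketch selects the book elements that carry a price attribute and then reads its value:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bib example, for illustration only.
BIB = """<bib>
  <book><title>Foundations of Databases</title></book>
  <book price="55"><title>Principles of Database and Knowledge Base Systems</title></book>
</bib>"""

root = ET.fromstring(BIB)

# book[@price] keeps only books that have the attribute;
# get() then plays the role of the final @price step.
prices = [b.get("price") for b in root.findall("book[@price]")]
print(prices)
```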

XPath: Qualifiers

/bib/book/author[first-name]
/bib/book/author[last-name]
Result: <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>

/bib/book/author[1]
Result: <author age="30">Serge Abiteboul</author> <author>Jeffrey D. Ullman</author>
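Both kinds of qualifier can be tried in Python, as a sketch assuming the standard xml.etree.ElementTree module and a shortened copy of the bib example:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bib example, for illustration only.
BIB = """<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author age="30">Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>"""

root = ET.fromstring(BIB)

# author[first-name]: authors that have a first-name child element.
with_first_name = root.findall("book/author[first-name]")
last_names = [a.findtext("last-name") for a in with_first_name]

# author[1]: the first author of each book, so one result per book.
first_authors = [a.text for a in root.findall("book/author[1]")]

print(last_names)
print(first_authors)
```

The positional qualifier counts among the author children of each book separately, which is why it returns two elements here rather than one.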

XPath: More Qualifiers

/bib/book/author[first-name]/last-name

Result: <last-name> Hull </last-name>

XPath: More Qualifiers

/bib/book[@price < "60"]
Result: <book price="55">…..</book>

/bib/book[author/@age < "35"]
Result: <book> <publisher>Addison-Wesle……

/bib/book[author/text()]
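The comparison qualifiers can be imitated in Python, as a sketch under assumptions. The standard xml.etree.ElementTree module has no < operator in predicates, so the path only selects candidates and the numeric comparison is done in Python; the document is a shortened copy of the bib example:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bib example, for illustration only.
BIB = """<bib>
  <book>
    <author age="30">Serge Abiteboul</author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
  </book>
  <book price="55">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>"""

root = ET.fromstring(BIB)

# Imitates /bib/book[@price < "60"]: books with a price below 60.
cheap = [
    b.findtext("title")
    for b in root.findall("book[@price]")
    if float(b.get("price")) < 60
]

# Imitates /bib/book[author/@age < "35"]: books where at least one
# author has an age attribute below 35.
young = [
    b.findtext("title")
    for b in root.findall("book")
    if any(float(a.get("age")) < 35 for a in b.findall("author[@age]"))
]

print(cheap)
print(young)
```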

Selecting Several Paths

/bib/book/author | /bib/book/title

Result:
<author age="30">Serge Abiteboul</author>
<author> <first-name>Rick</first-name> <last-name>Hull</last-name> </author>
<author>Victor Vianu</author>
<title>Foundations of Databases</title>
<author>Jeffrey D. Ullman</author>
<title>Principles of Database and Knowledge Base Systems</title>

Axes

• Axis defines a node-set relative to the current node.
– Indicates which nodes are included in search
  • Relative to context node
– Dictates node ordering in set
  • Forward axes select nodes that follow context node
  • Reverse axes select nodes that precede context node

XPath axes

Node Tests

• Node tests
– Refine the set of nodes selected by the axis

• Rely upon the axis’ principal node type
– Corresponds to the type of node the axis can select

Location

• The syntax for a location step is:

axisname::nodetest[predicate]

• Reference http://www.w3schools.com/xpath/xpath_axes.asp

Example

descendant::first-name

Result <first-name>Rick</first-name>

books.xml

<?xml version = "1.0"?>
<books>
  <book>
    <title>Java How to Program</title>
    <translation edition = "1">Spanish</translation>
    <translation edition = "1">Chinese</translation>
    <translation edition = "1">Japanese</translation>
    <translation edition = "2">French</translation>
    <translation edition = "2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition = "1">Korean</translation>
    <translation edition = "2">French</translation>
    <translation edition = "2">Spanish</translation>
  </book>
</books>

Location Paths Using Axes and Node Tests

• Which books have Japanese translations?
– Use root node of XPath tree as context node
– Use predicate
  • Boolean expression for filtering nodes from search
  • Compare string value of current node to string 'Japanese'

/books/book/translation[. = 'Japanese']/../title

• Result <title>Java How to Program</title>
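The same query can be run from Python, as a sketch assuming the standard xml.etree.ElementTree module (the [.='text'] predicate needs Python 3.7 or later) and the books.xml content shown earlier. The path is written relative to the books root element:

```python
import xml.etree.ElementTree as ET

BOOKS = """<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="1">Chinese</translation>
    <translation edition="1">Japanese</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Spanish</translation>
  </book>
</books>"""

books = ET.fromstring(BOOKS)

# Find Japanese translations, step up to the parent book with "..",
# then step down to its title.
path = "./book/translation[.='Japanese']/../title"
japanese_titles = [t.text for t in books.findall(path)]
print(japanese_titles)
```

The Java book has two Japanese translations but is reported once, because both translations lead back to the same parent element.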

More Examples

• /books/book/translation[.='Japanese']
Result:
<translation edition="1">Japanese</translation>
<translation edition="2">Japanese</translation>

• /books/book/translation[.='Japanese']/..
Result:
<book>
  <title>Java How to Program</title>
  <translation edition="1">Spanish</translation>
  <translation edition="1">Chinese</translation>
  <translation edition="1">Japanese</translation>
  <translation edition="2">French</translation>
  <translation edition="2">Japanese</translation>
</book>

Node-set Operators and Functions

• Node-set operators
– Manipulate node sets to form others

• Node-set functions
– Perform actions on node-sets returned by location paths

Some node-set functions

Example

• /books/book/count(translation)
• Result: 5 3

• /books/book/translation/position()
• Result: 1 2 3 4 5 6 7 8
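Using a function call as the last step of a path, as in these two expressions, is XPath 2.0 syntax. The standard Python xml.etree.ElementTree module does not provide count() or position(), so the sketch below reproduces the two results with ordinary Python over the books.xml content shown earlier:

```python
import xml.etree.ElementTree as ET

BOOKS = """<books>
  <book>
    <title>Java How to Program</title>
    <translation edition="1">Spanish</translation>
    <translation edition="1">Chinese</translation>
    <translation edition="1">Japanese</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Japanese</translation>
  </book>
  <book>
    <title>C++ How to Program</title>
    <translation edition="1">Korean</translation>
    <translation edition="2">French</translation>
    <translation edition="2">Spanish</translation>
  </book>
</books>"""

books = ET.fromstring(BOOKS)

# /books/book/count(translation): one count per book.
counts = [len(book.findall("translation")) for book in books.findall("book")]

# /books/book/translation/position(): position of each translation
# within the whole selected sequence, starting at 1.
translations = books.findall("book/translation")
positions = [i for i, _ in enumerate(translations, start=1)]

print(counts)
print(positions)
```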

Node-set Operators and Functions

• Location-path expressions
– Combine node-set operators and functions
  • Select all head and body children element nodes
    head | body
  • Select last title element node in head element node
    head/title[ last() ]
  • Select third book element
    book[ position() = 3 ]
    – Or alternatively
    book[ 3 ]
  • Return total number of element-node children
    count( * )
  • Select all book element nodes in document
    //book

XPath 2.0

• Latest version:
– http://www.w3.org/TR/xpath20/
• W3C Recommendation since 23 January 2007
• XPath 2.0 is a much more powerful language than XPath 1.0 and operates on a much larger domain of data types
• A better way of describing XPath 2.0 is as an expression language for processing sequences, with built-in support for querying XML documents

• XPath 2.0 is a strict syntactic subset of XQuery 1.0. Any expression that is syntactically valid and executes successfully in both XPath 2.0 and XQuery 1.0 will return the same result in both languages

XPath 2.0

• XPath 2.0 introduces support for the XML Schema primitive types, which immediately gives the user access to 19 simple types, including dates, years, months, URIs, etc.
• In addition, a number of functions and operators are provided for processing and constructing these different data types
• Everything is a sequence and sequences are ordered
• In XPath 1.0, if you wanted to process a collection of nodes, you had to deal with node-sets.
• In XPath 2.0, the concept of the node-set has been generalized and extended.
• Sequences may contain simple-typed values as well as nodes
• “for” expression enables iteration over sequences

XPath 2.0

• For loop
– sum(for $x in /order/item return $x/price * $x/quantity)

• Conditional expression:
– if ($widget1/unit-cost < $widget2/unit-cost)
– then $widget1
– else $widget2

• Quantifiers:
– some $x in /students/student/name satisfies $x = "Fred"
– every $x in /students/student/name satisfies $x = "Fred"
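To make the meaning of these expressions concrete, here is a Python sketch of what each one computes. It does not run XPath 2.0 (the standard xml.etree.ElementTree module covers only a small XPath 1.0 subset); the order, widget and student data are hypothetical and were chosen only for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical data, not taken from the lecture material.
ORDER = """<order>
  <item><price>2.50</price><quantity>4</quantity></item>
  <item><price>10.00</price><quantity>1</quantity></item>
</order>"""
STUDENTS = """<students>
  <student><name>Fred</name></student>
  <student><name>Wilma</name></student>
</students>"""

order = ET.fromstring(ORDER)
students = ET.fromstring(STUDENTS)

# sum(for $x in /order/item return $x/price * $x/quantity)
total = sum(
    float(x.findtext("price")) * int(x.findtext("quantity"))
    for x in order.findall("item")
)

# if (...unit-cost < ...unit-cost) then $widget1 else $widget2
widget1 = ET.fromstring("<widget id='w1'><unit-cost>3</unit-cost></widget>")
widget2 = ET.fromstring("<widget id='w2'><unit-cost>5</unit-cost></widget>")
cost1 = float(widget1.findtext("unit-cost"))
cost2 = float(widget2.findtext("unit-cost"))
cheaper = widget1 if cost1 < cost2 else widget2

# some / every $x in /students/student/name satisfies $x = "Fred"
names = [n.text for n in students.findall("student/name")]
some_fred = any(n == "Fred" for n in names)
every_fred = all(n == "Fred" for n in names)

print(total, cheaper.get("id"), some_fred, every_fred)
```

The for expression maps onto a generator, the conditional onto Python's conditional expression, and the quantifiers onto any() and all().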

XPath 2.0

• Intersections, differences, unions
– The except operator selects all of a given node-set, except for certain nodes
  • @* except @exc:foo
– The intersect operator
  • $x intersect /foo/bar

XPath Conclusion

• XPath provides a concise and intuitive way to address into XML documents
• Standard part of the XSLT and XPointer specifications
• Using XPath basically requires learning the abbreviated syntax of location path expressions and the functions of the core library

Reference

Online Example
• http://www.w3schools.com/xpath/xpath_examples.asp
• www.w3.org
• Priscilla Walmsley, XQuery: Search Across a Variety of XML Data, O'Reilly Media, 2007