UNIVERSITY OF MORATUWA
PIE 202: Internet Technologies
HTML, XHTML and XML
Dr. Ajith Pasqual
Master of Business Administration/Postgraduate Diploma in Information Technology
Semester 2 module
Some Information
Contact Information:
Dr. Ajith Pasqual
Dept. of Electronic and Telecommunication Engineering,
University of Moratuwa,
Tel: 2650634 Ext. 3321
Email: [email protected]
Web Resources:
Web Page: http://www.ent.mrt.ac.lk/~pasqual/courses/PG/MBA/pie202
Introduction
• XML stands for the eXtensible Markup Language.
• It was developed by the W3C (World Wide Web Consortium, http://www.w3.org), primarily to overcome limitations in HTML.
• HTML has been the standard language used for web-based publishing.
  – about a dozen tags in version 1.0
  – about 100 tags in version 4.0
• HTML has some problems:
  – It is a loosely typed language.
  – Strong syntax checking is not done.
  – Grouping of tags is arbitrary.
• As a result, some browsers do not display pages properly.
Introduction (2)
• Since HTML has grown from publishing scientific documents to publishing almost anything, there is a growing demand for application-specific tags on top of the default tags.
  – Electronic commerce applications would need tags for product references, prices, names, addresses, and more.
  – Streaming would need tags to control the flow of images and sound.
  – Search engines would need more precise tags for keywords and descriptions.
  – Security would need tags for signing.
• On the opposite side, some applications demand fewer tags:
  – i-mode phones (in Japan)
  – WAP phones
  – PDA browsers
Introduction (3)
• XML has been developed to address the above problems.
• But XML is unlikely to replace HTML, at least in the near future.
• However, there is a convergence process in which HTML is moving towards XML through XHTML (a stricter syntax).
Applications
• Large Web site maintenance: XML works behind the scenes (more specifically, on the server) to simplify the maintenance of HTML documents.
• Exchange of information between organizations.
• Offloading and reloading of databases.
• Syndicated content, where content is made available to different Web sites.
• Electronic commerce applications, where different organizations collaborate to serve a customer.
• Scientific applications, with new markup languages for mathematical and chemical formulas.
• Electronic books, with new markup languages to express rights and ownership.
• Handheld devices and smartphones, with new markup languages optimized for these so-called "alternative" devices.
Applications (2)
• There are two classes of applications for XML:
  – publishing, and
  – data exchange (also known as application integration).
• Data exchange applications include most electronic commerce applications.
XHTML
What is XHTML?
• XHTML stands for eXtensible HyperText Markup Language.
• XHTML is intended to replace HTML.
• XHTML is almost identical to HTML 4.01.
• XHTML is a stricter and cleaner version of HTML.
• XHTML is HTML defined as an XML application.
XHTML 1.0 became an official W3C Recommendation on January 26, 2000.
XHTML is a combination of HTML and XML (eXtensible Markup Language).
XHTML consists of all the elements in HTML 4.01 combined with the syntax of XML.
XHTML …
Why XHTML?
• Many pages on the WWW contain "bad" HTML.
• Browser technologies differ: some browsers run on computers, others on mobile phones and handhelds that lack the resources to interpret "bad" markup.
• XML is a markup language where everything has to be marked up correctly, which results in "well-formed" documents.
• XML was designed to describe data, and HTML was designed to display data.
• Combining HTML and XML, and their strengths, yields a markup language that is useful now and in the future: XHTML.
• XHTML pages can be read by all XML-enabled devices.
XHTML …
Major Differences between HTML & XHTML:
• XHTML elements must be properly nested
• XHTML documents must be well-formed
• Tag names must be in lowercase
• All XHTML elements must be closed
Elements Must Be Properly Nested
In XHTML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic</i></b>
XHTML …
Documents Must Be Well-formed
All XHTML elements must be nested within the <html> root element. Any element can have sub-elements (children). Sub-elements must be properly closed and correctly nested within their parent element. The basic document structure is:
<html>
<head> ... </head>
<body> ... </body>
</html>
Tag Names Must Be in Lower Case
This is because XHTML is an XML application, and XML is case-sensitive: tags like <br> and <BR> are interpreted as different tags. The XHTML DTDs define all element names in lowercase.
XHTML …
All XHTML Elements Must Be Closed
Non-empty elements must have an end tag.
<p>This is a paragraph</p>
<p>This is another paragraph</p>
Empty Elements Must also Be Closed
Empty elements must either have an end tag or the start tag must end with />
This is a break<br />
Here comes a horizontal rule:<hr />
Here's an image <img src="happy.gif" alt="Happy face" />
For compatibility with older HTML browsers, add a space before the "/", as in <br />.
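The nesting, case and closing rules above are all XML well-formedness rules, so any XML parser enforces them. A minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical, and it checks only well-formedness, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Properly nested elements are accepted.
print(is_well_formed("<p><b><i>bold and italic</i></b></p>"))  # True
# Overlapping tags are rejected.
print(is_well_formed("<p><b><i>bold and italic</b></i></p>"))  # False
# <br> and </BR> are different tags, so this end tag does not match.
print(is_well_formed("<p>line<br></BR></p>"))                  # False
# An empty element left unclosed is rejected...
print(is_well_formed("<p>This is a break<br></p>"))            # False
# ...while the self-closing form (with the compatibility space) is accepted.
print(is_well_formed("<p>This is a break<br /></p>"))          # True
```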
XHTML …
XHTML Syntax
• Attribute names must be in lower case
• Attribute values must be quoted
• Attribute minimization is forbidden
• The id attribute replaces the name attribute
• The XHTML DTD defines mandatory elements
Attribute Names must be in Lower Case
<table width="100%">
Attribute Values must be Quoted
<table width="100%"> NOT <table width=100%>
XHTML …
Attribute Minimization is Forbidden
Wrong: <dl compact> <input checked> <input readonly> <input disabled> <option selected> <frame noresize>
Correct: <dl compact="compact"> <input checked="checked"> <input readonly="readonly"> <input disabled="disabled"> <option selected="selected"> <frame noresize="noresize">
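Quoted values and unminimized attributes are likewise XML well-formedness rules. A minimal sketch with a hypothetical parses() helper built on Python's xml.etree.ElementTree. Note that lower-case attribute names are a validity rule imposed by the XHTML DTD, so a plain well-formedness check like this one still accepts WIDTH="100%".

```python
import xml.etree.ElementTree as ET

def parses(markup: str) -> bool:
    """Return True if the markup is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

samples = {
    '<table width="100%"></table>': "quoted value",
    '<table width=100%></table>': "unquoted value",
    '<input checked />': "minimized attribute",
    '<input checked="checked" />': "full attribute form",
    '<table WIDTH="100%"></table>': "upper-case name: well-formed, but invalid XHTML",
}
for markup, note in samples.items():
    print(parses(markup), markup, "-", note)
```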
The id Attribute Replaces the Name Attribute
HTML 4.01 defines a name attribute for the elements a, applet, form, frame, iframe, img, and map. In XHTML the name attribute is deprecated; use id instead:
<img src="picture.gif" id="picture1" />
NOT
<img src="picture.gif" name="picture1" />
XHTML …
Mandatory XHTML Elements
All XHTML documents must have a DOCTYPE declaration. The html, head and body elements must be present, and the title must be present inside the head element.
This is a minimum XHTML document template:
<!DOCTYPE Doctype goes here>
<html>
<head>
<title>Title goes here</title>
</head>
<body> Body text goes here </body>
</html>
Note: The DOCTYPE declaration is not a part of the XHTML document itself. It is not an XHTML element, and it should not have a closing tag.
XHTML …
The 3 Document Type Definitions
• A DTD (Document Type Definition) specifies the syntax of a web page in SGML.
• A DTD is used by SGML applications, such as HTML, to specify rules that apply to the markup of documents of a particular type, including a set of element and entity declarations.
• XHTML is specified in a document type definition, or DTD, written in the XML form of the SGML DTD syntax.
• An XHTML DTD describes in precise, computer-readable language the allowed syntax and grammar of XHTML markup.
The XHTML standard defines three Document Type Definitions:
• STRICT
• TRANSITIONAL (most common)
• FRAMESET
XHTML …
The <!DOCTYPE> is Mandatory
An XHTML document consists of three main parts:
• the DOCTYPE
• the Head
• the Body
The basic document structure is:
<!DOCTYPE ...>
<html>
<head>
<title>... </title>
</head>
<body> ... </body>
</html>
The DOCTYPE declaration must come before the root element; apart from an optional XML declaration, it should be the first line of an XHTML document.
XHTML …
An XHTML Example
This is a simple (minimal) XHTML document:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>simple document</title>
</head>
<body>
<p>a simple paragraph</p>
</body>
</html>
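Because an XHTML document is also an XML document, a generic XML parser can read it. A minimal sketch with Python's xml.etree.ElementTree, assuming the XHTML namespace is declared on <html> (the XHTML 1.0 specification requires this in strictly conforming documents). With the namespace present, ElementTree reports names such as {http://www.w3.org/1999/xhtml}title, so the sketch searches with a prefix map; the prefix "x" is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# The minimal XHTML 1.0 Strict document, with the XHTML namespace declared.
doc = """<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>simple document</title>
</head>
<body>
<p>a simple paragraph</p>
</body>
</html>"""

NS = {"x": "http://www.w3.org/1999/xhtml"}
root = ET.fromstring(doc)  # the external DTD is not downloaded
title = root.find("x:head/x:title", NS).text
paragraphs = [p.text for p in root.findall("x:body/x:p", NS)]
print(root.tag)     # {http://www.w3.org/1999/xhtml}html
print(title)        # simple document
print(paragraphs)   # ['a simple paragraph']
```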
XHTML …
The DOCTYPE declaration defines the document type:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML 1.0 Strict
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Use this when you want really clean markup, free of presentational clutter. Use this together with Cascading Style Sheets.
XHTML …
XHTML 1.0 Transitional
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Use this when you need to take advantage of HTML's presentational features and when you want to support browsers that don't understand Cascading Style Sheets.
XHTML 1.0 Frameset
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Use this when you want to use HTML Frames to partition the browser window into two or more frames.
Core XML
• XML aims to answer the conflicting demands that arrived at the W3C for the future of HTML.
• On one hand, some applications need more tags, and these tags are increasingly specialized. For example, business people want tags for prices and product references. Mathematicians want tags for their formulas. Chemists also want tags for formulas, but not the same ones.
• On the other hand, other applications want a simpler language.
• The W3C essentially made two changes to HTML:
  – XML predefines no tags.
  – XML is stricter.
Changes to HTML
• No Predefined Tags
  – Because there are no predefined tags in XML, you, the author, create the tags that you need.
<price currency="Rs">499.50</price>
<toc xlink:href="/newsletter">ABC Co. </toc>
• The <price> tag has no equivalent in HTML.
• The <toc> tag can be simulated through a combination of table, hyperlink, and bold:
<table>
<tr> <td><!-- main text here --></td>
<td><a href="/newsletter"><b>ABC Co. </b></a></td> </tr>
</table>
Changes to HTML (2)
• The above code represents the extensible aspect of XML (the X in XML).
• XML is extensible because it predefines no tags but lets authors create the tags needed for their applications.
• But this raises many questions, such as the following:
  – How does the browser know that <toc> is equivalent to this combination of table, hyperlink, and bold?
  – Can you compare different prices?
  – What about the current and previous generations of browsers?
  – How does this simplify Web site maintenance?
Changes to HTML (3)
• Answers to the above questions:
  – The browsers or the Web servers use style sheets.
  – Prices can be compared (using an API: DOM or SAX).
  – XML can be made compatible with any browser.
  – XML enables you to concentrate on the more stable aspects of your document.
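To illustrate the second answer: because <price> keeps the amount as element content and the currency as an attribute, a program can compare prices through the DOM. A minimal sketch with Python's xml.dom.minidom; the two offers and the read_price helper are hypothetical, and Decimal is used to avoid floating-point rounding on money.

```python
from decimal import Decimal
from xml.dom.minidom import parseString

# Two hypothetical offers marked up with the author-defined <price> element.
offer_a = '<price currency="Rs">499.50</price>'
offer_b = '<price currency="Rs">450.00</price>'

def read_price(xml_text):
    """Return (currency, amount) for a <price> document, read through the DOM."""
    price = parseString(xml_text).documentElement
    currency = price.getAttribute("currency")
    amount = Decimal(price.firstChild.data.strip())
    return currency, amount

cur_a, amt_a = read_price(offer_a)
cur_b, amt_b = read_price(offer_b)
if cur_a == cur_b:  # only compare amounts in the same currency
    cheaper = "A" if amt_a < amt_b else "B"
    print(f"Offer {cheaper} is cheaper ({cur_a})")
```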
Changes to HTML (4)
• Stricter Syntax
  – HTML has a forgiving syntax.
  – It was decided that XML would adopt a strict syntax.
  – A strict syntax results in smaller, faster, and lighter browsers.
• HTML: <p>Welcome to our site!<img src=logo.jpg>
• XML: <p>Welcome to our site!<img src="logo.jpg"/></p>
• The image tag uses a special form for so-called empty elements.
Document Structure
INTERNAL MEMO
From: John Doe
To: Jack Smith
Regarding: XML at WhizBang
Have you heard of this new technology, XML? It looks promising. It is similar to HTML but it is extensible. All the big names (Microsoft, IBM, Oracle, Sun) are backing it.
We could use XML to launch new e-commerce services. It is also useful for the web site: you complained it was a lot of work, apparently XML can simplify the maintenance.
Check this web site <http://www.w3.org/XML> for more information. Also visit Que <http://www.quepublishing.com>. They have just released "XML by Example, 2nd Edition" by Benoît Marchal <http://www.marchal.com> with lots of useful information and some great examples. I have already ordered two copies!
John
Document Structure (2)
• The memo is made of at least three distinct elements:
  – the title
  – the header, including sender and recipient names as well as the subject
  – the body text
• These elements are organized in relation to each other, following a structure. For example, the title indicates that this is a memo, and the title is followed by the header.
• The body text itself can be further broken down:
  – three paragraphs
  – several URLs
  – a signature
Document Structure (3)
• This decomposition can be continued to recognize smaller elements such as sentences, words, or even characters.
• However, these smaller elements usually add little information about the structure of the document.
• The above structure is independent of the appearance of the memo.
Document Structure (4)
Document Structure (5)
Document Structure (6)
• So what is the relationship between structure and appearance?
• Ideally, a text is formatted to expose its structure to the reader.
• Remember TeX?
• The key to understanding XML is that the structure of a document is the foundation from which its appearance is deduced.
• Most file formats concentrate on the actual appearance of a document (they take great pains to ensure almost identical display on various platforms).
• XML takes a different approach: it records the structure of documents, from which the formatting is automatically deduced.
Document Structure (7)
% memo.tex
\nopagenumbers
\noindent John Doe\par
\noindent Jack Smith\par
\noindent XML at WhizBang\par
\smallskip
Have you heard of this new technology, XML? It looks promising. It is similar to HTML but it is extensible. All the big names (Microsoft, IBM, Oracle, Sun) are backing it.\par
We could use XML to launch new e-commerce services. It is also useful for the web site: you complained it was a lot of work, apparently XML can simplify the maintenance.\par
Check this web site {\url http://www.w3.org/XML} for more information. Also visit Que {\url http://www.quepublishing.com}. They have just released "XML by Example, 2nd Edition" by Benoît Marchal {\url http://www.marchal.com} with lots of useful information and some great examples. I have already ordered two copies!\par
John\par
\bye
Document Structure (8)
• Markup originates in the publishing industry. In traditional publishing, the manuscript is annotated with layout instructions for the typesetter. These handwritten annotations are called markup.
• TeX represents what is known as generic coding of text documents.
• This has the following benefits:
  – It achieves higher portability and is more flexible. To change the appearance of the document, it suffices to adapt the macro. By editing one macro, the change is automatically propagated throughout the document. In particular, it does not require re-encoding the markup, which is a time-consuming and error-prone activity.
  – The markup is closer to describing the structure.
Document Structure (9)
• HTML does not enforce a strict structure; in fact, HTML enforces very little structure.
• Although it is based on the structure-rich SGML, HTML has few options for organizing data.
• Adding the class attribute and style sheets to HTML turned it into a generic coding language.
Document Structure - SGML
<!DOCTYPE memo [
<!ELEMENT memo - - (header,body)>
<!ELEMENT header - O ((from & to) & subject?)>
<!ELEMENT body - O (para*, signature)>
<!ELEMENT from - O (#PCDATA)>
<!ELEMENT to - O (#PCDATA)>
<!ELEMENT subject - O (#PCDATA)>
<!ELEMENT para - O ((#PCDATA | link)*)>
<!ELEMENT link - - (#PCDATA)>
<!ATTLIST link url CDATA #REQUIRED>
<!ELEMENT signature - O (#PCDATA)>
]>
<memo>
<header>
<from>John Doe
<to>Jack Smith
Document Structure – SGML (2)
<subject>XML at WhizBang
<body>
<para>Have you heard of this new technology, XML? It looks promising. It is similar to HTML but it is extensible. All the big names (Microsoft, IBM, Oracle, Sun) are backing it.
<para>We could use XML to launch new e-commerce services. It is also useful for the web site: you complained it was a lot of work, apparently XML can simplify the maintenance.
<para>Check <link url="http://www.w3.org/XML">this web site</link> for more information. Also visit <link url="http://www.quepublishing.com">Que</link>. They have just released "XML by Example, 2nd Edition" by <link url="http://www.marchal.com">Benoît Marchal</link> with lots of useful information and some great examples. I have already ordered two copies!
<signature>John
</memo>
Document Structure - HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head><title>WhizBang Memo: XML at WhizBang</title></head>
<body>
<table bgcolor="lightgrey" border="1" width="70%"><tr><td>
<table>
<tr><td colspan="2"><font size="+2" face="Garamond"><b>XML at WhizBang</b></font></td></tr>
<tr><td><font face="Garamond">From:</font></td><td><font face="Garamond">John Doe</font></td></tr>
<tr><td><font face="Garamond">To:</font></td><td><font face="Garamond">Jack Smith</font></td></tr>
</table>
</td></tr></table>
Document Structure – HTML (2)
<p><font face="Garamond">Have you heard of this new technology, XML? It looks promising. It is similar to HTML but it is extensible. All the big names (Microsoft, IBM, Oracle, Sun) are backing it.</font></p>
<p><font face="Garamond">We could use XML to launch new e-commerce services. It is also useful for the web site: you complained it was a lot of work, apparently XML can simplify the maintenance.</font></p>
<p><font face="Garamond">Check <a href="http://www.w3.org/XML"> this web site</a> for more information. Also visit <a href="http://www.quepublishing.com">Que</a>. They have just released "XML by Example, 2nd Edition" by <a href="http://www.marchal.com">Benoît Marchal</a> with lots of useful information and some great examples. I have already ordered two copies!</font></p>
<p><font face="Lucida Handwriting"><i>John</i></font></p>
</body></html>
Document Structure – HTML with CSS
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>WhizBang Memo: XML at WhizBang</title>
<style>
.header { background-color: lightgrey; }
.subject { font-family: Garamond; font-weight: bold; font-size: larger; }
.to, .from { font-family: Garamond; }
.para { font-family: Garamond; }
.signature { font-family: "Lucida Handwriting"; font-style: italic; }
</style>
</head>
<body>
<table class="header" border="1" width="70%"><tr><td>
<table>
<tr><td colspan="2" class="subject">XML at WhizBang</td></tr>
<tr><td class="from">From:</td><td class="from">John Doe</td></tr>
<tr><td class="to">To:</td><td class="to">Jack Smith</td></tr>
</table>
</td></tr></table>
Document Structure – HTML with CSS (2)
<p class="para">Have you heard of this new technology, XML? It looks promising. It is similar to HTML but it is extensible. All the big names (Microsoft, IBM, Oracle, Sun) are backing it.</p>
<p class="para">We could use XML to launch new e-commerce services. It is also useful for the web site: you complained it was a lot of work, apparently XML can simplify the maintenance.</p>
<p class="para">Check <a href="http://www.w3.org/XML"> this web site</a> for more information. Also visit <a href="http://www.quepublishing.com">Que</a>. They have just released "XML by Example, 2nd Edition" by <a href="http://www.marchal.com">Benoît Marchal</a> with lots of useful information and some great examples. I have already ordered two copies!</p>
<p class="signature">John</p>
</body>
</html>
Document Structure – XML
<?xml version="1.0"?>
<memo>
<header>
<from>John Doe</from>
<to>Jack Smith</to>
<subject>XML at WhizBang</subject>
</header>
<body>
<para>Have you heard of this new technology, XML? It looks promising. It is similar to HTML but it is extensible. All the big names (Microsoft, IBM, Oracle, Sun) are backing it.</para>
<para>We could use XML to launch new e-commerce services. It is also useful for the web site: you complained it was a lot of work, apparently XML can simplify the maintenance.</para>
<para>Check <link url="http://www.w3.org/XML">this web site</link> for more information. Also visit <link url="http://www.quepublishing.com">Que</link>. They have just released "XML by Example, 2nd Edition" by <link url="http://www.marchal.com">Benoît Marchal</link> with lots of useful information and some great examples. I have already ordered two copies!</para>
<signature>John</signature>
</body>
</memo>
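The XML version records only structure; the appearance is deduced from it. As an illustration only (in practice a style sheet in XSL or CSS does this job), a sketch in Python that walks a trimmed copy of the memo with xml.etree.ElementTree and emits HTML using the same class names as the CSS version above. The helpers memo_to_html and para_to_html are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html import escape

# A trimmed copy of the memo above (only one paragraph kept).
memo_xml = """<memo>
<header><from>John Doe</from><to>Jack Smith</to><subject>XML at WhizBang</subject></header>
<body>
<para>Check <link url="http://www.w3.org/XML">this web site</link> for more information.</para>
<signature>John</signature>
</body>
</memo>"""

def para_to_html(para):
    """Render a <para>; each <link url=...> child becomes <a href=...>."""
    parts = [escape(para.text or "")]
    for link in para.findall("link"):
        parts.append(f'<a href="{escape(link.get("url"))}">{escape(link.text or "")}</a>')
        parts.append(escape(link.tail or ""))  # text that follows the link
    return '<p class="para">' + "".join(parts) + "</p>"

def memo_to_html(memo):
    """Derive the presentation (HTML with CSS classes) from the memo's structure."""
    header, body = memo.find("header"), memo.find("body")
    lines = [
        f'<p class="subject">{escape(header.findtext("subject"))}</p>',
        f'<p class="from">From: {escape(header.findtext("from"))}</p>',
        f'<p class="to">To: {escape(header.findtext("to"))}</p>',
    ]
    lines += [para_to_html(p) for p in body.findall("para")]
    lines.append(f'<p class="signature">{escape(body.findtext("signature"))}</p>')
    return "\n".join(lines)

rendered = memo_to_html(ET.fromstring(memo_xml))
print(rendered)
```

Changing the look then means changing only this mapping (or the style sheet), never the memo itself.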
Applications of XML
• Main application areas:
  – Document applications manipulate information primarily intended for human consumption.
  – Data applications manipulate information primarily intended for software consumption.
• Document Applications
  – The first application of XML would be document publishing. The main advantage of XML in this arena is that XML concentrates on the structure of the document, which makes it independent of the delivery medium.
Applications – XML (2)
Applications – XML (3)
Data Applications
One of the original goals of SGML was to give documents the same kind of software tools that are used to manage other datasets, such as databases.
Applications – XML (4)
The structure of a database in XML
Database Application
Identifier   Name           Price
P1           XML Editor     $499.00
P2           DTD Editor     $199.00
P3           XML Book       $29.99
P4           XML Training   $699.00
Database Application
<?xml version="1.0"?>
<products>
<product id="p1">
<name>XML Editor</name> <price>499.00</price>
</product>
<product id="p2">
<name>DTD Editor</name> <price>199.00</price>
</product>
<product id="p3">
<name>XML Book</name> <price>29.99</price>
</product>
<product id="p4">
<name>XML Training</name> <price>699.00</price>
</product>
</products>
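Reading the XML back gives the same rows as the table above, which is what makes XML usable for offloading and reloading databases. A minimal sketch with Python's xml.etree.ElementTree; picking the cheapest product is just a hypothetical query to show the data is usable.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

products_xml = """<?xml version="1.0"?>
<products>
<product id="p1"><name>XML Editor</name><price>499.00</price></product>
<product id="p2"><name>DTD Editor</name><price>199.00</price></product>
<product id="p3"><name>XML Book</name><price>29.99</price></product>
<product id="p4"><name>XML Training</name><price>699.00</price></product>
</products>"""

root = ET.fromstring(products_xml)
# Turn each <product> element back into a table row: (id, name, price).
rows = [(p.get("id"), p.findtext("name"), Decimal(p.findtext("price")))
        for p in root.iter("product")]
for row in rows:
    print(*row, sep="\t")

cheapest = min(rows, key=lambda r: r[2])
print("Cheapest:", cheapest[1])  # Cheapest: XML Book
```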
XML Namespace
• A namespace places elements within a global naming system.
• The concept of a namespace is similar to the scope of variables in programming languages. If you declare a variable i in a function computeAverage(), the scope of i is the computeAverage() function.
• If another function, say computeMax(), also declares a variable i, there is no conflict. For the compiler, the two variables are different because they are defined in different functions: they have different scopes.
• Namespaces work in a somewhat similar way. A namespace makes it possible to define elements specific to a given application of XML. If another application defines elements with the same name but in a different namespace, there is no conflict.
XML Namespace
<?xml version="1.0"?>
<xbe:list xmlns:html="http://www.w3.org/1999/xhtml"
xmlns:xbe="http://www.psol.com/xbe2/listing1.9">
<xbe:table>
<xbe:name>persons</xbe:name>
<xbe:column>first-name</xbe:column>
<xbe:column>last-name</xbe:column>
</xbe:table>
<html:table>
<html:tr><html:td>Sean</html:td><html:td>Dixon</html:td></html:tr>
<html:tr><html:td>Todd</html:td><html:td>Green</html:td></html:tr>
<html:tr><html:td>Benoit</html:td><html:td>Marchal</html:td></html:tr>
</html:table>
</xbe:list>
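Both tables in the listing are called "table", yet they do not conflict, because a namespace-aware parser identifies each element by its namespace URI plus its local name. A minimal sketch with Python's xml.etree.ElementTree on a trimmed copy of the listing; the prefix "h" in the search map is an arbitrary choice, since only the URI has to match.

```python
import xml.etree.ElementTree as ET

doc = """<xbe:list xmlns:html="http://www.w3.org/1999/xhtml"
          xmlns:xbe="http://www.psol.com/xbe2/listing1.9">
<xbe:table><xbe:name>persons</xbe:name></xbe:table>
<html:table><html:tr><html:td>Sean</html:td><html:td>Dixon</html:td></html:tr></html:table>
</xbe:list>"""

root = ET.fromstring(doc)
# ElementTree expands each prefix to "{namespace-URI}local-name".
for child in root:
    print(child.tag)
# {http://www.psol.com/xbe2/listing1.9}table
# {http://www.w3.org/1999/xhtml}table

# Searches use a prefix map of our own choosing; only the URI must match.
ns = {"h": "http://www.w3.org/1999/xhtml"}
cells = [td.text for td in root.findall(".//h:td", ns)]
print(cells)  # ['Sean', 'Dixon']
```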
XML Stylesheets
• XML is supported by two style sheet languages: XSL (Extensible Stylesheet Language) and CSS (Cascading Style Sheets).
• They specify how XML documents should be rendered on screen, on paper, or in an editor.
• XSL is more powerful, but CSS is more widely implemented.
XML APIs : DOM & SAX
• DOM (Document Object Model) and SAX (Simple API for XML) are APIs to access XML documents.
• They allow applications to read XML documents without having to worry about the syntax.
• They are complementary: DOM is best suited for browsers and editors; SAX is best for all the rest.
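SAX, unlike DOM, does not build a tree: the parser calls back into the application as it meets start tags, text and end tags. A minimal sketch with Python's xml.sax that totals every <price> in a stream; the PriceTotaler class and the two-product sample are hypothetical.

```python
import xml.sax
from decimal import Decimal

class PriceTotaler(xml.sax.ContentHandler):
    """SAX handler: sums the text of every <price> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.chunks = []          # characters() may deliver text in pieces
        self.total = Decimal("0")

    def startElement(self, name, attrs):
        if name == "price":
            self.in_price = True
            self.chunks = []

    def characters(self, content):
        if self.in_price:
            self.chunks.append(content)

    def endElement(self, name):
        if name == "price":
            self.total += Decimal("".join(self.chunks))
            self.in_price = False

data = b"""<products>
<product id="p1"><name>XML Editor</name><price>499.00</price></product>
<product id="p3"><name>XML Book</name><price>29.99</price></product>
</products>"""

handler = PriceTotaler()
xml.sax.parseString(data, handler)
print(handler.total)  # 528.99
```

Because nothing is kept except the running total, memory use stays flat however large the document is, which is why SAX suits batch processing while DOM suits editors that need the whole tree.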
XLink and XPointer
• XLink and XPointer are two related W3C specifications, originally developed as parts of one linking standard, that provide a mechanism to establish relationships and hyperlinks between documents.
<?xml version="1.0"?>
<resources xmlns:xlink="http://www.w3.org/1999/xlink">
<entry xlink:type="simple" xlink:show="replace" xlink:href="http://www.mcp.com">Que</entry>
<entry xlink:type="simple" xlink:show="replace" xlink:href="http://www.marchal.com">marchal.com</entry>
<entry xlink:type="simple" xlink:show="replace" xlink:href="http://www.informit.com">InformIT</entry>
<entry xlink:type="simple" xlink:show="replace" xlink:href="http://www.pineapplesoft.com/newsletter"> Pineapplesoft Link</entry>
</resources>
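To a generic parser, XLink markup is simply a set of attributes in the http://www.w3.org/1999/xlink namespace; acting on them (for example, honouring xlink:show) is up to the application. A minimal sketch with Python's xml.etree.ElementTree that only collects the simple links from a trimmed copy of the listing.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

resources = """<resources xmlns:xlink="http://www.w3.org/1999/xlink">
<entry xlink:type="simple" xlink:show="replace" xlink:href="http://www.mcp.com">Que</entry>
<entry xlink:type="simple" xlink:show="replace" xlink:href="http://www.marchal.com">marchal.com</entry>
</resources>"""

root = ET.fromstring(resources)
# Namespaced attributes are keyed as "{URI}local-name" in element.attrib.
links = {e.text: e.get(XLINK + "href")
         for e in root.findall("entry")
         if e.get(XLINK + "type") == "simple"}
print(links)
```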
XML Software
• XML Browser:
  – An XML browser is used to view and print XML documents.
• XML Editors
  – Programmer's editors, such as XML Spy (http://www.xmlspy.com/) or XML Pro (http://www.vervet.com/), let you manipulate the XML code directly. They are powerful, but you have to know XML to use them.
  – WYSIWYG editors, such as XMetaL (http://www.xmetal.com/), simulate word processors. Tools in this category are ideal for end users who may not be familiar with XML (and may not want to be).
XML Software (2)
• XML Editors (continued)
  – The tabular view of XML Spy makes the structure of the document apparent. It shows clearly how elements nest.
  – In contrast, XMetaL hides the XML code entirely. XMetaL is ideal for markup-challenged users who want to concentrate on writing rather than on the markup.
XML Spy
XMetaL
XML Software (3)
• XML Parsers
  – An XML parser scans an XML document to identify its structure, so that an application can then process the document based on that structure.
  – One of the most popular parsers is Apache's Xerces, available for Java, C++, and Perl (xml.apache.org).
• XSL Processors
  – Publishing directly in XML can be a problem for readers, because not many browsers fully support XML.
  – With XSL, it is possible to create, from XML documents, classic HTML that works with current and older-generation browsers.
  – Several XSL processors are available; one of the most popular is Apache's Xalan (xml.apache.org).
XML Syntax
• XML is a set of standards to exchange and publish information in a structured manner.
• XML is a language used to describe and manipulate documents that follow a structure. XML documents are not limited to books, articles, or Web sites; they can also be used with objects from a client/server application.
• XML defines a syntax, or file format, that is useful for books, articles, client/server applications, and more.
• This is possible because the XML format does not dictate or enforce a particular structure. It limits itself to rules that you can use to write a tree data structure to disk.
XML Syntax (2)
• An XML document is text. In XML terms, the document consists of character data and markup, and both are represented as text in the document.
John Doe
34 Fountain Square Plaza
Cincinnati, OH 45202
US
513-744-8889 (preferred)
513-744-7098
Jack Smith
513-744-3465
Never leave messages on his answering machine. Email instead.
XML Syntax (3)
<?xml version="1.0"?>
<!-- address book in XML -->
<address-book>
<entry>
<name>John Doe</name>
<address>
<street>34 Fountain Square Plaza</street>
<region>OH</region>
<postal-code>45202</postal-code>
<locality>Cincinnati</locality>
<country>US</country>
</address>
<tel preferred="true">513-744-8889</tel>
<tel>513-744-7098</tel>
<email href="mailto:[email protected]"/>
</entry>
<entry>
<name>Jack Smith</name>
<tel>513-744-3465</tel>
<email href="mailto:[email protected]"/>
<comments>Never leave messages on his answering machine. <b>Email instead.</b></comments>
</entry>
</address-book>
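The markup is what lets a program answer questions the plain-text version cannot, such as "which number should I try first?". A minimal sketch with Python's xml.etree.ElementTree on a trimmed copy of the address book (the email elements are omitted); number_to_call is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the address book above (email elements omitted).
book_xml = """<address-book>
<entry><name>John Doe</name>
<tel preferred="true">513-744-8889</tel><tel>513-744-7098</tel></entry>
<entry><name>Jack Smith</name><tel>513-744-3465</tel>
<comments>Never leave messages on his answering machine. <b>Email instead.</b></comments></entry>
</address-book>"""

def number_to_call(entry):
    """Return the tel marked preferred="true", else the first tel."""
    tel = entry.find("tel[@preferred='true']")
    if tel is None:
        tel = entry.find("tel")
    return tel.text

root = ET.fromstring(book_xml)
for entry in root.findall("entry"):
    print(entry.findtext("name"), number_to_call(entry))

# Mixed content: itertext() yields the text inside and around <b> in order.
comment = "".join(root.find("entry/comments").itertext())
print(comment)
```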
XML Syntax
XML document describing a person
<person>
<name>
<first_name>Alan</first_name> <last_name>Turing</last_name>
</name>
<profession>computer scientist</profession> <profession>mathematician</profession> <profession>cryptographer</profession>
</person>
XML Syntax
• The person element is called the parent of the name element and the three profession elements. The name element is the parent of the first_name and last_name elements. The name element and the three profession elements are sometimes called each other's siblings. The first_name and last_name elements are also siblings.
• XML gives each element exactly one parent, never two or more; the only exception is the root element, which has no parent.
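The parent, child and sibling relations above can be computed from a parsed tree. Python's xml.etree.ElementTree elements do not store a link to their parent, so a minimal sketch builds a child-to-parent map once; the siblings helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring("""<person>
<name><first_name>Alan</first_name><last_name>Turing</last_name></name>
<profession>computer scientist</profession>
<profession>mathematician</profession>
<profession>cryptographer</profession>
</person>""")

# Every element has exactly one parent, so each child appears as a key
# exactly once; the root never appears as a key.
parent_of = {child: parent for parent in person.iter() for child in parent}

def siblings(elem):
    """Return the other children of elem's parent (empty for the root)."""
    parent = parent_of.get(elem)
    return [] if parent is None else [c for c in parent if c is not elem]

first = person.find("name/first_name")
print(parent_of[first].tag)                            # name
print([s.tag for s in siblings(first)])                # ['last_name']
print([s.tag for s in siblings(person.find("name"))])  # ['profession', 'profession', 'profession']
print(person in parent_of)                             # False
```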
XML Syntax
XML Syntax (4)
• The plain text format and the XML format carry the same data. Yet, because plain text has no markup, it carries no information about structure.
• Element's Start and End Tags
  – The building block of XML is the element. Each element has a name and content.
  – <tel>513-744-7098</tel>
  – The content of an element is delimited by special markup known as the start tag and the end tag. The tagging mechanism is similar to HTML, which is logical because both HTML and XML inherited their tagging mechanism from SGML.
XML Syntax (5)
• XML limits itself to defining what an element is and how to mark up an element with tags.
• It provides a syntax to store information according to a structure but, unlike HTML, it does not define what the structure is.
• Names in XML
  – Element names must follow certain rules. Specifically, they must start with either a letter or the underscore character ("_"). The rest of the name consists of letters, digits, the underscore character, the dot (".") or the hyphen ("-"). Spaces are not allowed in names.
  – Names starting with the string "xml" (in any combination of upper and lower case) are reserved for the XML specification itself.
  – The colon (":") is reserved for namespaces.
XML Syntax (6)
• Valid XML names:
  – <copyright-information> <p> <base64> <décompte.client> <firstname>
• The following are examples of invalid element names; you could not use these names in XML:
  – <123> <first name> <tom&jerry>
• Unlike HTML, names are case-sensitive in XML. So, the following names are all different:
  – <address> <ADDRESS> <Address>
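The naming rules on the previous slide can be written as a small checker. A minimal sketch in Python that follows the slide's simplified rules only (the full XML 1.0 Name production allows more characters, and real parsers accept reserved "xml..." names even though they should not be used); is_valid_xml_name is a hypothetical helper.

```python
def is_valid_xml_name(name: str) -> bool:
    """Check a name against the simplified rules on the slide (not the full
    XML 1.0 grammar): start with a letter or "_"; continue with letters,
    digits, "_", "." or "-"; no spaces or colons; no "xml" prefix in any case."""
    if not name or not (name[0].isalpha() or name[0] == "_"):
        return False
    if any(not (ch.isalnum() or ch in "_.-") for ch in name[1:]):
        return False  # rejects spaces, "&", ":" and other punctuation
    if name.lower().startswith("xml"):
        return False  # reserved for the XML specification itself
    return True

for n in ["copyright-information", "p", "base64", "décompte.client", "firstname",
          "123", "first name", "tom&jerry", "xmlData", "html:table"]:
    print(f"{n!r}: {is_valid_xml_name(n)}")
```

str.isalpha() accepts accented letters such as "é", which is why décompte.client passes.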
XML Syntax (7)
Attributes
• It is possible to attach additional information to elements in the form of attributes. Attributes have a name and a value; attribute names follow the same rules as element names.
• Again, the syntax is similar to HTML. Elements can have zero, one, or more attributes in the start tag. The name of the attribute is separated from the value by the equals sign, and the value is enclosed in double or single quotation marks.
• For example, the tel element can have a preferred attribute (to indicate which phone number you should try first):
XML Syntax (8)
• <tel preferred="true">513-744-8889</tel>
• Unlike HTML, XML insists on the quotation marks. An XML parser would reject the following:
• <tel preferred=true>513-744-8889</tel>
• Quotation marks can be either single or double quotes. This is convenient if you need to insert single or double quotes in an attribute value.
• <confidentiality level="I don't know"> This document is not confidential. </confidentiality> or
• <confidentiality level='approved "for your eyes only"'> This document is top-secret </confidentiality>
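The parser returns the attribute value without the delimiting quotes, whichever kind was used. A minimal sketch with Python's xml.etree.ElementTree; the third document, which uses the predefined entity &quot; instead of switching quote style, is an addition not on the slide.

```python
import xml.etree.ElementTree as ET

docs = [
    """<confidentiality level="I don't know">This document is not confidential.</confidentiality>""",
    """<confidentiality level='approved "for your eyes only"'>This document is top-secret</confidentiality>""",
    # The predefined entity &quot; is another way to put " inside "...".
    """<confidentiality level="approved &quot;for your eyes only&quot;">Top-secret</confidentiality>""",
]
levels = [ET.fromstring(d).get("level") for d in docs]
for level in levels:
    print(level)

# Unquoted attribute values are rejected by the parser.
try:
    ET.fromstring("<tel preferred=true>513-744-8889</tel>")
    unquoted_rejected = False
except ET.ParseError:
    unquoted_rejected = True
print("unquoted value rejected:", unquoted_rejected)
```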
XML Syntax (8)
Empty Element
• Elements that have no content are known as empty elements. Usually (although it is not required), they have attributes.
• There is a shorthand notation for empty elements: the start and end tags merge, and the slash from the end tag is added at the end of the opening tag.
• For XML, the following two empty elements are identical:
<email href="mailto:[email protected]"/>
<email href="mailto:[email protected]"></email>
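One way to see that the two forms are identical is that a parser produces the same element for both. A minimal sketch with Python's xml.etree.ElementTree; the address jack@example.com is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<email href="mailto:jack@example.com"/>')
long_form = ET.fromstring('<email href="mailto:jack@example.com"></email>')

# Same name, same attributes, no text, no children: nothing tells them apart.
same = (short_form.tag == long_form.tag
        and short_form.attrib == long_form.attrib
        and short_form.text is None and long_form.text is None
        and len(short_form) == len(long_form) == 0)
print(same)                    # True
print(ET.tostring(long_form))  # written back using the shorthand form
```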
XML Syntax (9)
Nesting of Elements
• Elements can contain text (name), other elements (entry), or a combination of text and elements (comments).
• The underlying data structure of an XML document is a tree of elements.
• The depth of the tree has no limit, and elements can repeat.
• An element that is enclosed in another element is called a child. The element it is enclosed in is its parent.
• Each child has only one parent.
XML Syntax (10)
<entry>
<name>Jack Smith</name>
<tel>513-744-3465</tel>
<email href="mailto:[email protected]"/>
<comments>Never leave messages on his answering machine. <b>Email instead.</b></comments>
</entry>
The entry element above has four children: name, tel, email, and comments.
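Once the document is parsed, the tree of elements can be walked directly. The sketch below assumes Python's xml.etree.ElementTree (the e-mail address is a hypothetical placeholder). It lists the children of entry, shows how the mixed content of comments is split into text and a child element, and builds a child-to-parent map to illustrate that each child has exactly one parent.

```python
import xml.etree.ElementTree as ET

entry = ET.fromstring("""<entry>
<name>Jack Smith</name>
<tel>513-744-3465</tel>
<email href="mailto:jsmith@example.com"/>
<comments>Never leave messages on his answering machine. <b>Email instead.</b></comments>
</entry>""")

print([child.tag for child in entry])  # ['name', 'tel', 'email', 'comments']

# Mixed content: the leading text is stored in .text, the <b> child is an element
comments = entry.find("comments")
print(repr(comments.text))
print(comments[0].tag, comments[0].text)  # b Email instead.

# Each element has exactly one parent, so a child -> parent map is well defined
parent_of = {child: parent for parent in entry.iter() for child in parent}
print(parent_of[comments[0]].tag)  # comments
```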
Root
At the root of the document there must be one and only one element. In other words, all the other elements in the document must be nested inside a single root element.
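XML Syntax (11)
• The following document is not well-formed because it has two top-level entry elements instead of a single root:
<?xml version="1.0"?>
<entry>
<name>John Doe</name>
<email href="mailto:[email protected]"/>
</entry>
<entry>
<name>Jack Smith</name>
<email href="mailto:[email protected]"/>
</entry>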
It is easy to fix the above example by introducing a new root element, such as address-book:
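A sketch of the problem and the fix, assuming Python's xml.etree.ElementTree. On their own, the two entries are rejected; wrapping them in an address-book root makes the document acceptable.

```python
import xml.etree.ElementTree as ET

entries = ("<entry><name>John Doe</name></entry>"
           "<entry><name>Jack Smith</name></entry>")

try:
    ET.fromstring(entries)          # two top-level elements: rejected
    two_roots_accepted = True
except ET.ParseError as err:
    two_roots_accepted = False
    print("rejected:", err)

book = ET.fromstring("<address-book>" + entries + "</address-book>")
print(book.tag, len(book))  # address-book 2
```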
XML Syntax (12)
• XML Declaration
– The XML declaration is the first line of the document. It identifies the document as an XML document and lists the version of XML used in the document. For the time being, that is 1.0 (XML 1.1 was published later, but 1.0 remains the usual choice).
– <?xml version="1.0"?>
– An XML parser can reject documents with another version number.
– The declaration can contain other attributes to support special features such as character-set encoding.
• The XML declaration is optional in XML 1.0. XML 1.1 made it mandatory for 1.1 documents: a document without a declaration is treated as XML 1.0.
• If the declaration is included, however, it must start on the first character of the first line of the document. The XML recommendation suggests you include the declaration in every XML document.
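The placement rule can be observed with a parser. This sketch assumes Python's xml.etree.ElementTree, which is built on the expat parser; accepts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def accepts(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(accepts('<?xml version="1.0"?><entry/>'))    # True
print(accepts('<entry/>'))                         # True: the declaration is optional
print(accepts('\n<?xml version="1.0"?><entry/>'))  # False: declaration not at the very start
```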
XML Syntax (13)
• The two major differences between HTML and XML are:
– XML does not define elements but provides a mechanism to create your own. With HTML, the W3C had defined elements for paragraphs (<p>), bold (<b>), section titles (<h1>-<h6>) and more. In XML, it's up to you, the author of the document, to create meaningful elements.
– XML is very strict. For example, every element must have a start and an end tag, unless it is an empty element written with the shorthand <name/> notation.
XML Syntax (14)
• Comments
– To insert comments in a document, enclose them between "<!--" and "-->". Comments are intended for the human reader and the XML parser ignores them.
• Unicode
– Characters in XML documents follow the Unicode standard. Unicode is a major extension to the familiar ASCII character set. It is published by the Unicode Consortium (http://www.unicode.org/). The same character set is published by the ISO as ISO/IEC 10646.
– Unicode aims to cover the writing systems of virtually all languages, as well as mathematical and other symbols. It supports English, Western European languages, Cyrillic, Japanese, Chinese, and so on.
– To accommodate all those characters, Unicode defines more than a million code points, more than 16 bits can represent. Characters are stored with an encoding such as UTF-8 (1 to 4 bytes per character, with ASCII unchanged) or UTF-16 (2 bytes for most characters, 4 for the rest). In UTF-16, most characters are twice as large as their Latin-1 counterparts; that's the price to pay for international support. Every XML parser must accept both UTF-8 and UTF-16.
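The storage cost depends on the encoding rather than on Unicode itself. The following sketch uses plain Python 3 with no XML involved (encoded_sizes is a hypothetical helper) and counts the bytes each character takes in UTF-8 and in UTF-16:

```python
def encoded_sizes(ch: str) -> dict:
    """Bytes one character takes in UTF-8 and in UTF-16 (no byte-order mark)."""
    return {"utf-8": len(ch.encode("utf-8")), "utf-16": len(ch.encode("utf-16-le"))}

# ASCII letter, Latin-1 letter, CJK ideograph, and a character beyond 16 bits
for ch in ("A", "é", "中", "\U0001D11E"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 keeps ASCII text at one byte per character, which is one reason it is the default encoding for XML documents that carry neither an encoding declaration nor a byte-order mark.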
XML Syntax (18)
• A document written in Latin-1 needs the following XML declaration:
<?xml version="1.0" encoding="ISO-8859-1"?>
<entrée>
<nom>José Dupont</nom>
<email href="mailto:[email protected]"/>
</entrée>
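When the document reaches the parser as raw bytes, the encoding attribute tells it how to decode them. A sketch assuming Python's xml.etree.ElementTree (the email element is left out for brevity):

```python
import xml.etree.ElementTree as ET

# Encode the text as Latin-1 bytes, so the parser must rely on the declaration.
doc = """<?xml version="1.0" encoding="ISO-8859-1"?>
<entrée>
<nom>José Dupont</nom>
</entrée>""".encode("iso-8859-1")

root = ET.fromstring(doc)
print(root.tag, root.find("nom").text)  # entrée José Dupont
```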
XML Syntax (19)
• Entities
– A simple document is complete and can be stored in just one file. Complex documents are often split among several files: the text, the accompanying graphics, and so on.
– XML, however, does not reason in terms of files. Instead, it organizes documents physically in entities. In some cases, entities are equivalent to files; in others they are not.
– Entities are inserted in the document through entity references. An entity reference is the name of the entity between an ampersand character and a semicolon.
– The XML parser replaces the entity reference with its value. If we assume we have defined an entity "us" with the value "United States", the following two lines are strictly equivalent:
• <country>&us;</country>
• <country>United States</country>
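Python's xml.etree.ElementTree expands entities declared in an internal DTD subset, so the equivalence above can be checked directly. The <!ENTITY> declaration uses DTD syntax; treat the snippet as a sketch.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE country [
  <!ENTITY us "United States">
]>
<country>&us;</country>"""

root = ET.fromstring(doc)
print(root.text)  # United States
```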
XML Syntax (20)
• XML predefines entities for its delimiters (angle brackets, quotes, and so on). These entities are used to escape the delimiters in element content or attribute values. The predefined entities are:
– &lt; left-angle bracket: "<" must be escaped with &lt;
– &amp; ampersand: "&" must be escaped with &amp;
– &gt; right-angle bracket: ">" can be escaped with &gt;, and must be when it appears in the sequence ]]> in ordinary content, where it would otherwise be read as the end of a CDATA section (see CDATA below)
– &apos; single quote: "'" can be escaped with &apos;, mainly in attribute values delimited by single quotes
– &quot; double quote: '"' can be escaped with &quot;, mainly in attribute values delimited by double quotes
XML Syntax (21)
• The following is not well-formed because the ampersand would confuse the XML processor:
– <company>Marks & Spencer</company>
Instead, it must be rewritten to escape the ampersand with an &amp; entity:
• <company>Marks &amp; Spencer</company>
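Most XML libraries apply these escaping rules automatically. A sketch assuming Python's standard library: xml.sax.saxutils.escape escapes &, < and >; ElementTree escapes text when serializing; and the parser turns &amp; back into a plain ampersand.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

print(escape("Marks & Spencer"))  # Marks &amp; Spencer

company = ET.Element("company")
company.text = "Marks & Spencer"                  # raw text in memory
serialized = ET.tostring(company, encoding="unicode")
print(serialized)  # <company>Marks &amp; Spencer</company>

parsed = ET.fromstring("<company>Marks &amp; Spencer</company>")
print(parsed.text)  # Marks & Spencer
```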
XML Syntax (22)
• Special Attributes
• XML defines two special attributes:
– xml:space: Like Web browsers, most XML applications discard duplicated spaces. Yet, sometimes spaces are meaningful. HTML has a special element (<PRE>) to preserve spaces. This attribute tells the application what to do with spaces. If set to preserve, the application should preserve all spaces. If set to default, the application's default handling applies, so it can ignore duplicate spaces.
• The following example asks the application to preserve spaces in a listing element:
XML Syntax (23)
• <listing xml:space="preserve">for(String line = reader.readLine();
    null != line;
    line = reader.readLine())
  writer.println(line);</listing>
• xml:lang: It is often desirable to know in which language the content is written. This attribute records the language. For example:
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
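Because the xml prefix is predefined, a namespace-aware parser reports xml:lang and xml:space under the fixed namespace http://www.w3.org/XML/1998/namespace. A sketch assuming Python's xml.etree.ElementTree. ElementTree keeps all whitespace regardless, so acting on xml:space is left to the application.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"  # the xml: prefix is bound to this URI

doc = ET.fromstring("""<doc>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<listing xml:space="preserve">a    b</listing>
</doc>""")

langs = {p.text: p.get(XML_NS + "lang") for p in doc.iter("p")}
print(langs)

listing = doc.find("listing")
print(listing.get(XML_NS + "space"))  # preserve
```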
XML Syntax (24)
• Processing Instructions
• Processing instructions (abbreviated PI) are a mechanism to insert non-XML statements, such as scripts, in the document.
• At first sight, the existence of processing instructions is at odds with the XML concept that structure comes first: XML processing is meant to be derived from the structure of the document, not from instructions inserted in the document.
• That's the theory, at least. In practice, there are cases where it is simpler to insert instructions rather than define complex structures. Processing instructions are a concession to reality by the standard developers.
• The XML declaration uses the same <? ... ?> delimiters as a processing instruction, although the XML specification defines it separately:
• <?xml version="1.0" encoding="ISO-8859-1"?>
• A processing instruction proper starts with a target name, for example the style-sheet link <?xml-stylesheet type="application/xml" href="people.xsl"?>.
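Parsers report processing instructions as separate nodes, which makes the distinction from the XML declaration visible. This sketch assumes Python's xml.dom.minidom, because ElementTree's default parser discards processing instructions.

```python
from xml.dom import Node, minidom

dom = minidom.parseString("""<?xml version="1.0"?>
<?xml-stylesheet type="application/xml" href="people.xsl"?>
<people/>""")

# Top-level nodes of the document: the XML declaration is not among them.
pis = [(n.target, n.data) for n in dom.childNodes
       if n.nodeType == Node.PROCESSING_INSTRUCTION_NODE]
print(pis)  # [('xml-stylesheet', 'type="application/xml" href="people.xsl"')]
```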
XML Syntax - CDATA
• Markup delimiters (left-angle bracket and ampersand) that appear in the content of an element must be escaped with an entity.
• For some applications, it is difficult to escape markup characters, if only because there are too many of them.
• Mathematical equations can use many left-angle brackets. It is difficult to include a scripting language in a document and to escape the angle brackets and ampersands.
• Also, it is difficult to include an XML document in an XML document.
XML Syntax
• CDATA (Character Data) sections were introduced for those cases.
• CDATA sections are delimited by "<![CDATA[" and "]]>".
• The XML parser ignores delimiters within the CDATA section, except for ]]> (which means it is not possible to include a CDATA section in another CDATA section).
XML Syntax
Example of CDATA:
<?xml version="1.0"?>
<example>
<![CDATA[
<?xml version="1.0"?>
<entry>
<name>John Doe</name>
<email href="mailto:[email protected]"/>
</entry>]]>
</example>
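The effect of a CDATA section shows up in the parsed tree. A sketch assuming Python's xml.etree.ElementTree: the embedded markup becomes ordinary text, and when the tree is serialized again that text is escaped with the predefined entities instead of a CDATA section.

```python
import xml.etree.ElementTree as ET

example = ET.fromstring("""<example><![CDATA[
<entry>
<name>John Doe</name>
</entry>]]></example>""")

print(len(example))                             # 0: no child elements were created
print("<name>John Doe</name>" in example.text)  # True: the markup survived as text

serialized = ET.tostring(example, encoding="unicode")
print(serialized)  # the angle brackets come back as &lt; and &gt;
```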
XML and Semantics
• XML alone does not define the meaning (the semantics) of the document. The element names are meaningful only to humans. They are meaningless for the XML parser.
• The parser does not know (in the case of the address book example) what a name is. And it does not know the difference between a name and an address, apart from the fact that an address element has more children than a name element.
• The semantics of an XML document are provided by the application.
Common Errors in XML
• Forgetting End Tags
– End tags are mandatory (except for empty elements). The XML processor would reject the following because street and country have no end tags:
– <address> <street>34 Fountain Square Plaza <region>OH</region> <postal-code>45202</postal-code> <locality>Cincinnati</locality> <country>US </address>
• Forgetting That XML Is Case Sensitive
– XML names are case sensitive. The following two elements are different for XML: the first one is a tel element whereas the second one is a TEL element.
– <tel>513-744-7098</tel>
– <TEL>513-744-7098</TEL>
Common Errors (2)
• Introducing Spaces in the Name of the Element
– It is incorrect to introduce spaces in the name of elements. The XML parser interprets spaces as the beginning of an attribute.
– The following example is not well-formed because address book has a space in it:
– <address book> <entry> <name>John Doe</name> <email href="mailto:[email protected]"/> </entry> </address book>
• Forgetting the Quotes for the Attribute Value
– Unlike HTML, XML forces you to quote attribute values. The following is not acceptable:
– <tel preferred=true>513-744-8889</tel>
Publishing
• XML's roots are in publishing, so it's no wonder the standard is well adapted to publishing.
• The XML standard itself was published with XML.
• The main advantages of using XML for publishing are:
– The capability to convert XML documents to different media: the Web, print, and more
– For large document sets, the ability to enforce a common structure that simplifies editing
– The emphasis on structure, which means that XML documents are better equipped to withstand the test of time, because structure is more stable than formatting (as anybody who publishes a Web site knows, fashion changes every year but the content need not be rewritten that often)
E-commerce
<?xml version="1.0"?>
<Order confirm="true">
<Date>2000-03-10</Date>
<Reference>AGL153</Reference>
<DeliverBy>2000-04-10</DeliverBy>
<Buyer>
<Name>Playfield Books</Name>
<Address>
<Street>34 Fountain Square Plaza</Street>
<Locality>Cincinnati</Locality>
<PostalCode>45202</PostalCode>
<Region>OH</Region>
<Country>US</Country>
</Address>
</Buyer>
E-commerce (2)
<Seller>
<Name>Macmillan Publishing</Name>
<Address>
<Street>201 West 103RD Street</Street>
<Locality>Indianapolis</Locality>
<PostalCode>46290</PostalCode>
<Region>IN</Region>
<Country>US</Country>
</Address>
</Seller>
<Lines>
<Product>
<Code type="ISBN">0789725045</Code>
<Description>XML by Example</Description>
<Quantity>15</Quantity>
<Price>29.99</Price>
</Product>
<Product>
<Code type="ISBN">0672320541</Code>
<Description>Applied XML Solutions</Description>
<Quantity>5</Quantity>
<Price>44.99</Price>
</Product>
</Lines>
</Order>
E-commerce (3)
• If the electronic documents are written in XML, the markup matches the structure of the document. E-commerce applications can scan the above order and recognize the product codes and the quantities ordered.
• This was the realm of EDI technologies (EDI stands for Electronic Data Interchange). The core of EDI is a major effort to standardize every commercial and administrative document (order, invoice, tax declaration, payment, catalog, and more).
• EDI, however, has traditionally focused on reducing costs. The idea was to replace the most human-intensive operations with computer systems.
• With XML and the Internet, the focus is not merely on reducing costs but increasingly on opening new markets.
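In practice, recognizing the product codes and quantities can look like the following. This is a minimal sketch assuming Python's xml.etree.ElementTree and a trimmed copy of the order above (Buyer, Seller and descriptions omitted). It computes an order total with Decimal so that prices are not subject to binary floating-point rounding. The total is our illustration, not part of the original order.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

order = ET.fromstring("""<Order confirm="true">
<Lines>
<Product><Code type="ISBN">0789725045</Code><Quantity>15</Quantity><Price>29.99</Price></Product>
<Product><Code type="ISBN">0672320541</Code><Quantity>5</Quantity><Price>44.99</Price></Product>
</Lines>
</Order>""")

total = Decimal("0")
for product in order.iterfind("Lines/Product"):
    qty = int(product.findtext("Quantity"))
    price = Decimal(product.findtext("Price"))  # exact decimal arithmetic
    print(product.findtext("Code"), qty, price)
    total += qty * price

print("total:", total)  # total: 674.80
```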
Namespaces in XML
• XML is extensible. So it says in the name: eXtensible Markup Language.
• The problem is that extensibility does not come free. Misused, it could be a source of problems.
• In a networked environment, such as the Web, extensibility must be managed to avoid conflicts.
• Namespaces are a solution to help manage XML extensibility.
• An XML namespace is a mechanism to identify XML elements. It places the names of the elements in a more global context.
Namespaces (2)
• Look at the example in resource.xml.
• In practice, however, documents are seldom standalone. In a collaborative environment such as the Web, people build on one another's work. Somebody might take your list and rate it: look at the example ratings.xml.
• This is the same document with one new element: rating. It is often desirable to extend documents to convey new information instead of designing new ones from scratch.
• Problems occur, however, if extensions are not properly managed. Suppose somebody else decides to rate the list, but instead of rating quality, rates it against family criteria.
Namespaces (3)
• Look at pgratings.xml.
• This is problematic. pgratings is also an extension to resource, but it creates incompatibilities between ratings and pgratings, because both introduce a rating element.
• This is a very common problem: Two groups extend the same document in incompatible ways.
• Things get really out of hand when trying to combine both ratings in a listing.
• When building a portal, you want to present the visitor with both quality rating and parental guidance.
• The result would look like combinedratings, in which the conflict between the two rating elements is obvious.
Namespaces (4)
• The solution to the above conflict is obvious: use different element names for each concept.
• In combinedratings, we have two concepts: quality rating and parental guidance. They should have different tags.
• prefixratings renames the quality rating element as qa-rating and the parental guidance element as pa-rating.
Namespaces (5)
• Is the above a perfect solution? No. Coming up with prefixes is possible only if we are aware of the conflict in advance.
• Look at nsratings.xml: it uses namespaces to prevent naming clashes.
• The major difference is the form of the names. In nsratings, a colon separates the name from its prefix:
• <qa:rating>5 stars</qa:rating>
Namespaces (6)
• The prefix unambiguously identifies the type of rating within this document.
• However, prefixes alone do not solve problems because anybody can create prefixes.
• Therefore, different people can create incompatible prefixes, and you are back to step one except that you have moved the risk of conflicts from element names to prefixes.
• To avoid conflicts in prefixes, prefixes are declared:
• <bookmarks xmlns:pg="http://www.playfield.com/parental/en/1.0" xmlns:qa="http://www.writeit.com/quality" xmlns="http://www.pineapplesoft.com/2001/bookmark">
• The declaration associates a URI (Uniform Resource Identifier) with a prefix. This is the crux of the namespaces proposal because URIs, unlike element names or prefixes, can be made unique.
Namespaces (7)
• A namespace declaration is introduced in an attribute whose name starts with xmlns followed by the prefix (note that, for the declaration, the prefix comes at the end of the attribute name; when used, the prefix comes first). In nsratings, two prefixes are declared: qa and pg.
• The attribute xmlns, without a following prefix, declares the default namespace, that is, the namespace for those elements that have no prefix. In nsratings, a default namespace is also declared.
• A namespace is valid for the element on which it is declared and its content (including elements contained within the element), unless overridden by another namespace declaration with the same prefix.
Namespaces (8)
• In summary, XML namespaces is a mechanism to unambiguously identify who has developed which element. It's not much, but it is an essential service.
• The Namespace Name
– The namespace name is the URI, not the prefix. In other words, when comparing two elements, the parser uses the URIs, not the prefixes, to recognize their namespaces.
Namespaces (9)
• The namespace declaration associates a global name (the URI) with the name of the element.
• First and foremost, the URI is only used as an identifier. As far as XML namespaces are concerned, it need not point to an existing resource.
• Why? You must be able to process XML documents without a connection to the Internet.
• Example: In electronic commerce, some XML applications run on secured computers that are not connected to the Internet. It would be difficult to process XML namespaces if they had to resolve URIs.
Namespaces (10)
• Solution: use URIs to guarantee uniqueness through domain names, but place no restrictions on the URIs.
• In particular, the URIs do not need to be valid: they do not need to point to a resource.
• Because URIs need not be valid, XML namespaces treats them as strings. In particular, comparisons are done character by character. According to this definition, the following two URIs are not identical, even though they may point to the same document:
– http://www.marchal.com
– http://marchal.com
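Namespace-aware parsers replace each prefix with its URI, which makes these rules easy to observe. The sketch below assumes Python's xml.etree.ElementTree, which writes an expanded name as {URI}local-name; the element name site and the alternative prefix q are hypothetical. Nothing is downloaded: the URIs are only compared as strings.

```python
import xml.etree.ElementTree as ET

# Same namespace URI, different prefixes: the expanded names are identical.
a = ET.fromstring('<qa:rating xmlns:qa="http://www.writeit.com/quality">5 stars</qa:rating>')
b = ET.fromstring('<q:rating xmlns:q="http://www.writeit.com/quality">5 stars</q:rating>')
print(a.tag)           # {http://www.writeit.com/quality}rating
print(a.tag == b.tag)  # True

# URIs are compared character by character, so these are different namespaces.
c = ET.fromstring('<x:site xmlns:x="http://www.marchal.com"/>')
d = ET.fromstring('<x:site xmlns:x="http://marchal.com"/>')
print(c.tag == d.tag)  # False

# The default namespace (xmlns without a prefix) applies to unprefixed elements.
e = ET.fromstring('<bookmarks xmlns="http://www.pineapplesoft.com/2001/bookmark"><title/></bookmarks>')
print(e[0].tag)        # {http://www.pineapplesoft.com/2001/bookmark}title
```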
Namespaces (11)
• Scoping
– The namespace is valid for the element where it is declared and all the elements within its content, as illustrated in scoping.xml. In programming circles, this is referred to as scoping.
– There are three namespaces declared in scopings.xml. bk is declared on the top-level element and is therefore valid for all the elements. ns is declared twice for the two rating elements, but with different URIs (corresponding to different namespaces).
• Unprefixed attributes are not associated with any namespace (the default namespace does not apply to them) but, as sponsored.xml illustrates, an attribute can be placed in a namespace by giving it a prefix.
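Here is a quick check of how attributes interact with namespaces, assuming Python's xml.etree.ElementTree. The sponsor namespace URI http://example.com/sponsor, the prefix sp and the attribute values are hypothetical. The unprefixed attribute keeps a plain name even though the element is in the default namespace, and the xmlns declarations themselves do not appear as attributes.

```python
import xml.etree.ElementTree as ET

el = ET.fromstring(
    '<rating xmlns="http://www.writeit.com/quality"'
    ' xmlns:sp="http://example.com/sponsor"'
    ' scale="stars" sp:by="ACME">5</rating>'
)
print(el.tag)             # {http://www.writeit.com/quality}rating
print(sorted(el.attrib))  # ['scale', '{http://example.com/sponsor}by']
```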
Namespaces (12)
• Digital Signature: an example of namespaces (look at signed.xml).
• Signature and data are identified by their namespace.
XML Models
• XML models refer to mechanisms that describe the structure of a document.
• The two mechanisms are– the DTD, short for Document Type Definition and – XML Schema.
• DTDs and XML Schemas ultimately serve the same objective. Both describe the structure of XML documents.
• Both are used to validate documents against their models
DTD
• The DTD dates back to SGML.
• It is a proven solution and it is easy to use.
• DTDs were found lacking on three issues:
– DTDs are based on 20-year-old modeling concepts. They have no support for modern design, such as object-oriented modeling.
– DTDs were designed for publishing. They are ill-suited to more recent applications of XML, in particular, data exchange and application integration.
– DTDs have their own syntax, which is incompatible with the syntax of XML documents. Therefore, DTDs cannot be processed with ordinary XML tools.
• The W3C has therefore developed a replacement called XML Schema. Schemas support more modern modeling concepts, are better suited for data exchange and application integration, and, last but not least, are written as XML documents.
DTD (2) - Syntax
• The syntax for DTDs is different from the syntax of XML documents. abook-dtd.xml is the address book introduced earlier but with one difference: it has a new <!DOCTYPE> statement that links the document file to its DTD.
• The <!DOCTYPE> statement is known as the Document Type Declaration (not to be confused with the DTD).
DTD (3) – Syntax
• The <!DOCTYPE> contains the root of the document (address-book) and the filename (or a URI) for the DTD itself (SYSTEM "abook-dtd.dtd").
• As abook-dtd.xml illustrates, if present, the document type declaration appears immediately after the XML declaration:
• <!DOCTYPE address-book SYSTEM "abook-dtd.dtd">
• The DTD declares a list of elements but does not specify which one is the root. It's up to the document to select a root.
DTD (4) - Syntax
• Element Declaration
– The DTD uses a special syntax to declare every object (elements, attributes, and so on) that can appear in XML documents. Let's start with element declarations.
– Element declarations take the form of an <!ELEMENT statement and contain the element name (entry) and its content model ((name,address*,tel*,fax*,email*,comments?)). The content model simply lists the possible children of the element:
– <!ELEMENT entry (name,address*,tel*,fax*,email*,comments?)>
• The plus ("+"), star ("*"), and question mark ("?") in the content model are known as occurrence indicators. They indicate whether and how elements repeat.
DTD (5) – Syntax
• An element followed by no occurrence indicator must appear once and only once.
• An element followed by a "+" character must appear one or several times. In other words, it can repeat.
• An element followed by a "*" character can appear zero or more times. The element is optional but, if it is included, it can repeat.
• An element followed by a "?" character can appear once or not at all. It indicates that the element is optional and, if included, cannot repeat.
DTD (6) - Syntax
• The content model for entry uses occurrence indicators.
• They enforce the repetitiveness of children: Except for name, the children are optional, and all but name and comments can appear several times in the document:
• <!ELEMENT entry (name,address*,tel*,fax*,email*,comments?)>
DTD (7) – Syntax ..
• The comma (",") and vertical bar ("|") characters are connectors. They indicate the order in which the children can appear:
– The "," character indicates that both elements (on the right and the left of the comma) must appear in the same order in the document.
– The "|" character indicates that only one of the two elements on the left or right of the vertical bar can appear in the document.
• Parentheses can be used to group elements on the left and right of connectors.
DTD (8) – Syntax …
• If we were to change the declaration of entry into
• <!ELEMENT entry (name,(address* | tel* | fax* | email*),comments?)>
• only one of address, tel, fax or email could appear after the name. So, an entry could have several addresses or several phone numbers but not both.
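Python's standard-library parsers do not validate against a DTD. Still, the meaning of the occurrence indicators and connectors can be illustrated by translating a content model into a regular expression over the sequence of child element names. This is a hypothetical teaching sketch, not a DTD validator: it handles element-only content models (no #PCDATA, EMPTY or ANY) and, unlike a real validator, does not enforce the deterministic-model rule.

```python
import re

def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate an element-only DTD content model into a regex over child names."""
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    parts = []
    for tok in tokens:
        if tok == "(":
            parts.append("(?:")                          # non-capturing group
        elif tok == ",":
            continue                                     # sequence: plain concatenation
        elif tok in (")", "|", "?", "*", "+"):
            parts.append(tok)                            # same meaning in a regex
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child named tok
    return re.compile("".join(parts))

def conforms(model: str, children: list) -> bool:
    """True if the sequence of child element names matches the content model."""
    child_string = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(child_string) is not None

ENTRY = "(name,address*,tel*,fax*,email*,comments?)"
CHOICE = "(name,(address* | tel* | fax* | email*),comments?)"

print(conforms(ENTRY, ["name", "tel", "email"]))   # True
print(conforms(ENTRY, ["name", "email", "tel"]))   # False: wrong order
print(conforms(ENTRY, ["tel"]))                    # False: name is required
print(conforms(CHOICE, ["name", "tel", "tel"]))    # True
print(conforms(CHOICE, ["name", "tel", "email"]))  # False: only one kind allowed
```

With the second model, an entry may repeat one kind of child after name, but mixing tel and email is rejected, exactly as described above.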
DTD (9) – Syntax
• Keywords
– In addition to elements, the following keywords can appear in content models:
– #PCDATA means that the element can contain text. #PCDATA stands for parsed character data.
– EMPTY means that the element is an empty element.
– ANY means that the element can contain text and any element, provided that the element was declared elsewhere in the DTD. ANY is used mostly during the development of a DTD, until a more precise content model has been developed.
• In abook-dtd.dtd, tel is declared as text, whereas email is an empty element:
– <!ELEMENT tel (#PCDATA)>
– <!ELEMENT email EMPTY>
• CDATA sections can appear within #PCDATA as well. They need not be declared explicitly.
DTD (10) – Syntax …
• Mixed Content
– Element contents that include both elements and #PCDATA are said to be mixed content. Those that contain only elements are said to be element content. In abook-dtd.dtd, comments has mixed content:
• <!ELEMENT comments (#PCDATA | b)*>
• In mixed content, #PCDATA must come first, the elements and #PCDATA must always be separated by a "|", and the whole group must repeat (end with "*").
DTD (11) – Syntax
• Nonambiguous Model
– There's one additional rule: the content model must be deterministic or unambiguous.
– In plain English, it must be possible to validate a document by reading it one element at a time.
• For example, the following model is ambiguous:
• <!ELEMENT cover ((title, author) | (title, subtitle))>
– <cover><title>XML by Example</title><author>Benoît Marchal</author></cover>
– It is not possible to decide whether the title element is part of (title, author) or of (title, subtitle) by looking at title only (one element at a time).
• It is often possible to remove the ambiguity, as in:
• <!ELEMENT cover (title, (author | subtitle))>
DTD (12) – Syntax
• Attributes
– Attributes too must be declared in the DTD:
– <!ATTLIST email href CDATA #REQUIRED
preferred (true | false) "false">
– The declaration starts with the element name (email) followed by one or more attribute declarations. In this example, two attributes have been declared (href and preferred). The declaration includes their type (CDATA or (true | false)) and a default declaration (#REQUIRED, meaning the attribute must be present, or the default value "false").
– Attribute declarations can appear anywhere in the DTD. For readability, it is best to list attributes immediately after their corresponding element.
DTD (13) – Syntax
• The DTD provides more control over attributes than over elements. Attribute types are broadly divided into three categories:
– String attributes contain text, for example: <!ATTLIST email href CDATA #REQUIRED>
– Tokenized attributes limit the content of the attribute, for example: <!ATTLIST entry id ID #IMPLIED>
– Enumerated type attributes list acceptable values, for example: <!ATTLIST entry preferred (true | false) "false">
• The DTD predates XML namespaces and therefore does not recognize them. If your document uses namespaces, you need to declare the xmlns attributes and the element prefixes explicitly, as in:
– <!ELEMENT xbe2:name (#PCDATA)>
<!ATTLIST xbe2:name xmlns:xbe2 CDATA #FIXED "http://www.psol.com/xbe2/listing4.2">
DTD (14)
• Relationship Between the DTD and the Document
– The DTD specifies which elements are allowed where in the document.
– The document in abook-dtd.xml is valid because it respects its DTD. Practically, it means that, among other things, the entry elements are enclosed in an address-book; that they each contain a name; and that the address, tel, and email appear in the order specified in the DTD. Only the second entry has a comments element, but that is not a problem because comments is optional.
• Validating the Document
– To validate XML documents, you need a validating parser.
XML Schema
• Schemas improve DTDs by supporting more data types and XML namespaces and adopting the familiar syntax of XML documents for the model itself.
• The concept, however, remains the same: A schema describes XML documents so that parsers can validate them.
• One of the most visible differences between DTDs and XML Schemas is that schemas are regular XML documents. Unlike DTDs, they don't rely on a special syntax
XML Schema (2)
• Simple Type Definitions
– Schemas support simple and complex types.
– Simple types are atomic (string, integer, boolean, and more), whereas complex types describe elements that have child elements and/or attributes, built up from simple types.
• Simple type definitions (written as simpleType elements) restrict or augment the built-in simple types. As the name implies, the restriction element limits the values of a simple type. The original type is referenced in the base attribute.
XML Schema (3)
• Complex Type Definitions
– Complex type definitions take the form of a complexType element.
– A complex type can be a sequence of elements, attributes, simple or complex content, and more.
• Simple and Complex Content
– Complex type definitions may contain simpleContent and complexContent.
• Mixed Content
– Mixed content is declared as a complex type with the mixed attribute set to "true".
XPath
• XPath is a non-XML language for identifying particular parts of XML documents.
• XPath lets you write expressions that refer to the first person element in a document, the seventh child element of the third person element, the ID attribute of the first person element whose contents are the string "Fred Jones", all xml-stylesheet processing instructions in the document's prolog, and so forth.
• XPath indicates nodes by position, relative position, type, content, and several other criteria.
• XSLT uses XPath expressions to match and select particular elements in the input document for copying into the output document or further processing.
Xpath (2)
• XPointer uses XPath expressions to identify the particular point in or part of an XML document to which an XLink links.
• The W3C XML Schema Language uses XPath expressions to define uniqueness and co-occurrence constraints.
• XForms relies on XPath to bind form controls to instance data, express constraints on user-entered values, and calculate values that depend on other values.
• XPath expressions can also represent numbers, strings, or Booleans.
• This lets XSLT stylesheets carry out simple arithmetic for purposes such as numbering and cross-referencing figures, tables, and equations.
Xpath (3)
• String manipulation in XPath lets XSLT perform tasks such as making the title of a chapter uppercase in a headline or extracting the last two digits from a year.
• The Tree Structure of an XML Document
• An XML document is a tree made up of nodes. Some nodes contain one or more other nodes. There is exactly one root node, which ultimately contains all other nodes. XPath is a language for picking nodes and sets of nodes out of this tree.
Xpath (4)
• From the perspective of XPath, there are seven kinds of nodes:
– The root node
– Element nodes
– Text nodes
– Attribute nodes
– Comment nodes
– Processing-instruction nodes
– Namespace nodes
Xpath (6)
<?xml version="1.0"?>
<?xml-stylesheet type="application/xml" href="people.xsl"?>
<!DOCTYPE people [
<!ATTLIST homepage xlink:type CDATA #FIXED "simple"
xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">
<!ATTLIST person id ID #IMPLIED>
]>
<people>
<person born="1912" died="1954" id="p342">
<name>
<first_name>Alan</first_name>
<last_name>Turing</last_name>
</name>
<!-- Did the word computer scientist exist in Turing's day? -->
<profession>computer scientist</profession>
<profession>mathematician</profession>
<profession>cryptographer</profession>
<homepage xlink:href="http://www.turing.org.uk/"/>
</person>
<person born="1918" died="1988" id="p4567">
<name>
<first_name>Richard</first_name>
<middle_initial>P</middle_initial>
<last_name>Feynman</last_name>
</name>
<profession>physicist</profession>
<hobby>Playing the bongoes</hobby>
</person>
</people>
Xpath (7)
• Location Paths
– The most useful XPath expression is a location path.
– A location path identifies a set of nodes in a document.
– This set may be empty, may contain a single node, or may contain several nodes. These can be element nodes, attribute nodes, namespace nodes, text nodes, comment nodes, processing-instruction nodes, root nodes, or any combination of these.
– A location path is built out of successive location steps. Each location step is evaluated relative to a particular node in the document called the context node.
• The Root Location Path
– The simplest location path is the one that selects the root node of the document. This is simply the forward slash (/).
– / is an absolute location path because it selects the same node, the root node, no matter what the context node is.
Xpath (8)
• For example, this XSLT template rule uses the XPath pattern / to match the entire input document tree and wrap it in an html element:
<xsl:template match="/">
<html><xsl:apply-templates/></html>
</xsl:template>
• Child Element Location Steps
– The second simplest location path is a single element name. This path selects all child elements of the context node with the specified name.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="people">
<xsl:apply-templates select="person"/>
</xsl:template>
<xsl:template match="person">
<xsl:value-of select="name"/>
</xsl:template>
</xsl:stylesheet>
Xpath (9)
• In XSLT, the context node for an XPath expression used in the select attribute of xsl:apply-templates and similar elements is the node that is currently matched
• Attribute Location Steps– Attributes are also part of XPath. To select a
particular attribute of an element, use an @ sign followed by the name of the attribute you want.
XPath (10)
• An XSLT stylesheet that uses root, child element, and attribute location steps:
    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <html><xsl:apply-templates select="people"/></html>
      </xsl:template>
      <xsl:template match="people">
        <table><xsl:apply-templates select="person"/></table>
      </xsl:template>
      <xsl:template match="person">
        <tr>
          <td><xsl:value-of select="name"/></td>
          <td><xsl:value-of select="@born"/></td>
          <td><xsl:value-of select="@died"/></td>
        </tr>
      </xsl:template>
    </xsl:stylesheet>
XPath (11)
• Applied to the people document, the stylesheet produces this output (whitespace simplified):
    <html>
      <table>
        <tr>
          <td> Alan Turing </td> <td>1912</td> <td>1954</td>
        </tr>
        <tr>
          <td> Richard P Feynman </td> <td>1918</td> <td>1988</td>
        </tr>
      </table>
    </html>
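A rough Python equivalent of this stylesheet, again using xml.etree.ElementTree, is sketched below as an illustration only. Element.get plays the role of the @born and @died attribute steps; the people document is trimmed, and whitespace in the name cell is collapsed (an assumption of the sketch, not XSLT behavior).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the people document.
PEOPLE = """<people>
  <person born="1912" died="1954" id="p342">
    <name><first_name>Alan</first_name> <last_name>Turing</last_name></name>
  </person>
  <person born="1918" died="1988" id="p4567">
    <name><first_name>Richard</first_name> <middle_initial>P</middle_initial> <last_name>Feynman</last_name></name>
  </person>
</people>"""


def people_table(xml_text):
    """Build <html><table><tr><td>...</td></tr></table></html> like the stylesheet."""
    people = ET.fromstring(xml_text)
    html = ET.Element("html")                   # template for the root node
    table = ET.SubElement(html, "table")        # template for people
    for person in people.findall("person"):     # select="person"
        tr = ET.SubElement(table, "tr")
        cells = [
            " ".join("".join(person.find("name").itertext()).split()),  # select="name"
            person.get("born"),                 # select="@born"
            person.get("died"),                 # select="@died"
        ]
        for value in cells:
            ET.SubElement(tr, "td").text = value
    return html


print(ET.tostring(people_table(PEOPLE), encoding="unicode"))
```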
XPath (12)
The comment( ), text( ), and processing-instruction( ) Location Steps
• Although element, attribute, and root nodes account for 90% or more of what you need to do with XML documents, four other kinds of nodes still need to be addressed: namespace nodes, text nodes, processing-instruction nodes, and comment nodes. Namespace nodes are rarely processed explicitly; the other three node types have special node tests to match them:
  – comment( )
  – text( )
  – processing-instruction( )
XPath (13)
• Since comments and text nodes don't have names, the comment( ) and text( ) node tests match any comment or text node that is a child of the context node.
• Each comment is a separate comment node.
• Each text node contains the maximum possible contiguous run of text not interrupted by any tag.
• By default, XSLT stylesheets do process text nodes but do not process comment nodes. You can add a comment template rule to an XSLT stylesheet so it will process comments too.
• For example, this template rule replaces each comment with the text "Comment Deleted" in italic:
    <xsl:template match="comment( )">
      <i>Comment Deleted</i>
    </xsl:template>
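Python's xml.dom.minidom keeps comment and text nodes, so it can illustrate both points. The sketch below uses hypothetical helpers, not part of any XSLT API: delete_comments does what the comment( ) template rule above does, and text_children shows that the text on either side of an element ends up in separate text nodes.

```python
from xml.dom import minidom


def delete_comments(xml_text):
    """Replace every comment with <i>Comment Deleted</i>."""
    doc = minidom.parseString(xml_text)   # minidom keeps comment nodes
    found = []

    def collect(node):
        for child in node.childNodes:
            if child.nodeType == child.COMMENT_NODE:
                found.append(child)
            else:
                collect(child)

    collect(doc.documentElement)
    for comment in found:                 # replace after the walk, not during it
        italic = doc.createElement("i")
        italic.appendChild(doc.createTextNode("Comment Deleted"))
        comment.parentNode.replaceChild(italic, comment)
    return doc.documentElement.toxml()


def text_children(xml_text):
    """Return the data of each text node that is a child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    return [c.data for c in root.childNodes if c.nodeType == c.TEXT_NODE]


print(delete_comments("<p>before<!-- a note -->after</p>"))
print(text_children("<p>one <b>two</b> three</p>"))
```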
XPath (14)
Wildcards
• Wildcards match different element and node types at the same time. There are three of these: *, node( ), and @*.
• The asterisk (*) matches any element node regardless of name. For example, this XSLT template rule says that all elements should have their child elements processed but should not result in any output in and of themselves:
    <xsl:template match="*">
      <xsl:apply-templates select="*"/>
    </xsl:template>
• The * does not match attributes, text nodes, comments, or processing-instruction nodes.
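The behavior of the * rule can be sketched in Python with ElementTree, where findall("*") returns child elements of any name and nothing else. The function name is hypothetical; note also that ElementTree's default parser discards comments, so they never appear in its tree at all.

```python
import xml.etree.ElementTree as ET


def walk_elements(element, visited=None):
    """Mimic <xsl:template match="*"><xsl:apply-templates select="*"/>:
    record each element, then recurse into its child elements only."""
    if visited is None:
        visited = []
    visited.append(element.tag)
    for child in element.findall("*"):   # "*": child elements of any name
        walk_elements(child, visited)
    return visited


doc = ET.fromstring('<a x="1">text<b><c/></b><!-- note -->more<d/></a>')
print(walk_elements(doc))   # no attributes, text, or comments are visited
```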
XPath (15)
• The node( ) wildcard matches not only all element types but also text nodes, processing-instruction nodes, namespace nodes, attribute nodes, and comment nodes.
• The @* wildcard matches all attribute nodes.
• For example, this XSLT template rule copies the values of all attributes of a person element in the document into the content of an attributes element in the output:
    <xsl:template match="person">
      <attributes><xsl:apply-templates select="@*"/></attributes>
    </xsl:template>
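The @* rule can be imitated with ElementTree's Element.attrib dictionary. This is a hypothetical sketch: XPath leaves the order of attribute nodes implementation-dependent, whereas ElementTree keeps document order, which is why the concatenated values here come out as born, died, id.

```python
import xml.etree.ElementTree as ET


def copy_attribute_values(person):
    """Mimic <attributes><xsl:apply-templates select="@*"/></attributes>.
    XSLT's built-in rule for attribute nodes outputs each value as text."""
    out = ET.Element("attributes")
    out.text = "".join(person.attrib.values())   # person.attrib stands in for @*
    return out


person = ET.fromstring('<person born="1912" died="1954" id="p342"/>')
print(ET.tostring(copy_attribute_values(person), encoding="unicode"))
```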
XPath (16)
• Multiple Matches with |
  – The vertical bar (|) combines location paths so that a node matches if it matches any one of them. For example, this template rule applies to first_name, last_name, profession, and hobby elements, but not to name, person, or people elements:
    <xsl:template match="first_name|last_name|profession|hobby">
      <xsl:value-of select="text( )"/>
    </xsl:template>
• Compound Location Paths
  – Location steps can be combined with a forward slash (/) to make a compound location path. Each step in the path is relative to the one that preceded it. If the path begins with /, then the first step in the path is relative to the root node. Otherwise, it's relative to the context node.
  – For example, consider the XPath expression /people/person/name/first_name.
  – This begins at the root node, then selects all people element children of the root node, then all person element children of those nodes, then all name children of those nodes, and finally all first_name children of those nodes.
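Element.findall in Python's ElementTree takes only relative paths and has no | operator, so this sketch checks the first /people step by hand and emulates a union by filtering a document-order walk. The function names are hypothetical and the people document is trimmed.

```python
import xml.etree.ElementTree as ET

# Trimmed people document.
PEOPLE = """<people>
  <person id="p342">
    <name><first_name>Alan</first_name><last_name>Turing</last_name></name>
    <profession>computer scientist</profession>
  </person>
  <person id="p4567">
    <name><first_name>Richard</first_name><last_name>Feynman</last_name></name>
    <hobby>Playing the bongoes</hobby>
  </person>
</people>"""


def absolute_first_names(root):
    """/people/person/name/first_name, starting from the document element."""
    if root.tag != "people":        # step 1 (/people), checked by hand because
        return []                   # Element.findall only takes relative paths
    return [e.text for e in root.findall("person/name/first_name")]  # steps 2-4


def union_texts(root, tags):
    """first_name|last_name|...: no "|" operator in ElementTree, so filter
    a document-order walk on the element names instead."""
    return [e.text for e in root.iter() if e.tag in tags]


root = ET.fromstring(PEOPLE)
print(absolute_first_names(root))
print(union_texts(root, {"first_name", "last_name", "profession", "hobby"}))
```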
XPath (17)
• Selecting from Descendants with //
  – A double forward slash (//) selects from all descendants of the context node, as well as the context node itself.
  – At the beginning of an XPath expression, it selects from all descendants of the root node.
  – For example, the XPath expression //name selects all name elements in the document. The expression //@id selects all the id attributes of any element in the document.
  – The expression person//@id selects all the id attributes of any element contained in the person child elements of the context node, as well as the id attributes of the person elements themselves.
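The descendant searches can be tried with ElementTree: findall(".//name") searches every level below an element, and Element.iter() visits an element together with all of its descendants. The id attributes on people and name are hypothetical additions so that //@id and person//@id give different answers.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the people document with extra id attributes.
PEOPLE = """<people id="all">
  <person id="p342"><name id="n1"><first_name>Alan</first_name></name></person>
  <person id="p4567"><name><first_name>Richard</first_name></name></person>
</people>"""

root = ET.fromstring(PEOPLE)   # context node: the people element

# //name: ".//" searches every level below the element (people is not a name)
names = root.findall(".//name")

# //@id: iter() walks the element itself and all of its descendants
all_ids = [e.get("id") for e in root.iter() if "id" in e.attrib]

# person//@id: each person child of the context node, itself included,
# plus everything inside it
person_ids = [e.get("id") for p in root.findall("person")
              for e in p.iter() if "id" in e.attrib]

print(len(names), all_ids, person_ids)
```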
XPath (18)
• Selecting the Parent Element with ..
  – A double period (..) indicates the parent of the context node.
  – For example, the XPath expression //@id identifies all id attributes in the document. Therefore, //@id/.. identifies all elements in the document that have id attributes.
• Selecting the Context Node with .
  – The single period (.) indicates the context node. In XSLT, this is most commonly used when you need to take the value of the currently matched node. For example, this template rule copies the content of each comment in the input document to a span element in the output document:
    <xsl:template match="comment( )">
      <span class="comment"><xsl:value-of select="."/></span>
    </xsl:template>
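ElementTree supports .. directly, but it has no attribute nodes, so //@id/.. has to be expressed as a filter on elements that carry an id. This sketch, on a trimmed people document, shows both, plus the string value of . (all descendant text) for a matched node; it is an approximation of XPath, not a full implementation.

```python
import xml.etree.ElementTree as ET

PEOPLE = """<people>
  <person id="p342"><name><first_name>Alan</first_name></name></person>
  <person id="p4567"><name><first_name>Richard</first_name></name></person>
</people>"""

root = ET.fromstring(PEOPLE)

# //@id/.. : the parent of an id attribute is the element that carries it.
# ElementTree has no attribute nodes, so filter elements on the attribute.
with_id = [e.tag for e in root.iter() if "id" in e.attrib]

# ".." is supported directly: the parents of all first_name elements.
parents = [e.tag for e in root.findall(".//first_name/..")]

# "." as in select=".": the context node; its value is all of its text.
context = root.find(".//first_name")
value = "".join(context.itertext())

print(with_id, parents, value)
```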
XLink
• XLinks are an attribute-based syntax for attaching links to XML documents.
• XLinks can be simple Point A-to-Point B links, like the links you're accustomed to from HTML's A element.
• XLinks can also be bidirectional, linking two documents in both directions so you can go from A to B or B to A.
• XLinks can even be multidirectional, presenting many different paths between any number of XML documents.
• The documents don't have to be XML documents. Links can be placed in an XML document that lists connections between other documents that may or may not be XML documents themselves.
XLink (2)
• Simple Links
  – A simple link defines a one-way connection between two resources.
  – The source or starting resource of the connection is the link element itself.
  – The target or ending resource of the connection is identified by a Uniform Resource Identifier (URI).
  – The link goes from the starting resource to the ending resource.
  – The starting resource is always an XML element.
  – The ending resource may be an XML document, a particular element in an XML document, a group of elements in an XML document, a span of text in an XML document, or something that isn't a part of an XML document, such as an MPEG movie or a PDF file. The URI may be something other than a URL, for instance a book ISBN number like urn:isbn:1565922247.
XLink (3)
• Consider this novel element:
    <novel>
      <title>The Wonderful Wizard of Oz</title>
      <author>L. Frank Baum</author>
      <year>1900</year>
    </novel>
• A simple XLink is encoded in an XML document as an element of arbitrary type that has an xlink:type attribute with the value simple and an xlink:href attribute whose value is the URI of the link target. The xlink prefix must be mapped to the http://www.w3.org/1999/xlink namespace URI; as usual with namespaces, another prefix would work equally well as long as it is mapped to the same URI.
XLink (4)
    <novel xmlns:xlink="http://www.w3.org/1999/xlink"
           xlink:type="simple"
           xlink:href="ftp://archive.org/pub/etext/etext93/wizoz10.txt">
      <title>The Wonderful Wizard of Oz</title>
      <author>L. Frank Baum</author>
      <year>1900</year>
    </novel>
• This establishes a simple link from this novel element to the plain text file found at ftp://archive.org/pub/etext/etext93/wizoz10.txt.
• Browsers are free to interpret this link as they like.
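To a program reading the document, a simple link is just two namespaced attributes. This sketch (the helper name is hypothetical) reads them with ElementTree, which reports namespaced attributes as {namespace-URI}local-name; that is why a document using another prefix, such as xl, bound to the same URI yields the same link, while unprefixed type and href attributes do not form an XLink.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def simple_link_target(element):
    """Return the xlink:href of a simple link, or None if element is not one."""
    if element.get(f"{{{XLINK_NS}}}type") != "simple":
        return None
    return element.get(f"{{{XLINK_NS}}}href")


NOVEL = """<novel xmlns:xlink="http://www.w3.org/1999/xlink"
       xlink:type="simple"
       xlink:href="ftp://archive.org/pub/etext/etext93/wizoz10.txt">
  <title>The Wonderful Wizard of Oz</title>
</novel>"""

# Same link with a different prefix bound to the same namespace URI.
NOVEL_XL = """<novel xmlns:xl="http://www.w3.org/1999/xlink"
       xl:type="simple"
       xl:href="ftp://archive.org/pub/etext/etext93/wizoz10.txt"/>"""

# No namespace: these are ordinary attributes, not an XLink.
PLAIN = '<novel type="simple" href="ftp://archive.org/pub/etext/etext93/wizoz10.txt"/>'

for text in (NOVEL, NOVEL_XL, PLAIN):
    print(simple_link_target(ET.fromstring(text)))
```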
XLink (5)
• Every XLink element must have an xlink:type attribute telling you what kind of link (or part of a link) it is. This attribute has six possible values:
  – simple
  – extended
  – locator
  – arc
  – title
  – resource
• Simple XLinks are the only ones that are really similar to HTML links.
XLink (6)
• The xlink:href attribute identifies the resource being linked to.
• It always contains a URI.
• Both relative and absolute URLs can be used, as they are in HTML links. However, the URI need not be a URL.
• For example, this link identifies, but does not locate, the print edition of The Wonderful Wizard of Oz with the ISBN number 0688069444:
    <novel xmlns:xlink="http://www.w3.org/1999/xlink"
           xlink:type="simple" xlink:href="urn:isbn:0688069444">
      <title>The Wonderful Wizard of Oz</title>
      <author>L. Frank Baum</author>
      <year>1900</year>
    </novel>
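The difference between relative URLs, absolute URLs, and URIs that are not URLs shows up when an href is resolved against the document's own address. This sketch uses urllib.parse.urljoin; the base URL and the helper name are hypothetical. A relative reference is resolved, while the ftp: and urn: values are absolute and come back unchanged, the urn: one naming a book without saying where to fetch it.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical location of the document that contains the links.
BASE = "http://www.example.com/books/catalog.xml"


def resolve_href(base, href):
    """Resolve an xlink:href value against the document's URL; return (scheme, URI)."""
    uri = urljoin(base, href)     # relative references are resolved, absolute ones kept
    return urlsplit(uri).scheme, uri


for href in ("wizoz10.txt",
             "ftp://archive.org/pub/etext/etext93/wizoz10.txt",
             "urn:isbn:0688069444"):
    print(resolve_href(BASE, href))
```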
XPointer
• XPointers are a non-XML syntax for identifying locations inside XML documents.
• An XPointer is attached to the end of the URI as its fragment identifier to indicate a particular part of an XML document rather than the entire document.
• HTML does the same with a named anchor, which a URL can then target with a fragment identifier:
  – <a name="download"></a>
  – http://java.sun.com:80/products/jndi/index.html#download
XPointer (2)
• Named anchors in HTML have one major drawback: to link to a particular point of a particular document, you must be able to modify the document to which you're linking in order to insert a named anchor at the point to which you want to link.
• XPointer endeavors to eliminate this restriction by allowing you to specify where you want to link to using full XPath expressions as fragment identifiers.
• Furthermore, XPointer expands on XPath by providing operations to select particular points in or ranges of an XML document that do not necessarily coincide with any one node or set of nodes. For instance, an XPointer can describe the range of text currently selected by the mouse.
XPointer (3)
• The most basic form of XPointer is simply an XPath expression (often, though not necessarily, a location path) enclosed in the parentheses of xpointer( ).
• For example, these are all acceptable XPointers:
  – xpointer(/)
  – xpointer(//first_name)
  – xpointer(id('sec-intro'))
  – xpointer(/people/person/name/first_name/text( ))
  – xpointer(//middle_initial[position( )=1]/../first_name)
  – xpointer(//profession[.="physicist"])
  – xpointer(/child::people/child::person[@index<4000])
  – xpointer(/child::people/child::person/attribute::id)
XPointer (4)
• If you're uncertain whether a given XPointer will locate something, you can back it up with an alternative XPointer.
• For example, this XPointer looks first for first_name elements. However, if it doesn't find any, it looks for last_name elements instead:
    xpointer(//first_name)xpointer(//last_name)
• The last_name elements will be found only if there are no first_name elements. You can string as many of these XPointer parts together as you like.
• XPointers in Links
  – If you wanted a URL that pointed to the first name element in the document at http://www.cafeconleche.org/people.xml, you would type:
    http://www.cafeconleche.org/people.xml#xpointer(//name[position()=1])
  – Strictly, //name[position()=1] selects every name element that is the first name child of its parent; (//name)[1] selects only the first name element in the whole document.
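The fallback rule can be sketched in Python: take the fragment from the URL, split it into its xpointer( ) parts, and try them in order until one locates something. This is a deliberately tiny, hypothetical resolver, not a real XPointer processor: it only understands expressions of the form //name, which it hands to ElementTree, and the //nickname step is made up to force the fallback.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# One "xpointer(...)" part, ending where the next part starts or at the end.
PART = re.compile(r"xpointer\((.*?)\)(?=xpointer\(|$)")

PEOPLE = """<people>
  <person><name><first_name>Alan</first_name><last_name>Turing</last_name></name></person>
  <person><name><last_name>Feynman</last_name></name></person>
</people>"""


def resolve(root, fragment):
    """Try each xpointer(...) part in order; return the first non-empty result.
    Assumption: only expressions of the form //name are supported here."""
    for expr in PART.findall(fragment):
        if not re.fullmatch(r"//\w+", expr):
            raise ValueError(f"unsupported in this sketch: {expr}")
        hits = root.findall("." + expr)     # //x becomes .//x below the root element
        if hits:
            return hits
    return []


url = "http://www.cafeconleche.org/people.xml#xpointer(//nickname)xpointer(//last_name)"
fragment = urlsplit(url).fragment
root = ET.fromstring(PEOPLE)
print(fragment)
print([e.text for e in resolve(root, fragment)])
```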
XPointer (5)
• XPointers are more frequently used in XLinks.
• For example, this simple link points to the first book child of the bookcoll child of the testament root element in the document at the relative URL ot.xml:
    <In_the_beginning xlink:type="simple"
        xlink:href="ot.xml#xpointer(/testament/bookcoll/book[position()=1])">
      Genesis
    </In_the_beginning>
Cascading Style Sheets (CSS)
• The names of most elements describe the semantic meaning of the content they contain. However, ultimately this content needs to be formatted and displayed to users.
• For this to occur, there must be a step where formatting information is applied to the XML document and the semantic markup is transformed into presentational markup.
• There are a variety of choices for the syntax of this presentation layer. However, two are particularly noteworthy:
  – Cascading Style Sheets (CSS)
  – XSL Formatting Objects (XSL-FO)
CSS (2)
• CSS is a non-XML syntax for describing the appearance of particular elements in a document.
• CSS is a very straightforward language. No transformation is performed. The parsed character data of the document is presented more or less exactly as it appears in the XML document.
• A CSS stylesheet does not change the markup of an XML document at all; it merely applies styles to the content that already exists.
• By way of contrast, XSL-FO is a complete XML application for describing the layout of text on a page.
CSS (3)
• XSL-FO has elements that represent pages, blocks of text on the pages, graphics, horizontal rules, and more.
• One does not normally work with this application directly. Instead, one can write an XSLT stylesheet that transforms the document's native markup into XSL-FO.
• The application rendering the document reads the XSL-FO and displays it to the user.
• CSS Level 2 is the current recommendation.
• CSS Level 2 places XML on an equal footing with HTML.
CSS (4)
Figure: A semantically tagged XML document after application of a CSS stylesheet
CSS (5)
• This stylesheet (recipe.css) has four style rules.
• Each rule names the element(s) it formats and follows that with a pair of curly braces containing the style properties to apply to those elements.
• Each property has a name, such as font-family, and a value, such as "New York", "Times New Roman", serif.
• Properties are separated from each other by semicolons.
• Neither the names nor the values are case sensitive. That is, font-family is the same as FONT-FAMILY or Font-Family.
• CSS Level 2 defines over 100 different style properties. However, you don't need to know all of these. Reasonable default values are provided for all the properties you don't set.
CSS (6)
• For example, the first rule applies to the recipe element and says that it should be formatted using the New York font at a 12 point size. If New York isn't available, then Times New Roman will be chosen instead; if that isn't available, then any convenient serif font will suffice.
• These styles also apply to all descendants of the recipe element; that is, the descendants inherit them down the tree. Since recipe is the root element, this sets the default font for the entire document.
• The second rule makes the dish element look like a heading, as you can see in the rendered document.
• It's set to a much larger sans-serif font and is also bold and centered. Furthermore, its display style is set to block. This means there'll be a line break between the dish and its next and previous sibling elements.
CSS (7)
• The third rule formats the ingredients as a bulleted list, while the fourth rule formats both the directions and story elements as more-or-less straightforward paragraphs with a little extra whitespace around their top and left-hand sides.
• Not all the elements in the document have style rules, and not all need them.
• For example, the step element is not specifically styled. Rather, it simply inherits a variety of styles from its ancestor elements directions and recipe, as well as using some defaults. A different stylesheet could add a rule for the step element that overrides the styles it inherits. For example, this rule would set its font to 10 point Palatino:
    step {font-family: Palatino, serif; font-size: 10pt}
CSS (8)
Associating Stylesheets with XML Documents
• CSS stylesheets are primarily intended for use in web pages.
• Web browsers find the stylesheet for a document by looking for xml-stylesheet processing instructions in the prolog of the XML document.
• This processing instruction should have a type pseudoattribute with the value text/css and an href pseudoattribute whose value is an absolute or relative URL locating the stylesheet document:
    <?xml-stylesheet type="text/css" href="recipe.css"?>
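Because the xml-stylesheet instruction must come after the XML declaration (if there is one) and before the root element, adding it to an existing document is a small string operation. The helper below is a hypothetical sketch that assumes the href contains no double quotes; it then reparses the result with xml.dom.minidom to confirm the instruction sits in the prolog.

```python
from xml.dom import minidom


def add_stylesheet_pi(xml_text, href, media_type="text/css"):
    """Insert an xml-stylesheet processing instruction into the prolog.
    Assumption: href contains no double quotes."""
    pi = f'<?xml-stylesheet type="{media_type}" href="{href}"?>'
    if xml_text.startswith("<?xml "):          # the XML declaration must stay first
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text


doc = add_stylesheet_pi('<?xml version="1.0"?>\n<recipe/>', "recipe.css")
print(doc)

# Sanity check: the result still parses, with the PI as a prolog node.
pis = [n for n in minidom.parseString(doc).childNodes
       if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
print(pis[0].target, pis[0].data)
```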
CSS (9)
• Including the required type and href pseudoattributes, the xml-stylesheet processing instruction can have up to six pseudoattributes:
  – type: the MIME media type of the stylesheet; text/css for CSS and application/xml (not text/xsl!) for XSLT.
  – href: the absolute or relative URL where the stylesheet can be found.
  – charset: the character set in which the stylesheet is written, such as UTF-8 or ISO-8859-7.
  – title: a name for the stylesheet. If more than one stylesheet is available for a document, the browser may (but is not required to) present readers with a list of the titles of the available stylesheets and ask them to choose one.
CSS (10)
  – media: Printed pages, television screens, and computer displays are all fundamentally different media that require different styles. For example, comfortable reading on screen requires much larger fonts than on a printed page. This pseudoattribute specifies the media types this stylesheet should apply to. There are nine predefined values:
    screen, tty, tv, projection, handheld, print, braille, aural, all
• By including several xml-stylesheet processing instructions, each pointing to a different stylesheet and each using a different media type, you can make a single document attractive in many different environments.
CSS (11)
  – alternate: must be assigned one of the two values yes or no. yes means this is an alternate stylesheet, not normally used. no means this is the stylesheet that will be chosen unless the user indicates that they want a different one. The default is no.
• For example, this group of xml-stylesheet processing instructions could be placed in the prolog of the recipe document to make it more accessible on a broader range of devices:
    <?xml-stylesheet type="text/css" href="recipe.css" media="screen"
        alternate="no" title="For Web Browsers" charset="US-ASCII"?>
    <?xml-stylesheet type="text/css" href="printable_recipe.css" media="print"
        alternate="no" title="For Printing" charset="ISO-8859-1"?>
    <?xml-stylesheet type="text/css" href="big_recipe.css" media="projection"
        alternate="no" title="For presentations" charset="UTF-8"?>
    <?xml-stylesheet type="text/css" href="tty_recipe.css" media="tty"
        alternate="no" title="For Lynx" charset="US-ASCII"?>
    <?xml-stylesheet type="text/css" href="small_recipe.css" media="handheld"
        alternate="no" title="For Palm Pilots" charset="US-ASCII"?>
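A sketch of how a client might pick one of these stylesheets for a given medium, using xml.dom.minidom to read the prolog and a regular expression for the pseudoattributes. The selection policy here (first non-alternate sheet whose media list contains the medium or all, with a missing media treated as all) is an assumption of this sketch, not a rule quoted from the specification; the function names are hypothetical.

```python
import re
from xml.dom import minidom

# name="value" or name='value' pairs inside the processing instruction's data
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""")

RECIPE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="recipe.css" media="screen" alternate="no" title="For Web Browsers"?>
<?xml-stylesheet type="text/css" href="printable_recipe.css" media="print" alternate="no" title="For Printing"?>
<?xml-stylesheet type="text/css" href="tty_recipe.css" media="tty" alternate="no" title="For Lynx"?>
<recipe/>"""


def stylesheets(xml_text):
    """Collect the pseudoattributes of every xml-stylesheet PI, in document order."""
    doc = minidom.parseString(xml_text)
    return [
        {m.group(1): m.group(3) for m in PSEUDO_ATTR.finditer(node.data)}
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


def pick_stylesheet(sheets, medium):
    """Return the href of the first non-alternate sheet for this medium.
    Assumption: a missing media pseudoattribute is treated as "all"."""
    for sheet in sheets:
        media = [m.strip() for m in sheet.get("media", "all").split(",")]
        if sheet.get("alternate", "no") == "no" and (medium in media or "all" in media):
            return sheet["href"]
    return None


sheets = stylesheets(RECIPE)
print(pick_stylesheet(sheets, "print"))
print(pick_stylesheet(sheets, "braille"))
```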
CSS (12)
Selectors
• CSS provides limited abilities to select the elements to which a given rule applies.
• Many stylesheets only use element names and lists of element names separated by commas, as recipe.css does.
• However, CSS provides some other basic selectors you can use, though they're by no means as powerful as the XPath syntax of XSLT.
The Universal Selector
• The asterisk matches any element at all; that is, it applies the rule to everything in the document that does not have a more specific, conflicting rule. For example, this rule says that all elements in the document should use a large font:
    * {font-size: large}
CSS (13)
Matching Descendants, Children, and Siblings
• An element name A followed by another element name B matches all B elements that are descendants of A elements.
• For example, this rule matches quantity elements that are descendants of ingredients elements, but not other ones that appear elsewhere in the document:
    ingredients quantity {font-size: medium}
• If the two element names are separated by a greater-than sign (>), then the second element must be an immediate child of the first for the rule to apply.
CSS (14)
• For example, this rule gives quantity children of ingredient elements the same font-size as the ingredient element:
    ingredient > quantity {font-size: inherit}
• If the two element names are separated by a plus sign (+), then the second element must be the next sibling element immediately after the first element.
• For example, this style rule sets the border-top-style property only for a story element that immediately follows a directions element:
    directions + story {border-top-style: solid}
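The three combinators can be emulated over an ElementTree to make their differences concrete. The recipe fragment and function names below are hypothetical; because ElementTree stores no parent pointers, the sibling test walks each parent's list of child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of a recipe document.
RECIPE = """<recipe>
  <ingredients>
    <ingredient><quantity>2</quantity> eggs</ingredient>
  </ingredients>
  <quantity>serves 4</quantity>
  <directions><step>Beat the eggs.</step></directions>
  <story>First story</story>
  <story>Second story</story>
</recipe>"""


def descendants(root, a, b):
    """CSS "a b": every b anywhere inside an a (each element listed once)."""
    found = [x for anc in root.iter(a) for x in anc.iter(b) if x is not anc]
    return list(dict.fromkeys(found))


def children(root, a, b):
    """CSS "a > b": every b whose parent is an a."""
    return [x for parent in root.iter(a) for x in parent.findall(b)]


def next_siblings(root, a, b):
    """CSS "a + b": every b whose immediately preceding sibling element is an a."""
    hits = []
    for parent in root.iter():
        kids = list(parent)                 # child elements only
        hits += [nxt for prev, nxt in zip(kids, kids[1:])
                 if prev.tag == a and nxt.tag == b]
    return hits


root = ET.fromstring(RECIPE)
print([q.text for q in descendants(root, "ingredients", "quantity")])
print([q.text for q in children(root, "ingredients", "quantity")])
print([q.text for q in children(root, "ingredient", "quantity")])
print([s.text for s in next_siblings(root, "directions", "story")])
```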
CSS (15)
Attribute Selectors
• Square brackets allow you to select elements with particular attributes or attribute values.
• For example, this rule hides all step elements that have an optional attribute:
    step[optional] {display: none}
• This rule hides all elements that have an optional attribute regardless of their name:
    *[optional] {display: none}
• An equals sign selects an element by a given attribute's value.
• For example, this rule hides all step elements that have an optional attribute with the value yes:
    step[optional="yes"] {display: none}
CSS (16)
• The ~= operator selects elements that contain a given word as part of the value of a specified attribute. The word must be complete and separated from other words in the attribute value by whitespace, as in an NMTOKENS or ENTITIES attribute. That is, this is not a substring match. For example, this rule makes bold all recipe elements whose source attribute contains the word "Anderson":
    recipe[source~="Anderson"] {font-weight: bold}
• Finally, the |= operator matches against the first word in a hyphen-separated attribute value: the value must either equal the given word or begin with it followed by a hyphen, as in Anderson-Harold or fr-CA.
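The four attribute tests reduce to simple string checks. These hypothetical helpers take an attribute dictionary (such as ElementTree's Element.attrib) and show why ~= is a whole-word match rather than a substring test, and what |= accepts; the attribute values are made up for illustration.

```python
def has_attribute(attrs, name):            # [name]
    return name in attrs


def equals(attrs, name, value):            # [name="value"]
    return attrs.get(name) == value


def contains_word(attrs, name, word):      # [name~="word"]: a whole whitespace-separated word
    return word in attrs.get(name, "").split()


def hyphen_prefix(attrs, name, word):      # [name|="word"]: word, or word followed by "-"
    value = attrs.get(name)
    return value is not None and (value == word or value.startswith(word + "-"))


print(contains_word({"source": "Harold Anderson"}, "source", "Anderson"))
print(contains_word({"source": "Andersonville"}, "source", "Anderson"))
print("Anderson" in "Andersonville")       # a substring test would wrongly match
print(hyphen_prefix({"lang": "fr-CA"}, "lang", "fr"))
print(hyphen_prefix({"lang": "fra"}, "lang", "fr"))
```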
CSS (17)
Pseudoclass Selectors
• Pseudoclass selectors match elements according to a condition not involving their name.
• There are seven of these. They are all separated from the element name by a colon.
• For example, the first-child pseudoclass matches the named element only when it is the first child element of its parent. When applied to recipe.xml, this rule italicizes the first, and only the first, step element:
    step:first-child {font-style: italic}
• The link pseudoclass matches the named element if and only if that element is the source of an as yet unvisited link. For example, this rule makes all links in the document blue and underlined:
    *:link {color: blue; text-decoration: underline}
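Because :first-child is a condition on an element's position among its siblings, not a way of selecting a child of the named element, a sketch of the test looks at each parent's first child element. The function and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def first_child_matches(root, tag):
    """CSS "tag:first-child": elements named tag that are the first child
    element of their parent (text between elements is ignored)."""
    hits = []
    for parent in root.iter():
        kids = list(parent)                      # child elements only
        if kids and kids[0].tag == tag:
            hits.append(kids[0])
    return hits


doc = ET.fromstring("""<recipe>
  <directions><step>Preheat the oven.</step><step>Mix.</step></directions>
  <tips><note>Tip</note><step>Not first</step></tips>
</recipe>""")
print([s.text for s in first_child_matches(doc, "step")])
```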
CSS (18)
• The visited pseudoclass applies to all visited links of the specified type. For example, this rule marks all visited links as purple and underlined:
    *:visited {color: purple; text-decoration: underline}
• The active pseudoclass applies to all elements that the user is currently activating (for example, by clicking on them with the mouse). Exactly what it means to activate an element depends on the context, and indeed not all applications can activate elements.
• For example, this rule marks all active elements as red:
    *:active {color: red}
CSS (19)
• The linking pseudoclasses are not yet well supported for XML documents because most browsers don't recognize XLinks.
• The hover pseudoclass applies to elements on which the cursor is currently positioned but which the user has not yet activated.
• For example, this rule marks all these elements as green and underlined:
    *:hover {color: green; text-decoration: underline}
• The focus pseudoclass applies to the element that currently has the focus.
• For example, this rule draws a one-pixel red border around the element with the focus, assuming there is such an element:
    *:focus {border: 1px solid red}
CSS (20)
• Finally, the lang pseudoclass matches all elements in the specified language as determined by the xml:lang attribute.
• For example, this rule uses the David New Hebrew font for all elements written in Hebrew (more properly, all elements whose language, given by their own xml:lang attribute or that of their nearest ancestor, is he or any subtype thereof, such as he-IL):
    *:lang(he) {font-family: "David New Hebrew"}
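An element's language comes from its own xml:lang attribute or, failing that, from its nearest ancestor's, and :lang(he) also accepts subtypes such as he-IL. The sketch below models that lookup with ElementTree; the helper names and sample document are hypothetical, and the case-insensitive comparison is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"   # how ElementTree names xml:lang


def language(element, parents):
    """xml:lang of the element or of its nearest ancestor that has one."""
    while element is not None:
        if XML_LANG in element.attrib:
            return element.get(XML_LANG)
        element = parents.get(element)
    return None


def matches_lang(element, parents, code):
    """CSS ":lang(code)": the language is code, or code followed by "-"."""
    lang = language(element, parents)
    if lang is None:
        return False
    lang, code = lang.lower(), code.lower()
    return lang == code or lang.startswith(code + "-")


doc = ET.fromstring('<doc xml:lang="en"><p xml:lang="he-IL"><q>shalom</q></p><p>hello</p></doc>')
parents = {child: parent for parent in doc.iter() for child in parent}
print([e.tag for e in doc.iter() if matches_lang(e, parents, "he")])
```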
CSS (21)
Pseudoelement Selectors
• Pseudoelement selectors match things that aren't actually elements. Like pseudoclass selectors, they're attached to an element selector by a colon. There are four of these:
  – first-letter
  – first-line
  – before
  – after
• The first-letter pseudoelement selects the first letter of an element. For example, this rule makes the first letter of the story element a drop cap:
    story:first-letter {font-size: 200%; font-weight: bold; float: left; padding-right: 3pt}
CSS (22)
• The Display Property
• Display is one of the most important CSS properties. This property determines how the element will be positioned on the page.
• There are 18 legal values for this property.
• However, the two primary values are inline and block. The display property can also be used to create lists and tables, as well as to hide elements completely.
• Inline Elements
• Setting the display to inline, the default value, places the element in the next available position from left to right, much as each word in this paragraph is positioned. The text may be wrapped from one line to the next if necessary, but there won't be any hard line breaks between each inline element.
CSS (23)
• In recipe.xml and recipe.css, the quantity, step, person, city, and state elements were all formatted as inline. This didn't need to be specified explicitly because it's the default.
• Block Elements
  – In contrast to inline elements, an element set to display: block is separated from its siblings, generally by a line break.
  – For example, in HTML, paragraphs and headings are block elements. In recipe.xml and recipe.css, the dish, directions, and story elements were all formatted with display: block.
• List Elements
  – An element whose display property is set to list-item is also formatted as a block-level element.
  – However, a bullet is inserted at the beginning of the block.
CSS (24)
  – The list-style-type, list-style-image, and list-style-position properties control which character or image is used for a bullet and exactly how the list is indented. For example, this rule would format the steps as a numbered list rather than rendering them as a single paragraph:
    step {display: list-item; list-style-type: decimal; list-style-position: inside}
• Hidden Elements
  – An element whose display property is set to none is not included in the rendered document the reader sees. It is invisible and does not occupy any space or affect the placement of other elements.
  – For example, this style rule hides the story element completely:
    story {display: none}
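To make the display values concrete, here is a toy text renderer (not a CSS engine) that handles only inline, block, list-item, and none. The display table, sample recipe, and function are hypothetical: block and list-item elements start and end a line, list items get a "* " bullet, and hidden elements contribute nothing, although text after them still belongs to their parent and is kept.

```python
import xml.etree.ElementTree as ET

# Hypothetical display settings; unlisted elements get the CSS default, inline.
DISPLAY = {"recipe": "block", "dish": "block", "ingredients": "block",
           "ingredient": "list-item", "directions": "block", "story": "none"}

DOC = """<recipe><dish>Pancakes</dish><ingredients><ingredient><quantity>2</quantity> eggs</ingredient><ingredient><quantity>1 cup</quantity> milk</ingredient></ingredients><directions>Mix and <step>fry</step>.</directions><story>Grandma's secret.</story></recipe>"""


def render(root, display):
    """Toy renderer for the display values inline, block, list-item, and none."""
    lines = [""]

    def break_line():                      # start a new line unless the current one is empty
        if lines[-1].strip():
            lines.append("")

    def walk(el):
        mode = display.get(el.tag, "inline")
        if mode == "none":                 # hidden: no output and no space
            return
        if mode in ("block", "list-item"):
            break_line()
            if mode == "list-item":
                lines[-1] += "* "          # a bullet at the start of the block
        lines[-1] += el.text or ""
        for child in el:
            walk(child)
            lines[-1] += child.tail or ""  # tail text belongs to the parent
        if mode in ("block", "list-item"):
            break_line()

    walk(root)
    return [" ".join(line.split()) for line in lines if line.strip()]


for line in render(ET.fromstring(DOC), DISPLAY):
    print(line)
```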